
[Python] 004. Scraping the Latest CNN News

훗티v 2020. 8. 21. 13:26

 

If you open the CNN latest-news page...

 

 

you get a page like this~

 

Let's scrape it~

 

 

Copy-paste, copy-paste~

 

from datetime import datetime
from bs4 import BeautifulSoup
import requests
import shutil
import time
import os

# Looper: rebuild the page once a minute

z = 1
while True:

    # HTML Header Section (including CSS)

    with open('cnn_latest.txt', 'a', encoding='utf8') as file:
        file.write("""<!DOCTYPE html>
        <html lang="en">

        <head>
            <meta charset="UTF-8">
            <meta http-equiv="refresh" content="60">
            <meta name="viewport" content="width=device-width, initial-scale=1.0">
            <title>Document</title>

            <style>
                @import url('https://fonts.googleapis.com/css2?family=Montserrat&display=swap');

                .font_class {
                    font-family: 'Montserrat', sans-serif;
                    font-size: 12px;
                    color: #000000;
                    line-height: 25px;
                }
                body {
                    background-color: #FFFFFF;
                    color: #000000;

                }

                a:link, a:visited {
                    font-family: 'Montserrat', sans-serif;
                    font-size: 12px;
                    color: #000000;
                    padding: 1px 1px;
                    text-decoration: none;
                }

                b {
                    color: #000000;
                    border-left: 5px solid red;
                    padding-left: 2px;
                }

                hr {
                border: 1px solid #EAECEE;
                }

                a:hover {
                background-color: #7FFFD4;
                }
            </style>
        </head>

        <body class='font_class'>
        <table>
        """ + "\n")


    # Remove last run's output so the os.rename() below can't collide

    if os.path.exists("cnn_latest.html"):
        os.remove("cnn_latest.html")

    # Writing HTML File with webscraped variables

    with open('cnn_latest.txt', 'a', encoding='utf8') as file:

        URL = 'https://edition.cnn.com/specials/last-50-stories'

        headers = {
            "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36 OPR/67.0.3575.115'}

        page = requests.get(URL, headers=headers)
        soup = BeautifulSoup(page.content, 'html.parser')
        headlines = soup.find_all("span", class_="cd__headline-text")
        links = soup.find_all("h3", class_="cd__headline")
        src = "https://edition.cnn.com"

        file.write("<b>CNN Latest News</b><br><hr>")

        # Pair each headline with its link, capped at 50 stories
        for headline, link in zip(headlines[:50], links[:50]):
            title = headline.text.strip()
            anchor = link.find("a")
            if anchor is None:
                continue
            file.write("\n<a href='" + src + anchor.get('href') +
                       "'>" + title + "</a><br><hr>\n")

    # HTML footer section

    with open('cnn_latest.txt', 'a', encoding='utf8') as file:
        now = datetime.now()
        current_time = now.strftime("%H:%M:%S")
        file.write("<br>" + current_time)
        file.write("""</table>

    </body>
    </html>
    """)


    # Copying a backup

    # Work from the script's own folder so the relative paths line up
    path = os.path.dirname(os.path.realpath(__file__))
    os.chdir(path)
    print(path)

    os.rename('cnn_latest.txt', 'cnn_latest.html')

    os.makedirs('upload', exist_ok=True)  # make sure the target folder exists
    shutil.copyfile(r'cnn_latest.html', r'upload\cnn_latest.html')

    print("Number " + str(z) + " update.")

# Timer

    time.sleep(60)
    z += 1
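Quick aside: if CNN ever changes its markup, the scraping part can be sanity-checked offline without hitting the site. A minimal sketch, parsing a made-up snippet that mimics the `cd__headline` / `cd__headline-text` structure used above (the sample HTML and path here are invented for illustration):

```python
from bs4 import BeautifulSoup

# Made-up snippet mimicking the structure scraped above
sample = """
<h3 class="cd__headline">
  <a href="/2020/08/21/world/example-story/index.html">
    <span class="cd__headline-text">Example headline</span>
  </a>
</h3>
"""

soup = BeautifulSoup(sample, 'html.parser')
headline = soup.find("span", class_="cd__headline-text")
anchor = soup.find("h3", class_="cd__headline").find("a")

print(headline.text.strip())  # Example headline
print(anchor.get('href'))     # /2020/08/21/world/example-story/index.html
```

If the selectors stop matching, `find` returns `None`, so this is a quick way to spot a markup change before the main loop starts writing broken links.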

 

Now let's use the scraped values to build an HTML file~

 

 

Okie~ okie~ copy-paste, copy-paste

 

Looks good~

 

 

Links are in place too~
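By the way, the `href` values CNN serves are relative paths, which is why the script glues `src` onto the front. The standard library's `urllib.parse.urljoin` does the same job and also copes with links that are already absolute; a small sketch (the example paths are made up):

```python
from urllib.parse import urljoin

base = "https://edition.cnn.com"

# Hypothetical relative href, like the ones scraped above
print(urljoin(base, "/2020/08/21/world/example-story/index.html"))
# https://edition.cnn.com/2020/08/21/world/example-story/index.html

# An already-absolute link passes through unchanged
print(urljoin(base, "https://example.com/story"))
# https://example.com/story
```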

 

Heh heh heh :)
