Head over to CNN's latest news page...
and this is the page you get.
Let's scrape it.
Copy and paste away:
import os
import shutil
import time
from datetime import datetime

import requests
from bs4 import BeautifulSoup
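# Looper: z counts the update cycles (raise the bound to keep refreshing)
z = 1
while z <= 1:
    # HTML header section (including CSS)
    # Mode 'w' starts from an empty file, so a leftover cnn_latest.txt is not appended to
    with open('cnn_latest.txt', 'w', encoding='utf8') as file:
        file.write("""<!DOCTYPE html>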
<html lang="en">
<head>
<meta charset="UTF-8">
<meta http-equiv="refresh" content="60">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Document</title>
<style>
@import url('https://fonts.googleapis.com/css2?family=Montserrat&display=swap');
.font_class {
font-family: 'Montserrat', sans-serif;
font-size: 12px;
color: #000000;
line-height: 25px;
}
body {
background-color: #FFFFFF;
color: #000000;
}
a:link, a:visited {
font-family: 'Montserrat', sans-serif;
font-size: 12px;
color: #000000;
padding: 1px 1px;
text-decoration: none;
}
b {
color: #000000;
border-left: 5px solid red;
padding-left: 2px;
}
hr {
border: 1px solid #EAECEE;
}
a:hover {
background-color: #7FFFD4;
}
</style>
</head>
<body class='font_class'>
<table>
""" + "\n")
    # Duplicate file check: remove the previous output so that the
    # os.rename call further down cannot fail on Windows
    if os.path.exists("cnn_latest.html"):
        os.remove("cnn_latest.html")
    else:
        print("The file does not exist")
    # Writing the HTML body from the scraped headlines
    with open('cnn_latest.txt', 'a', encoding='utf8') as file:
        URL = 'https://edition.cnn.com/specials/last-50-stories'
        headers = {
            "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36 OPR/67.0.3575.115'}
        page = requests.get(URL, headers=headers, timeout=10)
        soup = BeautifulSoup(page.content, 'html.parser')
        # The headline text sits in a <span>; its link sits in the matching <h3>
        headlines = soup.find_all("span", class_="cd__headline-text")
        links = soup.find_all("h3", class_="cd__headline")
        src = "https://edition.cnn.com"
        limit = 50  # write at most 50 stories
        file.write("<b>CNN Latest News</b><br><hr>")
        for headline, link in list(zip(headlines, links))[:limit]:
            title = headline.text.strip()
            linkable = link.find("a")
            if linkable is None:
                continue  # skip a headline that has no anchor
            # The hrefs are assumed to be site-relative, so prefix the site root
            file.write("\n<a href='" + src + linkable.get('href') +
                       "'>" + title + "</a><br><hr>\n")
    # HTML footer section
    with open('cnn_latest.txt', 'a', encoding='utf8') as file:
        now = datetime.now()
        current_time = now.strftime("%H:%M:%S")
        file.write("<br>" + current_time)
        file.write("""</table>
</body>
</html>
""")
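    # Rename the finished file, then copy a backup into the upload folder.
    # All paths are relative to the working directory, so run the script
    # from the folder where the output should live.
    os.rename('cnn_latest.txt', 'cnn_latest.html')
    os.makedirs('upload', exist_ok=True)
    shutil.copyfile('cnn_latest.html', os.path.join('upload', 'cnn_latest.html'))
    print("Number " + str(z) + " update.")
    # Timer
    time.sleep(60)
    z += 1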
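The `z` counter and `time.sleep(60)` in the script form a simple refresh loop. As a rough sketch of the same idea, the loop can be pulled into a small function. The names `run_updates`, `update`, `cycles`, `interval`, and `sleep` are made up for this illustration and are not part of the original script. Sleeping only between cycles avoids an idle minute after the last update.

```python
import time


def run_updates(update, cycles=1, interval=60, sleep=time.sleep):
    """Call update(n) for n = 1..cycles, pausing `interval` seconds between calls."""
    for n in range(1, cycles + 1):
        update(n)
        if n < cycles:  # no need to wait after the final cycle
            sleep(interval)


# Example: three quick cycles with a stand-in update function
done = []
run_updates(done.append, cycles=3, interval=0)
print(done)
```

In the real script, `update` would be a function wrapping the header, scraping, and footer steps.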
Let's use the scraped values to build an HTML file.
Okay, okay. Copy and paste again.
Looks nice.
The links work too.
Heh heh :)
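One way to picture this step: collect the scraped values first, then turn them into a page in one go. The sketch below is not the script from this post. `render_page` and the sample `stories` list are hypothetical, the sample URLs are placeholders, and the markup is cut down to the bare minimum.

```python
from datetime import datetime
from html import escape


def render_page(stories, now=None):
    """Build a complete HTML page from (title, url) pairs."""
    now = now or datetime.now()
    parts = [
        "<!DOCTYPE html>",
        '<html lang="en">',
        '<head><meta charset="UTF-8"><title>CNN Latest News</title></head>',
        "<body>",
        "<b>CNN Latest News</b><br><hr>",
    ]
    for title, url in stories:
        # escape() keeps characters such as & and < from breaking the markup
        parts.append('<a href="{}">{}</a><br><hr>'.format(escape(url), escape(title)))
    parts += ["<br>" + now.strftime("%H:%M:%S"), "</body>", "</html>"]
    return "\n".join(parts) + "\n"


# Placeholder data standing in for the scraped headlines
stories = [
    ("Sample headline one", "https://edition.cnn.com/sample-one"),
    ("Sample & headline two", "https://edition.cnn.com/sample-two"),
]
page = render_page(stories, now=datetime(2020, 8, 21, 9, 30, 0))
print(page)
```

Writing the returned string with a single `open(..., 'w', encoding='utf8')` call would replace the separate header, body, and footer writes.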
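The links work because the script glues the site root `src` onto each scraped `href`, which assumes every `href` is site-relative. If one were already absolute, plain concatenation would double the prefix. A minimal sketch of a more forgiving approach, using `urllib.parse.urljoin` from the standard library; `absolute_url` is a hypothetical helper and the sample paths are placeholders, not real CNN stories.

```python
from urllib.parse import urljoin

BASE_URL = "https://edition.cnn.com"  # same value as `src` in the script


def absolute_url(href, base=BASE_URL):
    """Resolve a scraped href against the site root."""
    return urljoin(base, href)


# A site-relative href gets the site root prepended
relative = absolute_url("/2020/08/21/world/sample-story/index.html")
# An href that is already absolute is returned unchanged
absolute = absolute_url("https://edition.cnn.com/videos/sample")
print(relative)
print(absolute)
```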