How to Strip Unwanted Tags with BeautifulSoup in Python

2021-09-10 Daily Notes 108 characters

python BeautifulSoup

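When scraping, you sometimes hit a page where an element contains a child tag you don't want, yet there is no deeper tag you could select to isolate the text you do want. In that case, you can remove the unwanted tags with BeautifulSoup as follows and keep only what is left: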

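from bs4 import BeautifulSoup

# Sample markup: the <small> timestamp shares the <h3> with the title text
html = '''
<h3>
<small>Sep 09, 2021, 08:00 ET</small>
Kawaii Islands raises $2.4M in private token sale for its upcoming anime metaverse
</h3>'''

page_html = BeautifulSoup(html, 'lxml')  # the 'lxml' parser requires the lxml package

# Calling the soup object is shorthand for find_all(); extract() detaches each tag from the tree
for s in page_html('small'):
    s.extract()

# Only the title text remains
print(page_html.text)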
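
The same idea can be wrapped in a small helper. The sketch below is only an illustration: the function name `split_heading` and its `unwanted` parameter are made up for this example, and it uses the built-in `html.parser` instead of `lxml` so that it runs without the extra dependency. It relies on the fact that `extract()` returns the tag it removes, so the timestamp can be kept instead of thrown away.

```python
from bs4 import BeautifulSoup


def split_heading(html, unwanted="small"):
    """Hypothetical helper: remove every `unwanted` tag and return
    (remaining_text, list_of_removed_texts)."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all() returns a list-like result, so detaching tags while looping is safe.
    # extract() removes the tag from the tree and returns it.
    removed = [tag.extract().get_text(strip=True) for tag in soup.find_all(unwanted)]
    return soup.get_text(strip=True), removed


html = """<h3>
<small>Sep 09, 2021, 08:00 ET</small>
Kawaii Islands raises $2.4M in private token sale for its upcoming anime metaverse
</h3>"""

title, removed = split_heading(html)
print(title)
print(removed)
```

If the removed content is never needed, `tag.decompose()` can be used in place of `tag.extract()`; it destroys the tag rather than returning it.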

Author: Init

Article link: https://www.init888.cn/python/python_BeautifulSoup.html
