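Requirement: the business team supplies a list of keywords and asks me to compile the manufacturers behind them. In practice this means searching for each keyword in a browser and recording the matching websites, which the business team then mines for customer information.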
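Method 1: Use Python
After the crawler had been running for a while, a CAPTCHA (human verification) appeared, so I abandoned this approach.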
from lxml import html
html_content = """
<a jsname="UWckNb" href="https://seitron.com/en/lpg-gas-leak-detector-kit-beagle.html" data-ved="2ahUKEwjdwL_4ipKIAxUmjVYBHbK4HM4QFnoECA4QAQ" ping="/url?sa=t&source=web&rct=j&opi=89978449&url=https://seitron.com/en/lpg-gas-leak-detector-kit-beagle.html&ved=2ahUKEwjdwL_4ipKIAxUmjVYBHbK4HM4QFnoECA4QAQ" target="_blank" rel="noopener"><br><h3 class="LC20lb MBeuO DKV0Md">LPG gas leak detector Kit - Beagle</h3><div class="notranslate HGLrXd NJjxre iUh30 ojE3Fb"><div class="q0vns"><span class="H9lube"><div class="eqA2re NjwKYd Vwoesf" aria-hidden="true"><img class="XNo5Ab" src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACAAAAAgCAYAAABzenr0AAABS0lEQVR4Ac3UA2iuYRiH8ffYJ7uTrWNvYWbecrO95WUtu75lL84202zbtnWvu67Fmc+/funDc314X+uyeSelvIcHbChGCWzwxhs8UdY5MyuAA1/CDb2QaxqBC54rQwM4GGGQOxaLJ8rIADfIOVYxhBFsQa7IVxkSwGWGPggOUYNI/MA/JKMdcolRfFBGBHhAcIhifFJXuHy/oRnHEAh2EKGMCLBBMItv6ga3bg/sQCA4RoYyIqAYglJl3WKELEDO0aSMCCjBMVLVHQTYw/EUfNCtjAiwQZCmrHsYYU4oV0YEeEHQpe4xoBBJyoiA1xjBBoLUHR4ciQW8U0YEPIELBIP4qW5xcCAOEKAs9sgBjCc+RywEMwjFBzzHSzzDF2RBkKIsZloA4yeBL8YhWEcD6jCNY8zDX1nM4IDzgz4iElnoPIUcxOCdspixASc00tI90g0CnwAAAABJRU5ErkJggg==" style="height:18px;width:18px" alt="" data-csiid="HCjMZt3CKqaa2roPsvHy8Aw_6" data-atf="1"></div></span><div class="CA5RN"><div><span class="VuuXrf">Seitron</span></div><div class="byrV5b"><cite class="qLRx3b tjvcx GvPZzd dTxz9 cHaqb" role="text">https://seitron.com<span class="ylgVCe ob9lvb" role="text"> › lpg-gas-leak-detector-ki...</span></cite></div></div></div></div><span jscontroller="IX53Tb" jsaction="rcuQ6b:npT2md" style="display:none"></span>
"""
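# Parse the HTML fragment into an element tree
tree = html.fromstring(html_content)
# xpath() returns a list of matches; take the href of the first result link
href_value = tree.xpath('//a[@jsname="UWckNb"]/@href')[0]
print(href_value)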
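The snippet above extracts a single link, while the stated goal is a list of manufacturer websites. A minimal sketch of one possible follow-up step is to reduce a batch of extracted hrefs to one entry per site. The function name `unique_site_roots` and the sample list are assumptions made for illustration; only the first URL comes from the example above.

```python
from urllib.parse import urlsplit


def unique_site_roots(hrefs):
    """Return one 'scheme://host' entry per site, in first-seen order."""
    seen = set()
    roots = []
    for href in hrefs:
        parts = urlsplit(href)
        # Skip relative links and non-web schemes such as javascript: or mailto:
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue
        host = parts.netloc.lower()
        if host not in seen:
            seen.add(host)
            roots.append(f"{parts.scheme}://{host}")
    return roots


# The first href is the one extracted above; the other two are made up.
sample_hrefs = [
    "https://seitron.com/en/lpg-gas-leak-detector-kit-beagle.html",
    "https://seitron.com/en/",   # hypothetical second hit on the same site
    "/search?q=gas+detector",    # hypothetical relative link, ignored
]
print(unique_site_roots(sample_hrefs))
```

Deduplicating by host keeps the output short when several result links point at the same manufacturer.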
Method 2: Use XPath Helper
Query
//a[@jsname="UWckNb"]
Result
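To apply the same selection to a saved copy of a results page without the browser extension, something like the following could work. This is a sketch that swaps XPath for Python's standard-library `html.parser`, so it is not what XPath Helper runs. The class and function names are hypothetical, and it assumes the result links still carry `jsname="UWckNb"`, which the page may change at any time.

```python
from html.parser import HTMLParser


class ResultLinkCollector(HTMLParser):
    """Collect href values of <a> tags whose jsname attribute matches."""

    def __init__(self, jsname="UWckNb"):
        super().__init__()
        self.jsname = jsname
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        # Keep only anchors with the expected jsname and a non-empty href
        if attr_map.get("jsname") == self.jsname and attr_map.get("href"):
            self.hrefs.append(attr_map["href"])


def collect_result_links(page_html, jsname="UWckNb"):
    """Return every matching href in document order."""
    collector = ResultLinkCollector(jsname)
    collector.feed(page_html)
    collector.close()
    return collector.hrefs


# Trimmed-down sample: the first anchor mirrors the one shown earlier,
# the second is a made-up non-result link that should be ignored.
sample_page = """
<a jsname="UWckNb" href="https://seitron.com/en/lpg-gas-leak-detector-kit-beagle.html"><h3>LPG gas leak detector Kit - Beagle</h3></a>
<a jsname="other" href="https://example.com/ignored">not a result link</a>
"""
print(collect_result_links(sample_page))
```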