Web Scraping
Web scraping is a technique in which a program extracts data from the human-readable output of websites.
Download full website
wget --mirror --convert-links --adjust-extension --page-requisites --no-parent http://example.org
Website Copier | Download Sites | Website Ripper - Tools Bug
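The --no-parent option in the wget command keeps the crawl from climbing above the starting directory. A rough Python sketch of that filtering idea, using only the standard library, might look like the following. The function name stays_below and the sample URLs are hypothetical, and the check only approximates what wget does.

```python
from urllib.parse import urljoin, urlparse


def stays_below(start_url: str, link: str) -> bool:
    """Return True if `link` stays on the start host and under the start directory."""
    start = urlparse(start_url)
    # Resolve relative links (including "../") against the start URL.
    target = urlparse(urljoin(start_url, link))
    if target.netloc != start.netloc:
        return False
    # Directory part of the start path: "/docs/index.html" -> "/docs/", "" -> "/".
    start_dir = start.path.rsplit("/", 1)[0] + "/"
    return (target.path or "/").startswith(start_dir)


print(stays_below("http://example.org/docs/index.html", "guide/intro.html"))
print(stays_below("http://example.org/docs/index.html", "../about.html"))
```

The first call stays under /docs/ and the second resolves to /about.html, which lies above it.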
lxml.etree
XPath - the query language for XML
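As a rough illustration of XPath-style queries, the sketch below selects elements from a small inline document. The sample markup and variable names are made up for this example (the stylelistrow class is borrowed from the BeautifulSoup snippet in these notes). It uses lxml.etree when installed and otherwise falls back to the standard library's xml.etree.ElementTree; findall in both accepts a limited XPath subset.

```python
try:
    from lxml import etree  # third-party; also offers full XPath 1.0 via .xpath()
except ImportError:
    import xml.etree.ElementTree as etree  # stdlib fallback; XPath subset only

# Hypothetical sample document, well-formed so either parser accepts it.
SAMPLE = """
<html>
  <body>
    <div class="stylelistrow"><a href="/item/1">First</a></div>
    <div class="stylelistrow"><a href="/item/2">Second</a></div>
    <div class="footer"><a href="/about">About</a></div>
  </body>
</html>
""".strip()

root = etree.fromstring(SAMPLE)

# Every <div> descendant whose class attribute equals "stylelistrow".
rows = root.findall(".//div[@class='stylelistrow']")

# Within each matched row, "a" selects the direct <a> children.
links = [(a.text, a.get("href")) for row in rows for a in row.findall("a")]
print(links)  # [('First', '/item/1'), ('Second', '/item/2')]
```

With lxml itself, the href values can also be pulled in one query: root.xpath("//div[@class='stylelistrow']/a/@href").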
beautifulsoup
- super short learning curve
- two-function API:
  - parse
  - search (find_all)
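import requests
from bs4 import BeautifulSoup
response = requests.get("http://example.org", timeout=10)  # placeholder URL
soup = BeautifulSoup(response.text, 'html.parser')  # parse
mydivs = soup.find_all("div", {"class": "stylelistrow"})  # search by class
for i, div in enumerate(mydivs):
    print(i, div)
print(soup.body.div.div)  # attribute navigation: first <div> nested in the first <div> of <body>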
Selenium (for JavaScript)
Headless browser
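For pages that build their content with JavaScript, a headless browser can render the page before it is parsed. A minimal sketch, assuming Selenium 4 with Chrome and a matching driver available on the machine; the function name fetch_rendered_html is hypothetical, and the --headless=new flag is an assumption that holds for recent Chrome versions.

```python
def fetch_rendered_html(url: str) -> str:
    """Load `url` in headless Chrome and return the page source after the initial load."""
    # Imported here so this module still loads where Selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless=new")  # assumption: recent Chrome; older builds use "--headless"
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)  # returns once the page's load event has fired
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even on errors
```

Calling fetch_rendered_html("http://example.org") would return HTML that can be handed to BeautifulSoup as usual. Content that arrives after the initial load may need an explicit wait before page_source is read.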
Links
https://www.toptal.com/python/web-scraping-with-python
https://www.freecodecamp.org/news/how-to-scrape-websites-with-python
AI Tools
Scrape and Monitor Data from Any Website with No Code
GitHub - laramies/theHarvester: E-mails, subdomains and names Harvester - OSINT