
Web Scraping

Web scraping is a technique in which a computer program extracts data from the human-readable output of websites, typically their HTML pages.

Download full website

wget --mirror --convert-links --adjust-extension --page-requisites --no-parent http://example.org

Website Copier | Download Sites | Website Ripper - Tools Bug

lxml.etree

XPath - a query language for selecting nodes in XML documents
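A minimal sketch of lxml.etree with XPath. It assumes a small inline HTML snippet (hypothetical data) in place of a downloaded page. `etree.HTML` parses HTML into an element tree, and `.xpath()` evaluates an XPath 1.0 expression against it, returning a list of matches.

```python
from lxml import etree

# Hypothetical inline HTML standing in for a downloaded page.
html = """
<html><body>
  <div class="stylelistrow"><a href="/a">First</a></div>
  <div class="stylelistrow"><a href="/b">Second</a></div>
  <div class="other"><a href="/c">Ignored</a></div>
</body></html>
"""

# Parse (possibly messy) HTML into an element tree.
root = etree.HTML(html)

# Text of every <a> directly inside a <div> whose class is exactly "stylelistrow".
titles = root.xpath('//div[@class="stylelistrow"]/a/text()')

# The href attribute of the same links.
links = root.xpath('//div[@class="stylelistrow"]/a/@href')

print(titles)
print(links)
```

Note that `@class="stylelistrow"` is an exact string comparison, so a div with `class="stylelistrow highlight"` would not match. The usual XPath 1.0 workaround is `contains(concat(' ', normalize-space(@class), ' '), ' stylelistrow ')`.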

beautifulsoup

  • super short learning curve
  • two-function API
    • parse
    • search (find_all)
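import requests  # assumed HTTP client; response.text is its decoded page body
from bs4 import BeautifulSoup

response = requests.get("http://example.org")  # fetch the raw HTML
soup = BeautifulSoup(response.text, "html.parser")  # parse

# search: every <div> whose class attribute includes "stylelistrow"
mydivs = soup.find_all("div", {"class": "stylelistrow"})
for i, div in enumerate(mydivs):
    print(i, div.get_text(strip=True))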
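The parse and search steps can also be tried offline. This sketch assumes an inline HTML string (hypothetical) in place of `response.text`, and uses the `class_` keyword, which is equivalent to the `{"class": ...}` dict form.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML standing in for response.text.
html = """
<div class="stylelistrow highlight"><span>Alpha</span></div>
<div class="stylelistrow"><span>Beta</span></div>
<div class="footer"><span>Gamma</span></div>
"""

# Step 1, parse: build the tree with Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

# Step 2, search: class_ is matched against each individual class,
# so the multi-class div "stylelistrow highlight" is included too.
rows = soup.find_all("div", class_="stylelistrow")
names = [row.span.get_text() for row in rows]
print(names)  # ['Alpha', 'Beta']
```

Because Beautiful Soup treats `class` as a multi-valued attribute, this match is looser than the exact `@class=` comparison in an XPath query.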

Selenium (for JavaScript-rendered pages)

Headless browser
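A sketch of why a plain HTML parser is not enough for JavaScript-heavy pages. It assumes a hypothetical page whose list is filled in by a script. The parser only sees the markup the server sent, so the script's output never appears in the tree; the script itself is kept as plain text.

```python
from bs4 import BeautifulSoup

# Hypothetical page: the <ul> is empty in the HTML and only filled in by JavaScript.
html = """
<ul id="items"></ul>
<script>
  document.getElementById("items").innerHTML = "<li>Loaded by JS</li>";
</script>
"""

soup = BeautifulSoup(html, "html.parser")

# No <li> exists in the parsed tree: nothing executed the script.
items = soup.select("#items li")
print(len(items))  # 0

# The would-be content is only present as the script's source text.
print("Loaded by JS" in soup.script.string)  # True
```

Selenium instead drives a real browser (optionally headless, with no visible window) that runs the script first; the rendered HTML, available as `driver.page_source`, can then be passed to BeautifulSoup in the same way.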

https://www.toptal.com/python/web-scraping-with-python

https://www.freecodecamp.org/news/how-to-scrape-websites-with-python

AI Tools

Scrape and Monitor Data from Any Website with No Code

GitHub - laramies/theHarvester: E-mails, subdomains and names Harvester - OSINT