Web Scraping

Web Scraping is a technique in which a computer program extracts data from human-readable output coming from websites.

Download full website

wget --mirror --convert-links --adjust-extension --page-requisites --no-parent http://example.org

Website Copier | Download Sites | Website Ripper - Tools Bug

lxml.etree

XPath - language for XML queries
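
As a minimal sketch of how lxml.etree and XPath fit together, the snippet below parses a small HTML string and selects links with one XPath query. The HTML sample and the function name extract_links are made up for illustration; the class name "stylelistrow" is borrowed from the BeautifulSoup example further down.

```python
from lxml import etree

# Hypothetical HTML standing in for a downloaded page
HTML = """
<html><body>
  <div class="stylelistrow"><a href="/a">First</a></div>
  <div class="stylelistrow"><a href="/b">Second</a></div>
  <div class="other"><a href="/c">Third</a></div>
</body></html>
"""


def extract_links(html):
    """Return (text, href) pairs for links directly inside div.stylelistrow."""
    # etree.HTML uses lxml's lenient HTML parser and returns the root element
    root = etree.HTML(html)
    # XPath: any <div> with class exactly "stylelistrow", then its <a> children
    anchors = root.xpath('//div[@class="stylelistrow"]/a')
    return [(a.text, a.get("href")) for a in anchors]


links = extract_links(HTML)
print(links)
```

Note that @class="stylelistrow" matches only when the attribute is exactly that string; an element carrying several classes would need something like contains(@class, "stylelistrow").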

beautifulsoup

  • super short learning curve
  • two-function API
    • parse
    • search (find_all)
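import requests
from bs4 import BeautifulSoup

# Assumption: the page is fetched with requests; the URL is a placeholder
response = requests.get("http://example.org")

# parse
soup = BeautifulSoup(response.text, 'html.parser')

# search: every <div> whose class is "stylelistrow"
mydivs = soup.find_all("div", {"class": "stylelistrow"})
for i, div in enumerate(mydivs):
    print(i, div)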
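
The same parse and search steps can be tried offline, without a network request. This is only a sketch: the HTML sample and the helper name row_links are invented for illustration, and it goes one step further by pulling the text and href out of each matched row.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for response.text
HTML = """
<html><body>
  <div class="stylelistrow"><a href="/a">First</a></div>
  <div class="stylelistrow"><a href="/b">Second</a></div>
  <div class="other"><a href="/c">Third</a></div>
</body></html>
"""


def row_links(html):
    """Return (text, href) for the first link in each div.stylelistrow."""
    soup = BeautifulSoup(html, "html.parser")                # parse
    rows = soup.find_all("div", {"class": "stylelistrow"})   # search
    results = []
    for row in rows:
        link = row.find("a")  # first <a> inside the row, or None
        if link is not None:
            results.append((link.get_text(strip=True), link.get("href")))
    return results


rows = row_links(HTML)
print(rows)
```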

Selenium (for javascript)

Headless browser

Proxies

AI Tools