What is Web Scraping?


  • Humans are good at categorizing information; computers are much less so.
  • Often, data on a web site is not properly structured, making its extraction difficult.
  • Web scraping is the process of automating the extraction of data from web sites.

Anatomy of a web page


  • Every website is built on an HTML document that structures its content.
  • An HTML document is composed of elements, usually defined by an opening <tag> and a closing </tag>.
  • Elements can have attributes that define their properties, written as <tag attribute="value">.
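
To make the points above concrete, here is a minimal sketch that walks through a small HTML snippet and lists each element's opening tag, its attributes, and its closing tag. It uses only Python's standard-library html.parser module. The sample page, the ElementLister class, and the list_elements function are hypothetical names invented for this illustration.

```python
from html.parser import HTMLParser

# Hypothetical sample page, invented for illustration only.
SAMPLE_HTML = """
<html>
  <body>
    <h1 class="title">Members</h1>
    <a href="https://example.org/about" id="about-link">About</a>
  </body>
</html>
"""


class ElementLister(HTMLParser):
    """Record every opening tag (with its attributes) and every closing tag."""

    def __init__(self):
        super().__init__()
        self.opened = []  # list of (tag name, attributes dict)
        self.closed = []  # list of tag names, in closing order

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        self.opened.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.closed.append(tag)


def list_elements(html):
    """Return (opened, closed) for the given HTML string."""
    parser = ElementLister()
    parser.feed(html)
    parser.close()
    return parser.opened, parser.closed


opened, closed = list_elements(SAMPLE_HTML)
for tag, attributes in opened:
    print(tag, attributes)
```

Elements nest, so the closing tags arrive in the reverse order of the opening tags that enclose them: here h1 and a close before body, and body closes before html.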

Manually scrape data using browser extensions


  • Data that is relatively well structured (in a table) is relatively easy to scrape.
  • Tools may be available on a web page which enable data to be downloaded directly.
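
Browser extensions do this work through a point-and-click interface, but the reason tabular data is easy to scrape can also be sketched in a few lines of code: rows and cells are already marked up as tr, th and td elements, so extraction is mostly a matter of following the tags. The example below is only an illustration under stated assumptions, not how any particular extension works. The sample table and the names TableParser, table_to_rows and rows_to_csv are hypothetical; only Python's standard library is used.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample table, invented for illustration only.
SAMPLE_TABLE = """
<table>
  <tr><th>Name</th><th>Constituency</th></tr>
  <tr><td>A. Example</td><td>North</td></tr>
  <tr><td>B. Sample</td><td>South</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every th/td cell, grouped by tr row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_rows(html):
    """Return the table in the HTML string as a list of rows."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Serialize rows as CSV text, similar to a downloaded spreadsheet file."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


rows = table_to_rows(SAMPLE_TABLE)
csv_text = rows_to_csv(rows)
print(csv_text)
```

This sketch assumes a single, simple table with no nested tables or merged cells. Less regular pages need more careful handling, which is where dedicated scraping tools become useful.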