What is Web Scraping?


  • Humans are good at categorizing information; computers are much less so.
  • Often, data on a web site is not properly structured, making its extraction difficult.
  • Web scraping is the process of automating the extraction of data from web sites.
  • Some web pages offer tools that allow the data to be downloaded directly.

Anatomy of a web page


  • Every website is built on an HTML document that structures its content.
  • An HTML document is composed of elements, usually defined by an opening <tag> and a closing </tag>.
  • Elements can have attributes that define their properties, written as <tag attribute_name="attribute_value">.
  • CSS may be used to control the appearance of the rendered webpage.
  • Dynamic webpages may contain content that isn’t loaded until the relevant item is selected.
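
To make the element and attribute vocabulary above concrete, here is a minimal sketch that lists the opening tags and attributes found in a small HTML document, using only Python's standard-library html.parser module. The sample page, the ElementLister class, and the list_elements helper are hypothetical names made up for this illustration; they are not part of the lesson.

```python
from html.parser import HTMLParser

# Hypothetical sample page, invented for illustration only.
SAMPLE_HTML = """
<html>
  <body>
    <h1 class="title">Members</h1>
    <a href="/members/1" id="first">Ada</a>
  </body>
</html>
"""


class ElementLister(HTMLParser):
    """Record every opening tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.elements.append((tag, dict(attrs)))


def list_elements(html):
    """Return a list of (tag, attributes) tuples in document order."""
    parser = ElementLister()
    parser.feed(html)
    parser.close()
    return parser.elements


elements = list_elements(SAMPLE_HTML)
for tag, attributes in elements:
    print(tag, attributes)
```

Each printed line pairs a tag name with a dictionary of its attributes, which is the same structure a scraping tool relies on when it is told which elements to extract. Content that a dynamic page loads later would not appear in the HTML handed to a parser like this one.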

Manually scrape data using browser extensions: Using the Web Scraper Chrome extension


  • Data that is reasonably well structured (for example, in a table) is relatively easy to scrape.
  • More often than not, web scraping tools need to be told what to scrape.
  • jQuery selectors can be used to define more precisely what information is to be scraped.
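
As a rough illustration of why tabular data is comparatively easy to scrape, the sketch below pulls the rows out of an HTML table with Python's standard-library html.parser module and pairs each row with the header cells. It is not the Web Scraper extension's own method. The sample table, the TableScraper class, and the scrape_table helper are hypothetical and exist only for this example; telling the scraper to look at tr, th, and td elements plays the role that a selector plays in a browser tool.

```python
from html.parser import HTMLParser

# Hypothetical sample table, invented for illustration only.
SAMPLE_TABLE = """
<table id="members">
  <tr><th>Name</th><th>Constituency</th></tr>
  <tr><td>Ada Example</td><td>North</td></tr>
  <tr><td>Ben Sample</td><td>South</td></tr>
</table>
"""


class TableScraper(HTMLParser):
    """Collect the text of every th/td cell, grouped by tr row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def scrape_table(html):
    """Return the table as a list of rows, each a list of cell strings."""
    scraper = TableScraper()
    scraper.feed(html)
    scraper.close()
    return scraper.rows


rows = scrape_table(SAMPLE_TABLE)
header, *body = rows
records = [dict(zip(header, row)) for row in body]
for record in records:
    print(record)
```

Because every row has the same cells in the same order, the header row is enough to label the rest of the data. Less regular pages need more specific instructions about which elements to keep.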