What is Web Scraping?


Figure 1

Screenshot of the Parliament of Canada website (Top)
Screenshot of the Parliament of Canada website (Bottom)


Figure 2

Screenshot of UK MP list webpage
Screenshot of the UK House of Commons website

Anatomy of a web page


Figure 1

Screenshot of a simple website with the previous HTML

Figure 2

Screenshot of a simple website with the previous HTML
The Document Object Model (DOM), which represents an HTML document as a tree structure. Source: Wikipedia. Author: Birger Eriksson

Figure 3

Screenshot of Chrome developer console
Developer console in Chrome

Figure 4

Dialog with Inspect option
Dialog to select element inspection in Chrome

Figure 5

Code shown in the developer console for a selected element
Code for selected element, displayed in the developer console

Figure 6

Screenshot of element highlighted on web page
Element highlighted by hovering over code

Manually scrape data using browser extensions: Using the Web Scraper Chrome extension


Figure 1

Screenshot of UK MP list webpage
Screenshot of the UK MP list website

Figure 2

Screenshot of Web Scraper wizard dialog
Web Scraper Wizard

Figure 3

Screenshot of automatically scraped MP data
Automatically scraped MP data

Figure 4

Screenshot of pagination selection
Pagination & Scroll selection

Figure 5

Screenshot showing data from multiple pages
Data scraped from multiple pages

Figure 6

Dialog showing how to select developer tools docking position
Dialog to select Developer Tools docking position

Figure 7

Screenshot of creating pagination selector
Creating a Pagination selector

Figure 8

Screenshot of creating link selector
Creating a Link Selector

Figure 9

Screenshot showing link breadcrumbs
Breadcrumbs for scraped pages

Figure 10

Screenshot of text selector creation
Creating a Text selector to find the MP name

Figure 11

Name and email selectors
Name and email selectors