Course 649:
Web Scraping with Python

(2 days)

Course Description

The Internet is awash with data. Some of it is easy to get; some of it looks impossible to reach. It doesn’t have to be difficult. In this course, students work through procedures for pulling data from almost any website and preparing that data for analysis. The procedures are written in Python so that they are easy to extend.

Note: All the Python needed is taught in the course, and the amount is minimal. A basic knowledge of HTML is helpful.

Learning Objectives

  • Quick Python 101 to build a foundation for learning Python
  • Data scraping with BeautifulSoup
  • Python lists for data display
  • Data sources discussion
  • Python Pandas DataFrame for graphics and data exploration

Prerequisites

Knowledge of basic Python is helpful but not required; all the Python needed is taught as part of the procedures covered.

Who Should Attend

This course is aimed at those who need to access data that is available only from a web page.


Course Outline

Hands-On Introduction to Python

  • Objects and Variables
  • Lists
  • Strings
  • Basic Printing
  • for Statement
  • Reading from a File
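
As a rough illustration of the topics listed above, the following minimal sketch touches variables, lists, strings, basic printing, the for statement, and reading from a file. The file name cities.txt, its contents, and the helper read_lines are assumptions made for this example only; they are not taken from the course material.

```python
import os
import tempfile


def read_lines(path):
    """Return the non-empty lines of a text file as a list of strings."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


# Create a small sample file so the sketch is self-contained (hypothetical data).
sample_path = os.path.join(tempfile.mkdtemp(), "cities.txt")
with open(sample_path, "w", encoding="utf-8") as handle:
    handle.write("Boston\nChicago\nDenver\n")

cities = read_lines(sample_path)  # a variable holding a list of strings

# The for statement walks the list; the f-string handles basic printing.
for index, city in enumerate(cities, start=1):
    print(f"{index}. {city.upper()}")
```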

Hands-On: Displaying a Web Table

  • HTML Basics
  • How to Display an HTML Page
  • Packages and Modules
  • Help and Documentation on Packages and Modules

Extracting Data

  • if Statement
  • Using the Requests Module
  • Using the BeautifulSoup Module
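
One way the three topics above might fit together is sketched below: Requests downloads a page, BeautifulSoup parses it, and an if statement filters out rows that carry no data. The function names fetch_html and extract_rows, the sample HTML, and the figures in it are all hypothetical and chosen only for illustration. The demo parses an inline string so that it runs without network access.

```python
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page and return its HTML text (needs network access)."""
    import requests  # imported here so the offline demo below does not need it

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop early on HTTP errors such as 404
    return response.text


def extract_rows(html):
    """Return each table row as a list of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:  # skip rows that contain no cells
            rows.append(cells)
    return rows


# Hypothetical page content; the numbers are made up for the example.
SAMPLE_HTML = """
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Boston</td><td>650000</td></tr>
  <tr><td>Denver</td><td>710000</td></tr>
  <tr></tr>
</table>
"""

rows = extract_rows(SAMPLE_HTML)
print(rows)
```

For a live page, the same parsing step would presumably be fed by fetch_html with a real address in place of the inline string.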

Useful Python Format: List

  • Working with the List Data Structure
  • Creating Lists from BeautifulSoup
  • Creating Files for Import into PowerPoint

Useful Python Format: Pandas

  • Working with Pandas DataFrames
  • Creating DataFrames from BeautifulSoup
  • Creating .csv Files from DataFrames
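
The path from parsed HTML to a .csv file could look roughly like the sketch below. It assumes the first table row holds the column headers; the helper name table_to_dataframe, the sample table, and its values are hypothetical.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical table; the numbers are made up for the example.
SAMPLE_HTML = """
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Boston</td><td>650000</td></tr>
  <tr><td>Denver</td><td>710000</td></tr>
</table>
"""


def table_to_dataframe(html):
    """Build a DataFrame from the first row (headers) and remaining rows (data)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in soup.find_all("tr")
    ]
    header, body = rows[0], rows[1:]
    return pd.DataFrame(body, columns=header)


frame = table_to_dataframe(SAMPLE_HTML)

# Scraped cells arrive as text, so numeric columns need an explicit conversion.
frame["Population"] = pd.to_numeric(frame["Population"])

# With no path given, to_csv returns the CSV text;
# passing a file name such as "cities.csv" would write a file instead.
csv_text = frame.to_csv(index=False)
print(csv_text)
```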

Hands-On: Extracting Tables from Websites

  • Complete Project From
    • Looking at Website
    • Extracting Data
    • Creating .csv Files
    • Creating Various Plots
  • Additional Techniques to Extract Data

Hands-On: Plotting with Pandas

  • Basic Plotting with Pandas

Data Sources

  • Discussion of Annotated List of Data Sources

Hands-On: Complete Project Extracting Tables

  • Complete Project From
    • Looking at Website
    • Extracting Data
    • Creating .csv Files
    • Creating Various Plots

Making a Standalone Program

  • Understanding Python Scripting
  • Converting Jupyter Notebook to Python Script
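
As a hedged sketch of what a converted notebook might look like, the example below moves cell logic into functions and adds an entry point so the file can be run from the command line. The names summarize and main and the sample rows are hypothetical. A notebook can also be exported with a command such as jupyter nbconvert --to script analysis.ipynb, where analysis.ipynb is a placeholder file name; the exported code usually still benefits from this kind of restructuring.

```python
import sys


def summarize(rows):
    """Return one printable line per row (logic that once lived in notebook cells)."""
    return [", ".join(row) for row in rows]


def main(argv):
    """Entry point: argv holds any command-line arguments after the script name."""
    # Hypothetical data standing in for scraped results.
    rows = [["City", "Population"], ["Boston", "650000"]]
    for line in summarize(rows):
        print(line)
    if argv:
        # A real script might read a URL or file name here; this one only echoes.
        print("Arguments received:", " ".join(argv))
    return 0


# This guard runs main() only when the file is executed directly,
# not when it is imported as a module by another program.
if __name__ == "__main__":
    main(sys.argv[1:])
```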

Other Methods of Extracting Data

  • Chrome Extension for Data Extraction
  • Python Modules for Direct Access of Site Web Pages

Please Contact Your ROI Representative to Discuss Course Tailoring!