Course 591:
Python Tools for Data Analysis and Visualization

(4 days)


Course Description

Through a series of hands-on exercises, students will learn to turn data into actionable information. The world is drowning in data: each day, 2.5 exabytes of data are produced (the equivalent of 250 new Libraries of Congress or 90 years of HD video). The challenge is getting that data into a format that analysis tools can use to understand and verify it. Python is relatively quick to learn and has a rich set of tools for importing, transforming, and exploring data, extracting insights from it, making predictions with it, and exporting it. This course introduces the major Python tools for preparing data for analysis, for understanding the data, and for using it to generate insights and predictions. All class work and exercises are done in Python 3.x.

Learning Objectives

  • Learn how to use Jupyter notebooks
  • Learn how to work with NumPy datatypes
  • Be proficient in pandas Series
  • Be proficient in pandas DataFrames
  • Understand how to use data visualization
  • Know how to import and clean data
  • Get introduced to statistical tools for working with data sets
  • Understand the problems of working with PDF data sources
  • Get introduced to machine learning tools for working with data sets
  • Work through a complete data analysis to understand how the tools interact with each other

Who Should Attend

Anyone wanting to use Python as part of their data analysis program.

Prerequisites

Basic knowledge of Python.

Course Outline

Unit 1: Advanced Python Review

  • A Python Development Environment
  • A Review of Data Types
  • The New Class Structure
  • Python Best Practices

Unit 2: IPython Notebook

  • Functionality Provided – Why Use Them?
  • CRUD for Notebooks
  • Interface and Shortcuts

Unit 3: NumPy

  • Datatypes
  • Universal Functions
  • Indexing
  • Summary Methods
  • Sorting
  • Computations and Broadcasting

Unit 4: SciPy

  • Overview of SciPy
  • Statistical Functions

Unit 5: Pandas: Series

  • Pandas Series Structure
  • Series CRUD
  • Indexing and Access Techniques
  • Data Methods

Unit 6: Pandas: DataFrame Basics

  • DataFrame Construction
  • DataFrame Change and Reorganization
  • Indexing and Access Techniques
  • Grouping, Pivoting, and Reshaping
  • DataFrame CRUD

Unit 7: Pandas: DataFrame Data Manipulation

  • Statistics
  • Data Methods
  • Missing Data Tools

Unit 8: Understanding Data Visualization

  • Visualization Is Storytelling
  • Types of Charts
  • Color: Dos and Don'ts
  • Common Mistakes
  • Best Practices
  • Reproducibility

Unit 9: Matplotlib for Data Visualization

  • Steps for Creating a Data Visualization
  • Jupyter Notebooks and Matplotlib
  • Matplotlib Styles
  • Small Multiples
  • Pandas Series Plotting
  • Pandas DataFrame Plotting

Unit 10: Advanced Techniques

  • Seaborn
  • Bokeh

Unit 11: Data Cleaning

  • Importing Data: csv, xml, html, xls
  • Problems of PDF Data Sources
  • Data Transformations
  • Missing Data
  • Time Series Problems
  • Automation of Process

Unit 12: Statistics for Understanding Data

  • Exploratory Data Analysis Tools: PMF, CDF, Correlation, Least Squares
  • A/B Testing
  • Hypothesis Testing
  • Statistical Significance, P-Values, and Confidence Intervals
  • Z- and T-Statistics

Unit 13: Approach to Understanding Data

  • Overview of Approach
  • Great Data Sources
  • Class Demonstration on Data Set
  • Team Project: Working a Project

Unit 14: Introduction to Statistical Techniques

  • Regression and Prediction
  • Classification
  • K-Nearest Neighbors
  • Tree Models
  • Clustering

Unit 15: Introduction to Machine Learning

  • Regression and Prediction
  • Classification
  • K-Nearest Neighbors
  • Clustering
  • Neural Networks
  • Deep Learning

Please Contact Your ROI Representative to Discuss Course Tailoring!