Course 592:
Data Science with Python

(4 days)

 

Course Description

As we generate more and more data, businesses need to make use of it to gain a competitive advantage. What is needed is a consistent, easy-to-use set of tools that an analyst can use interactively to extract business value in a timely manner.

Python is quick to learn and provides libraries for manipulating and analyzing data, including pandas, NumPy, SciPy, and scikit-learn. This course introduces attendees to these tools and, through hands-on exercises based on real-world scenarios, shows how they can be applied to a range of business problems.

Learning Objectives

  • Learn to apply Python tools and data science libraries to deliver business value
  • Learn basic and advanced NumPy (Numerical Python) features
  • Get started with data analysis tools in the pandas library
  • Use high-performance tools to load, clean, transform, merge, and reshape data
  • Create scatter plots and static or interactive visualizations with matplotlib
  • Apply the pandas groupby facility to slice, dice, and summarize datasets
  • Use scikit-learn for machine learning

Who Should Attend

Anybody involved in the processing and analysis of data using Python, including business analysts, data engineers, data scientists and software engineers.

Prerequisites

A basic knowledge of Python is required.


Course Outline

Setting Up the Analysis Environment and Toolset Overview

  • IPython
  • Introduction to:
    • NumPy
    • SciPy
    • Pandas
    • Matplotlib
    • Scikit-learn

Accessing and Preparing Data

  • Loading from CSV Files
  • Accessing SQL Databases
  • Cleansing Data with Python
  • Stripping Out Extraneous Information
  • Normalizing Data
  • Formatting Data
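
To give a flavor of this module, here is a minimal sketch (illustrative only, not taken from the course materials) that loads a small CSV extract, strips extraneous whitespace and currency formatting, min-max normalizes a numeric column, and then queries the cleaned data with SQL. The data, the column names (`region`, `product`, `revenue`), and the helpers `load_and_clean` and `totals_via_sql` are hypothetical; an in-memory SQLite database stands in for whatever SQL database you actually use.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical raw extract: padded text fields and revenue stored as "$1,200"-style strings.
RAW_CSV = """region,product,revenue
 North , Widget ,"$1,200"
South,Gadget,$800
 East,Widget ,"$2,000"
"""


def load_and_clean(text: str) -> pd.DataFrame:
    """Load CSV text, strip extraneous characters, and add a normalized column."""
    df = pd.read_csv(io.StringIO(text))
    # Stripping out extraneous information: leading/trailing whitespace in text columns.
    for col in ("region", "product"):
        df[col] = df[col].str.strip()
    # Formatting: drop "$" and thousands separators, then convert to float.
    df["revenue"] = df["revenue"].str.replace(r"[$,]", "", regex=True).astype(float)
    # Normalizing: min-max scale revenue into the range [0, 1].
    lo, hi = df["revenue"].min(), df["revenue"].max()
    df["revenue_norm"] = (df["revenue"] - lo) / (hi - lo)
    return df


def totals_via_sql(df: pd.DataFrame) -> pd.DataFrame:
    """Write the frame to an in-memory SQLite database and aggregate it with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        df.to_sql("sales", conn, index=False)
        return pd.read_sql_query(
            "SELECT product, SUM(revenue) AS total FROM sales "
            "GROUP BY product ORDER BY product",
            conn,
        )
    finally:
        conn.close()


clean = load_and_clean(RAW_CSV)
print(clean)
print(totals_via_sql(clean))
```

In practice `io.StringIO(text)` would be replaced by a file path, and the SQLite connection by a connection to your own database.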

NumPy Essentials: Arrays and Vectorized Computation

  • Universal Functions: Fast Element-Wise Array Functions
  • Data Processing Using Arrays
  • File Input and Output with Arrays
  • Linear Algebra
  • Random Number Generation
  • Example: Random Walks
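
The random-walk example named above can be sketched with vectorized NumPy code. This is a minimal illustration under our own assumptions (+1/-1 steps with equal probability, and the hypothetical helpers `random_walks` and `first_crossing`), not the exercise as taught: it simulates thousands of walks in a single array operation and finds when each walk first drifts a given distance from zero.

```python
import numpy as np


def random_walks(n_walks: int, n_steps: int, seed: int = 0) -> np.ndarray:
    """Simulate many +1/-1 random walks at once, with no Python-level loop."""
    rng = np.random.default_rng(seed)
    # Random number generation: a matrix of 0/1 coin flips, one row per walk.
    flips = rng.integers(0, 2, size=(n_walks, n_steps))
    # Vectorized conditional logic: map 1 -> +1 and 0 -> -1.
    steps = np.where(flips > 0, 1, -1)
    # Cumulative sum along each row turns steps into positions.
    return steps.cumsum(axis=1)


def first_crossing(walks: np.ndarray, threshold: int) -> np.ndarray:
    """Step index where each walk first reaches |position| >= threshold (-1 if never)."""
    hits = np.abs(walks) >= threshold  # np.abs is a universal function (ufunc)
    first = hits.argmax(axis=1)  # argmax returns the first True in each row
    # Rows with no True also give argmax 0, so mark them explicitly.
    return np.where(hits.any(axis=1), first, -1)


walks = random_walks(n_walks=5000, n_steps=1000)
crossed = first_crossing(walks, threshold=30)
print("max position:", walks.max(), "min position:", walks.min())
print("mean first-crossing step:", crossed[crossed >= 0].mean())
```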

Getting Started with pandas

  • Introduction to pandas Data Structures
  • Essential Functionality
  • Summarizing and Computing Descriptive Statistics
  • Handling Missing Data
  • Hierarchical Indexing
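
A minimal sketch of the pandas basics listed above, using a hypothetical four-row sales table: building a DataFrame, filling a missing value, computing descriptive statistics, and working with a two-level (hierarchical) index. Filling gaps with the column mean is only one simple choice; the right strategy depends on the data.

```python
import numpy as np
import pandas as pd

# Hypothetical quarterly revenue with one missing value.
sales = pd.DataFrame(
    {
        "region": ["North", "North", "South", "South"],
        "quarter": ["Q1", "Q2", "Q1", "Q2"],
        "revenue": [100.0, np.nan, 80.0, 120.0],
    }
)

# Handling missing data: count the gaps, then fill them with the column mean.
missing = sales["revenue"].isna().sum()
filled = sales.assign(revenue=sales["revenue"].fillna(sales["revenue"].mean()))

# Summarizing: count, mean, std, min, quartiles and max in one call.
stats = filled["revenue"].describe()

# Hierarchical indexing: a (region, quarter) MultiIndex on a Series.
indexed = filled.set_index(["region", "quarter"])["revenue"]
north_q2 = indexed.loc[("North", "Q2")]
wide = indexed.unstack("quarter")  # one row per region, one column per quarter

print(stats)
print(wide)
```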

Data Wrangling: Clean, Transform, Merge

  • Combining and Merging Data Sets
  • Reshaping and Pivoting
  • Data Transformation
  • String Manipulation
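
As an illustration of these wrangling steps, the sketch below (hypothetical tables and column names) normalizes string keys, merges orders with customer records, and reshapes the result between long and wide formats. A left join is used so that orders without a matching customer are kept rather than silently dropped.

```python
import pandas as pd

# Hypothetical order lines and a customer table whose keys use different casing.
orders = pd.DataFrame(
    {
        "order_id": [1, 2, 3, 4],
        "customer_id": ["c1", "c2", "c1", "c3"],
        "month": ["2024-01", "2024-01", "2024-02", "2024-02"],
        "amount": [50.0, 20.0, 30.0, 40.0],
    }
)
customers = pd.DataFrame({"customer_id": ["C1", "C2"], "name": ["acme corp", "globex inc"]})

# String manipulation: lower-case the join key and title-case the display name.
customers["customer_id"] = customers["customer_id"].str.lower()
customers["name"] = customers["name"].str.title()

# Merging: a left join keeps order 4 even though customer c3 has no record.
merged = orders.merge(customers, on="customer_id", how="left")
# Data transformation: label the unmatched customer explicitly.
merged["name"] = merged["name"].fillna("Unknown")

# Pivoting: one row per customer, one column per month.
wide = merged.pivot_table(
    index="name", columns="month", values="amount", aggfunc="sum", fill_value=0
)
# Reshaping back to long format with melt.
long = wide.reset_index().melt(id_vars="name", var_name="month", value_name="amount")

print(wide)
```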

Plotting and Visualization

  • Introducing matplotlib
  • Plotting Functions in pandas
  • Plotting Maps
  • Python Visualization Tool Ecosystem

Data Aggregation, Group Operations and Time Series

  • GroupBy Mechanics
  • Data Aggregation
  • Group-wise Operations and Transformations
  • Pivot Tables and Cross-Tabulation
  • Time Series Basics
  • Date Ranges, Frequencies, and Shifting
  • Time Zone Handling
  • Periods and Period Arithmetic
  • Resampling and Frequency Conversion
  • Time Series Plotting
  • Moving Window Functions
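
Several of the group and time-series operations listed above fit in one short sketch. The two-store sales data below is hypothetical; the calls themselves (`groupby`, `transform`, `pivot_table`, `resample`, `shift`, `rolling`) are standard pandas methods.

```python
import pandas as pd

# Hypothetical daily sales for two stores over six days (long format).
dates = pd.date_range("2024-01-01", periods=6, freq="D")
df = pd.DataFrame(
    {
        "date": dates.repeat(2),
        "store": ["A", "B"] * 6,
        "sales": [10, 20, 11, 19, 12, 18, 13, 17, 14, 16, 15, 15],
    }
)

# GroupBy mechanics and aggregation: several statistics per store.
summary = df.groupby("store")["sales"].agg(["sum", "mean", "max"])

# Group-wise transformation: each row's deviation from its own store's mean.
df["vs_store_mean"] = df["sales"] - df.groupby("store")["sales"].transform("mean")

# Pivot table: one row per date (a DatetimeIndex), one column per store.
by_day = df.pivot_table(index="date", columns="store", values="sales", aggfunc="sum")

# Resampling and frequency conversion: daily totals rolled up into 3-day buckets.
three_day = by_day.resample("3D").sum()

# Shifting: day-over-day change for store A.
day_over_day = by_day["A"] - by_day["A"].shift(1)

# Moving window function: 3-day rolling mean for store A.
rolling_a = by_day["A"].rolling(window=3).mean()

print(summary)
print(three_day)
```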

Data Analysis with SciPy

  • Optimization
  • Interpolation
  • Integration
  • Statistics
  • Spatial and Clustering Analysis
  • Signal and Image Processing
  • Sparse Matrices
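
A minimal sketch touching several of the SciPy areas above: optimization, integration, interpolation, a statistical test, and sparse matrices. The functions and data are chosen so the answers are easy to check by hand (for example, the integral of sin(x) from 0 to π is 2); they are illustrations, not course exercises.

```python
import numpy as np
from scipy import integrate, interpolate, optimize, sparse, stats


# Optimization: a quadratic "bowl" whose minimum is known to be at (1, -2).
def bowl(p: np.ndarray) -> float:
    x, y = p
    return (x - 1) ** 2 + (y + 2) ** 2


opt = optimize.minimize(bowl, x0=[0.0, 0.0])

# Integration: the integral of sin(x) from 0 to pi is exactly 2.
area, abs_err = integrate.quad(np.sin, 0, np.pi)

# Interpolation: a cubic spline through samples of x**2, evaluated between samples.
xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
spline = interpolate.CubicSpline(xs, xs**2)
estimate = float(spline(2.5))

# Statistics: a two-sample t-test on hypothetical, randomly generated groups.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=200)
group_b = rng.normal(loc=0.5, scale=1.0, size=200)
ttest = stats.ttest_ind(group_a, group_b)

# Sparse matrices: store only the non-zero entries, yet still multiply as usual.
dense = np.array([[0.0, 0.0, 3.0], [4.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
sp = sparse.csr_matrix(dense)
product = sp @ np.ones(3)

print("argmin:", opt.x, "integral:", area, "spline(2.5):", estimate)
print("t =", ttest.statistic, "p =", ttest.pvalue, "non-zeros:", sp.nnz)
```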

Machine Learning with Scikit-learn

  • Introduction to Machine Learning
  • Supervised Learning
  • Support Vector Machines
  • Naïve Bayes Classifiers
  • Unsupervised Learning
  • Principal Component Analysis
  • Clustering Algorithms
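
To give a sense of the scikit-learn workflow, here is a minimal sketch on the iris dataset bundled with the library: it trains and scores a support vector machine and a Gaussian naive Bayes classifier, then applies PCA and k-means clustering without using the labels. The dataset, the 70/30 split, and the hyperparameters are assumptions chosen for illustration only.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# The small iris dataset ships with scikit-learn, so nothing is downloaded.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Supervised learning: compare a support vector machine with a naive Bayes classifier.
models = {
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),  # SVMs are scale-sensitive
    "naive_bayes": GaussianNB(),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # accuracy on held-out data

# Unsupervised learning: ignore the labels, project onto 2 principal components,
# then look for 3 clusters with k-means.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_2d)

print("test accuracy:", scores)
print("variance explained by 2 components:", pca.explained_variance_ratio_.sum())
print("cluster sizes:", [int((kmeans.labels_ == k).sum()) for k in range(3)])
```

Scaling is placed inside the SVM pipeline so that the scaler is fitted on the training split only.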

Please Contact Your ROI Representative to Discuss Course Tailoring!