Course 592:
Data Science with Python
(4 days)
Course Description
As organizations generate more and more data, they need to use it for competitive advantage. This requires a consistent, easy-to-use set of tools that an analyst can use interactively to extract business value in a timely manner.
Python is quick to learn and supplies tools for manipulating and analyzing data: pandas, NumPy, SciPy, and scikit-learn. This course introduces attendees to these tools and, through hands-on exercises based on real-world scenarios, shows how they can be applied to a range of business problems.
Learning Objectives
- Learn to apply Python tools and Data Science libraries to provide business value
- Learn basic and advanced NumPy (Numerical Python) features
- Get started with data analysis tools in the pandas library
- Use high-performance tools to load, clean, transform, merge, and reshape data
- Create scatter plots and static or interactive visualizations with matplotlib
- Apply the pandas groupby facility to slice, dice, and summarize datasets
- Use scikit-learn for machine learning
Who Should Attend
Anybody involved in the processing and analysis of data using Python, including business analysts, data engineers, data scientists and software engineers.
Prerequisites
A basic knowledge of Python is required.
Course Outline
Setting Up the Analysis Environment and Toolset Overview
- IPython
- Introduction to:
  - NumPy
  - SciPy
  - pandas
  - matplotlib
  - scikit-learn
Accessing and Preparing Data
- Loading from CSV Files
- Accessing SQL databases
- Cleansing Data with Python
  - Stripping Out Extraneous Information
  - Normalizing Data
  - Formatting Data
NumPy Essentials: Arrays and Vectorized Computation
- Universal Functions: Fast Element-Wise Array Functions
- Data Processing Using Arrays
- File Input and Output with Arrays
- Linear Algebra
- Random Number Generation
- Example: Random Walks
Getting Started with pandas
- Introduction to pandas Data Structures
- Essential Functionality
- Summarizing and Computing Descriptive Statistics
- Handling Missing Data
- Hierarchical Indexing
Data Wrangling: Clean, Transform, Merge
- Combining and Merging Data Sets
- Reshaping and Pivoting
- Data Transformation
- String Manipulation
Plotting and Visualization
- Introducing matplotlib
- Plotting Functions in pandas
- Plotting Maps
- Python Visualization Tool Ecosystem
Data Aggregation, Group Operations and Time Series
- GroupBy Mechanics
- Data Aggregation
- Group-wise Operations and Transformations
- Pivot Tables and Cross-Tabulation
- Time Series Basics
- Date Ranges, Frequencies, and Shifting
- Time Zone Handling
- Periods and Period Arithmetic
- Resampling and Frequency Conversion
- Time Series Plotting
- Moving Window Functions
Data Analysis with SciPy
- Optimization
- Interpolation
- Integration
- Statistics
- Spatial and Clustering Analysis
- Signal and Image Processing
- Sparse Matrices
Machine Learning with Scikit-learn
- Introduction to Machine Learning
- Supervised Learning
  - Support Vector Machines
  - Naïve Bayes Classifiers
- Unsupervised Learning
  - Principal Component Analysis
  - Clustering Algorithms
Please Contact Your ROI Representative to Discuss Course Tailoring!