Course 662:
Data Science Using R

(4 days)

 

Course Description

This course is designed for those who have a background in programming. The R programming language is taught as a by-product of learning the basics of data science. Part I of the course focuses on the parts of R that allow you to explore data. Part II focuses on data wrangling: cleaning, transforming, and filling in data. Part III focuses on the R data structures and iteration techniques necessary for building models. Part IV introduces data science models and how to build them.

Learning Objectives

After successfully completing this course, students will know how to:

  • Load, transform, and process data
  • Visualize numeric and categorical data
  • Perform dimensionality reduction
  • Build regression models
  • Perform clustering and classification on structured data
  • Analyze unstructured data
  • Build association rules for transactional data
  • Write custom functions in R
  • Build a custom R program

Who Should Attend

Anyone needing to understand how to use R in data science.

Prerequisites

To be successful in this course, you need basic programming skills. Basic knowledge of the Linux command line is also useful but not necessary.


Course Outline

Unit 1: Introduction to R and Data Loading

  • Quick Installation of R and RStudio
  • R Packages and Functions
  • Configuring the R Environment in RStudio
  • Help and Examples in R
  • Hands-On Exercise 1.1
  • Loading CSV Data in R
  • Loading Various Data Formats in R
  • Loading Relational Database Data in R
  • Writing Data from R
  • Hands-On Exercise 1.2

Unit 2: R Data Types

  • Simple Data Types
  • Basic Operations on Simple Data Types
  • Hands-On Exercise 2.1
  • Vectors
  • Lists
  • Data Frames
  • Matrices and Arrays
  • Operations on Vectors, Lists, Matrices, and Arrays
  • Hands-On Exercise 2.2

Unit 3: Data Visualization

  • Numeric Data Plots
    • Histogram, Pie Chart
    • Box Plot, Cumulative
    • Line, Bar
  • Categorical Data Plots
    • Bar, Mosaic
  • Correlation
    • Pair Graph
    • Correlation Graph
  • Hands-On Exercise 3.1
  • Stack Plots
  • Complex Graph Plots
  • Hands-On Exercise 3.2

Unit 4: Data Transformation

  • Data Rescaling
  • Data Imputation
  • Data Decoding
  • Missing Data Handling
  • Outlier Handling
  • Principal Component Analysis
  • Hands-On Exercise 4.1

Unit 5: Data Processing in R

  • Creating Data Subset Using Different Strategies
  • Simple Data Analysis with Core Functions
  • Data Analysis with Aggregate Function
  • Hands-On Exercise 5.1
  • Comprehensive Data Analysis with dplyr Package
  • Using SQL Background to Work With
  • Hands-On Exercise 5.2

Unit 6: Simple Linear Regression

  • Basic Linear Regression
  • Building a Model in R
  • Testing the Accuracy of the Model
  • Prediction with the Regression Model
  • Hands-On Exercise 6.1

Unit 7: Association Rule Analysis with R

  • Introduction to Association Rules
  • Building Rules in R
  • Determining Quality of Rules
  • Performing Sensitivity Analysis on the Rules
  • Hands-On Exercise 7.1

Unit 8: Clustering and Classification of Structured Data with R

  • Introduction to Clustering
  • K-means Clustering
  • Determining Accuracy of Clustering
  • Hands-On Exercise 8.1
  • Introduction to Classification
  • Building a Classification Model with R
  • Determining Accuracy of Classification
  • Hands-On Exercise 8.2

Unit 9: Text Processing in R

  • Key Steps in Text Processing
  • Simple Processing of Tweets Using R
  • Word Frequency Analysis of Tweets
  • Hands-On Exercise 9.1
  • Clustering of Tweets
  • Hands-On Exercise 9.2

Unit 10: Building Custom Code in R

  • Introduction to Custom Functions in R
  • Functions with return Statement
  • Functions without return Statement
  • Control Statements
  • Hands-On Exercise 10.1

Unit 11: Building Packages in R

  • Steps for Building Packages
  • A Simple Example
  • Hands-On Exercise 11.1

Please Contact Your ROI Representative to Discuss Course Tailoring!