Course 662:
Data Science Using R

(4 days)


Course Description

This course is designed for those who have a background in programming. The programming language R is taught as a by-product of learning the basics of data science. Part I of the course focuses on the parts of R that allow you to explore the data. Part II focuses on data wrangling: cleaning, transforming, and filling in the data. Part III focuses on the R data structures and iteration techniques necessary for building models. Part IV introduces data science models and model building.

Learning Objectives

After successfully completing this course, students will know how to:

  • Load, transform, and process data
  • Visualize numeric and categorical data
  • Perform dimensionality reduction
  • Build regression models
  • Perform clustering and classification on structured data
  • Analyze unstructured data
  • Build association rules for transactional data
  • Write custom functions in R
  • Build a custom R program

Who Should Attend

Anyone needing to understand how to use R in data science.


To be successful in this course, you need basic programming skills. Basic knowledge of the Linux command line is also useful but not necessary.

Course Outline

Unit 1: Introduction to R and Data Loading

  • Quick Installation of R and RStudio
  • R Packages and Functions
  • Configuring the R Environment in RStudio
  • Help and Examples in R
  • Hands-On Exercise 1.1
  • Loading CSV Data in R
  • Loading Various Data Formats in R
  • Loading Relational Database Data in R
  • Writing Data from R
  • Hands-On Exercise 1.2

Unit 2: R Data Types

  • Simple Data Types
  • Basic Operations on Simple Data Types
  • Hands-On Exercise 2.1
  • Vectors
  • Lists
  • Data Frames
  • Matrices and Arrays
  • Operations on Vectors, Lists, Matrices, and Arrays
  • Hands-On Exercise 2.2

Unit 3: Data Visualization

  • Numeric Data Plots
    • Histogram, Pie Chart
    • Box Plot, Cumulative
    • Line, Bar
  • Categorical Data Plots
    • Bar, Mosaic
  • Correlation
    • Pair Graph
    • Correlation Graph
  • Hands-On Exercise 3.1
  • Stack Plots
  • Complex Graph Plots
  • Hands-On Exercise 3.2

Unit 4: Data Transformation

  • Data Rescaling
  • Data Imputation
  • Data Decoding
  • Missing Data Handling
  • Outlier Handling
  • Principal Component Analysis
  • Hands-On Exercise 4.1

Unit 5: Data Processing in R

  • Creating Data Subset Using Different Strategies
  • Simple Data Analysis with Core Functions
  • Data Analysis with Aggregate Function
  • Hands-On Exercise 5.1
  • Comprehensive Data Analysis with dplyr Package
  • Using SQL Background to Work With
  • Hands-On Exercise 5.2

Unit 6: Simple Linear Regression

  • Basic Linear Regression
  • Building a Model in R
  • Testing the Accuracy of the Model
  • Prediction with the Regression Model
  • Hands-On Exercise 6.1

Unit 7: Association Rule Analysis with R

  • Introduction to Association Rules
  • Building Rules in R
  • Determining Quality of Rules
  • Performing Sensitivity Analysis on the Rules
  • Hands-On Exercise 7.1

Unit 8: Clustering and Classification with R Structured Data

  • Introduction to Clustering
  • K-means Clustering
  • Determining Accuracy of Clustering
  • Hands-On Exercise 8.1
  • Introduction to Classification
  • Building a Classification Model with R
  • Determining Accuracy of Classification
  • Hands-On Exercise 8.2

Unit 9: Text Processing in R

  • Key Steps in Text Processing
  • Simple Processing of Tweets Using R
  • Word Frequency Analysis of Tweets
  • Hands-On Exercise 9.1
  • Clustering of Tweets
  • Hands-On Exercise 9.2

Unit 10: Building Custom Code in R

  • Introduction to Custom Functions in R
  • Functions with return Statement
  • Functions without return Statement
  • Control Statements
  • Hands-On Exercise 10.1

Unit 11: Building Packages in R

  • Steps for Building Packages
  • A Simple Example
  • Hands-On Exercise 11.1

Please Contact Your ROI Representative to Discuss Course Tailoring!