Course 804:
Machine Learning with Pig

(2 days)


Course Description

This course introduces how Pig and Pig Latin, from the Apache Hadoop family of tools, can be used to solve machine learning (ML) problems. It begins with a brief survey of ML use cases, highlighting where Apache Pig can be applied. With the use cases understood, the course dives into the details of Apache Hadoop MapReduce, Pig, and Pig Latin to show how these tools can be used to solve ML problems. At each step, hands-on exercises reinforce the concepts explained in the lecture.

Learning Objectives

After successfully completing this course, students will be able to:

  • Describe what machine learning is and what problems it can solve
  • Use Pig to create MapReduce applications
  • Solve ML problems using Pig


No prior experience with Apache Hadoop or Apache Pig is required. All programming will be done in the Pig Latin language, which is taught in the course. Some concepts will be illustrated with Python and Java.

Who Should Attend

This course is intended for anyone wanting to understand how Apache Pig can be used to solve ML problems. This is a hands-on course. The exercises are intended to give the participants first-hand experience with developing applications that run on Apache Hadoop.

Hands-On Labs

The hands-on labs are a key learning element of this course. Each lab reinforces the material presented in the lecture.

Course Outline

1.   Overview of Machine Learning

  • Machine Learning Defined
  • Use Cases for Machine Learning
  • Types of Machine Learning
  • Apache Hadoop MapReduce Programming Tools and Machine Learning

2.   Introduction to Apache Hadoop

  • The Design of HDFS
  • How YARN Works as a Cluster Manager
  • The MapReduce Paradigm
  • Using the Streaming API

3.   Working with Apache Pig

  • Why Pig?
  • Role of Pig
  • Setting Up Pig
  • Loading and Working with Data
  • Writing a Simple Pig Script
  • Executing Pig in Local and Cluster Mode

4.   Working with Pig Latin

  • Pig Latin and MapReduce
  • Relations, Tuples, and Fields
  • Simple Pig Latin Data Types
  • Casting
  • Tuples, Bags, and Maps
  • Projections
  • Group By and Flatten
  • Joins and Unions
  • Filtering and Ordering Results
  • Storing, Loading, Dumping
  • Built-In Functions

Please Contact Your ROI Representative to Discuss Course Tailoring!