Course 401: Big Data Essentials for Managers & Non-Programmers

(2 days)

Course Description

In recent years, enterprises have seen unprecedented increases in the amount and variety of data being captured as a byproduct of daily operations. Emerging technologies now enable managers to exploit this data, to gain previously unattainable insights into their operations and their customers, and to help identify new opportunities that will propel their enterprises to the next level. This course provides insight into these opportunities and technologies, to help provide focus for big data management and analysis.

The course includes hands-on exposure to some of the key tools, along with workshops that investigate how Big Data technology can be applied to different business problems and data types.

Learning Objectives

Understand the definition and significance of big data and big data analytics in the enterprise
Discuss the architecture, operation, and business benefits of a Big Data solution
Examine the potential business opportunities that big data capabilities can uncover
Explore the relationship between cloud computing and big data
Identify how Big Data differs from traditional data solutions
Gain insight into emerging management and analytical tools for big data, and compare them with traditional data manipulation tools

Who Should Attend

The audience includes executives, project managers at all levels of experience, and software practitioners who want to understand the concepts and technologies, along with the related opportunities and constraints, involved in exploiting big data in the enterprise.

Prerequisites

No specific prerequisites are required. Basic UNIX/Linux command line skills are helpful, but hands-on labs and workshops are fully guided. Programming experience is not required.

Hands-On Exercises and Workshops

Workshop: Finding Big Data Sources, Opportunities, and Challenges in Your Enterprise
Hands-On Exercise: Using a Key-Value Datastore
Hands-On Exercise: Using a Document Datastore
Hands-On Exercise: Examining an Application That Uses a Graph Datastore
Hands-On Exercise: Column-Oriented Data Storage
Hands-On Exercise: Using a Distributed File System
Workshop: Looking at Your Data Differently
Hands-On: Running a MapReduce Job
Hands-On: Extracting Information from Semi-Structured Data
Hands-On: Extracting Di-grams from Text Input
Hands-On: Data Aggregation and Tabulation
Workshop: Processing Data in New Ways
Hands-On: Preparing your Data Set for Analysis
Hands-On: Extracting Sentiment from a Data Set with Pig
Hands-On: Querying Your Data Set for Intensity
Hands-On: Using HiveQL
Hands-On: Performing Ad-hoc Queries against a Data Warehouse with Hive
Workshop: Analyzing/Mining Your Data
Hands-On: Understand and Use a Recommender Engine
Hands-On: Seeing What R Can Do
Hands-On: Use R to Process and Visualize Data
Workshop: Data Analytics for the Enterprise
Hands-On: Examine Data Sources and Identify Useful Data
Workshop: Solving a Problem with Big Data Tools

Course Outline

Chapter 1: Introduction to Big Data

What Is Big Data?
- The Technical Challenges Posed by Big Data
- Common Big Data Use Cases
- Structured, Semi-Structured, and Unstructured Data
- Transforming Data to Information to Business Value
- Big Data Stack
Working with Big Data
- Locating/Extraction
- Processing
- Storage
- Analysis/Interpretation
- Representation
Tools of Big Data
- Distributed File Systems
- NoSQL Storage
- Compute Engines
- Distributed Infrastructures
- Analysis Engines
- Graphical Representation
Cloud Computing
- Definition
- Relationship to Big Data
- Characteristics
Business Opportunities in Big Data
- Investment, Costs, and Benefits
- Skill Sets Needed (Analytics, Programming, Systems Management, Presentation)
- Data Characteristics (Sources, What to Keep, How Long)
Use Case Workshop: Finding Big Data Sources, Opportunities, and Challenges in Your Enterprise

Chapter 2: Storing Data

Traditional Data Storage
- Defined
- Files and Folders
- File Structure Internals (ASCII/Binary)
- Flat Files (fixed width, csv/tab delimited, xml)
- Structured Data Storage
- Row-Oriented Storage
- Relational Databases
- Reaching Limitations
The New Storage – Not Only SQL (NoSQL)
Why Do We Need NoSQL Datastores?
Key-Value Datastores
Hands-On: Using a Key-Value Datastore
Document Datastores
Hands-On: Using a Document Datastore
Graph Datastores
Hands-On: Examining an Application That Uses a Graph Datastore
Column-Oriented Data Storage
Why Do I Need Column-Oriented Storage?
Capabilities and Limitations
Querying Data from a Column-Oriented Datastore
Partitioning and Sharding
Hands-On: Column-Oriented Data Storage
Distributed File Systems
How Much Data Can Be Stored?
- Traditional File Server
- Enterprise Storage (NAS)
- Cloud Storage
- Distributed File Systems
Scaling Structured Data Storage
Hands-On: Using a Distributed File System
Real-World Examples That Use NoSQL Data Stores and Distributed File Systems
Use Case Workshop: Looking at Your Data Differently

Chapter 3: Compute Engines

Compute Engines
- Common Enterprise Architectures
  - Distributed Systems and Service-Oriented Architectures
  - Client-Server, Middleware, and Clustering
- Scaling Horizontally vs. Scaling Vertically
  - Bigger Single-Systems
  - Parallelization/Grid-Computing
- What Is a Compute Engine?
  - Distributed Processing
  - Dynamic Resource Allocation
  - Building vs. Buying
- MapReduce
  - Patterns
  - Algorithms
  - Use Cases
- Hands-On: Running a MapReduce Job
Hadoop Introduction
Introduction to Hadoop and YARN
Hadoop Distributed File System
Processing Data with Hadoop
Mapping and Reducing Data
Partitioning and Mapping
Streaming Data
Common Hadoop Tasks and Tools
- Sequential Processing
- Extracting and Transforming Large Data Sets
- Basic UNIX/Linux Commands (grep, wc, awk, uniq, sort)
- Search, Count, Tabulate
- Hands-On: Extracting Information from Semi-Structured Data
- Di-gram / Word-Pair Extraction
- Hands-On: Extracting Di-grams from Text Input
- Data Aggregation
- Hands-On: Data Aggregation and Tabulation
- Beyond Text: Image/Sound Processing
Common Hadoop Tools
- Sequential Processing
- Leverage Scripting: Perl/Python
- How Programmers (Java) Further Extend Hadoop
Real-World Examples That Use Hadoop and Hadoop Distributed File System
Use Case Workshop: Processing Data in New Ways

Chapter 4: Analyzing Large Data Sets

Process of Analyzing Big Data Sets
- MapReduce
- Analysis
- Creating MapReduce Jobs with Higher Level Languages
- Hands-On: Preparing Your Data Set for Analysis
Pig
- What Is Pig and How Does It Relate to Hadoop?
- Simplify the Data Flow
Pre-processing the Data
Using Grunt (the interactive shell for Pig)
Loading and ForEach/Generating
Hands-On: Extracting Sentiment from a Data Set with Pig
Understanding a Pig Script
- Loading
- ForEach / Generate
- Dump and Limit
- Filtering, Grouping, and Ordering
Storing Results
Hands-On: Querying Your Data Set for Intensity
Hive
- What Is Hive and How Does It Relate to Hadoop?
- Creating the Data Warehouse
- Projecting Structure on Stored Data
- Hive Query Language (HiveQL) Basics
- Hands-On: Using HiveQL
- Batch Processing vs. Real-Time Queries
- Performing Ad-hoc Queries
  - Functions
  - Working Around Limitations
- Data Warehouse Concepts
  - Partitioning
  - Sampling / Buckets
- Hands-On: Performing Ad-hoc Queries against a Data Warehouse
Real-World Examples That Use Pig and Hive
Use Case Workshop: Analyzing/Mining Your Data

Chapter 5: Data Analytics and Visualization

Leveraging Big Data within Applications
- Clustering / Grouping Data That Shares Similarities
- Classification / Categorizing Data
- Recommender Engines
Mahout
- What Is Mahout?
- Integrating within an Application
- Hands-On: Understand and Use a Recommender Engine
R – Statistical Programming and Visualization Language
- What Is R and How Does It Apply to Big Data?
- Performing Analysis with R
- How Do R and Hadoop Integrate (RHadoop)?
- Visualizing Data with R
- Hands-On: Seeing What R Can Do
- Integrating within an Application
- Hands-On: Use R to Process and Visualize Data
Real-World Examples of Data Analytics
Use Case Workshop: Data Analytics for the Enterprise

Chapter 6: Bringing It All Together

Review of Big Data Layers
Architecting a Big Data Solution (Data Lake)
- Architecture Data Sources
- Transformation and Storage of Data
- Processing and Analysis of Data
- Consumption of Data
Finding Sources of Data
- Traditional Sources
- Document Data and Meta-Data
- Enterprise Systems
- Logs and Events
- Temporal Data
- Hands-On: Examine Data Sources and Identify Useful Data
Understanding and Expanding the Big Data Ecosystem Tools (high-level)
- Batch Processing: Pig, Hive
- Interactive Querying: Impala, Tez, and HAWQ
- Stream Processing: Storm, Spark
- Search: Solr, Elasticsearch
- Machine Learning: Mahout, R
- Infrastructure: Oozie, Flume, Sqoop, ZooKeeper, Sentry, HCatalog
- Building Your Own Solutions
Commercial Cloud Solutions Overview
- Google Big Data Solutions
- Amazon Big Data Solutions
Final Use Case Workshop: Solving a Problem with Big Data Technologies

Please Contact Your ROI Representative to Discuss Course Tailoring!