Analyzing and Visualizing Data in Looker
Contact us to book this course
Learning Track
Looker and Looker Studio
Delivery methods
On-Site, Virtual
Duration
1 day
In this course, you learn how to perform the kind of data exploration and analysis in Looker that was formerly done only by SQL-savvy developers or analysts. Upon completion of this course, you will be able to leverage Looker’s modern analytics platform to find and explore relevant content in your organization’s Looker instance, ask questions of your data, create new metrics as needed, and build and share visualizations and dashboards to facilitate data-driven decision making.
Course objectives
- Define Looker and the capabilities it provides for working with data
- Explain the four core analytical concepts in Looker (dimensions, measures, filters, pivots)
- Use dimensions, measures, filters, and pivots to analyze and visualize data
- Create advanced metrics instantaneously with table calculations (see the brief example after this list)
- Create dashboards to combine and share visualizations
- Utilize folders and boards in Looker to organize content for navigability and discoverability
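As a brief illustration (a sketch only; the order_items field names are hypothetical), a table calculation that computes the percent change in revenue from the previous row of an Explore might look like:

    (${order_items.total_revenue} - offset(${order_items.total_revenue}, -1)) / offset(${order_items.total_revenue}, -1)

Table calculations like this are written in Looker's expression language directly within an Explore, so new metrics can be created without changing LookML or writing SQL.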
Audience
- Business users who need to draw insights from data
- Data analysts who are responsible for data analysis and visualization within their organizations
Course outline
- What Looker is and the capabilities it provides for working with data
- Finding and exploring content in your organization's Looker instance
- Core analytics concepts: dimensions, measures, filters, and pivots
- Analyzing and visualizing data with Explores
- Creating advanced metrics with table calculations
- Building and sharing dashboards
- Organizing content with folders and boards for navigability and discoverability