Analyzing and Visualizing Data in Looker
Contact us to book this course
Learning Track
Looker and Looker Studio
Delivery methods
On-Site, Virtual
Duration
1 day
In this course, you learn how to perform the kind of data exploration and analysis in Looker that was formerly done only by SQL-savvy developers or analysts. Upon completion of this course, you will be able to leverage Looker’s modern analytics platform to find and explore relevant content in your organization’s Looker instance, ask questions of your data, create new metrics as needed, and build and share visualizations and dashboards to facilitate data-driven decision making.
Course objectives
- Define Looker and the capabilities it provides for working with data
- Explain the four core analytical concepts in Looker (dimensions, measures, filters, pivots)
- Use dimensions, measures, filters, and pivots to analyze and visualize data
- Create advanced metrics instantaneously with table calculations (see the brief example after this list)
- Create dashboards to combine and share visualizations
- Utilize folders and boards in Looker to organize content for navigability and discoverability
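As a brief illustration (a sketch only; the order_items field names are hypothetical), a table calculation that computes the percent change in revenue from the previous row of an Explore might look like:

    (${order_items.total_revenue} - offset(${order_items.total_revenue}, -1)) / offset(${order_items.total_revenue}, -1)

Table calculations like this are written in Looker's expression language directly within an Explore, so new metrics can be created without changing LookML or writing SQL.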
Audience
- Business users who need to draw insights from data
- Data analysts who are responsible for data analysis and visualization within their organizations
Course outline
- What Looker is and the capabilities it provides for working with data
- Finding and exploring content in your organization's Looker instance
- Core analytics concepts: dimensions, measures, filters, and pivots
- Analyzing and visualizing data with Explores
- Creating advanced metrics with table calculations
- Building and sharing dashboards
- Organizing content with folders and boards for navigability and discoverability