Managing a Data Mesh with Dataplex

(2 days)

Dataplex is an intelligent data fabric that enables organizations to centrally discover, manage, monitor, and govern their data across data lakes, data warehouses, and data marts to power analytics at scale. Specifically, you can use Dataplex to build a data mesh architecture, which is an organizational and technical approach that decentralizes data ownership among domain data owners.

Through guided lectures and independent exercises using sample data, this course teaches you how to discover, manage, monitor, and govern your data across data lakes, data warehouses, and data marts.

Course Objectives

  • Identify the importance of a modern data platform
  • Configure and set up Dataplex
  • Secure data lakes, zones, and assets
  • Implement tagging for resources and use tags to search for assets
  • Process data using Dataplex tasks
  • Design, execute, and report on data quality processes

Audience

Intermediate-level training for those looking to understand how to manage data curation and governance using Dataplex.

Prerequisites

Completion of the "Modernizing Data Lakes and Data Warehouses with Google Cloud" and "Building Batch Data Pipelines on Google Cloud" courses in the Data Engineering on Google Cloud specialization, or equivalent experience using Google Cloud.


Course Outline

Module 1: Introduction to Dataplex

  • Modern data platforms and data-oriented design
  • Pillars of data governance
  • What is Dataplex?
  • Dataplex capabilities
  • Dataplex compared with other products on Google Cloud

Module 2: Creating a Data Mesh on Dataplex

  • What is a data mesh?
  • Dataplex concepts
  • Creating data lakes and zones
  • Assets in Dataplex
  • Lab 1: Provision a Data Mesh Using Dataplex

Module 3: Processing Data on Dataplex

  • Processing data on Dataplex
  • Data preparation tasks
  • Ingestion jobs
  • Dataflow and Spark tasks
  • Lab 2: Standardize Data Using Dataplex Tasks

Module 4: Managing Data Security Through Dataplex

  • IAM permissions and roles
  • Securing your data lake
  • Policy management
  • Metadata security
  • Lab 3: Manage Data Security Using Dataplex

Module 5: Data Tagging and Data Catalog

  • Introduction to Data Catalog
  • Technical metadata vs. business metadata
  • Tags and tag templates
  • Entries and entry groups
  • Data lineage
  • Lab 4: Data Catalog and Data Lineage

Module 6: Data Quality and Profiling

  • Data quality tasks and AutoDQ
  • Reporting on data quality
  • Data profiling
  • Lab 5: Data Quality and Profiling Your Data in BigQuery

Module 7: Dataplex Best Practices

  • Best practices
  • End-to-end demo
  • Lab 6: Challenge Lab