AWS Analytics on Databricks

Contact us to book this course
Delivery methods

On-Site, Virtual

Duration

1 day

This comprehensive, vertical-specific training demonstrates Databricks' strategic value for building modern data platforms on AWS, from batch ETL to real-time ML inference. Participants will learn to implement the Lakehouse architecture using Delta Lake for ACID transactions on S3, eliminating the traditional data warehouse vs. data lake trade-off. The curriculum emphasizes the Medallion Architecture (Bronze, Silver, Gold) for organized data transformation and Unity Catalog for enterprise governance. By day's end, students will be able to justify unified analytics ROI and implement production data pipelines with proper lineage tracking, access control, and cost optimization.

Learning Objectives

  • Articulate the Databricks on AWS value proposition by comparing the TCO of Lakehouse architecture vs. traditional data warehouse + data lake combinations.

  • Deploy and configure Databricks workspaces on AWS with proper VPC networking, IAM instance profiles, and PrivateLink connectivity for secure data access.

  • Master Delta Lake operations using ACID transactions, time travel, and schema evolution to build reliable data pipelines on Amazon S3.

  • Optimize compute costs by implementing auto-termination policies and spot instance strategies, and by right-sizing clusters for different workload types.

  • Implement MLOps best practices using MLflow for experiment tracking, model registry, and production serving endpoints on AWS infrastructure.

  • Accelerate data engineering by leveraging Delta Live Tables for declarative pipeline development with built-in data quality expectations.

  • Execute business-aligned analytics by completing a vertical-specific lab (Fintech, Healthcare, or Media) to solve industry-specific AWS data challenges.

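As a flavor of the Delta Lake material covered in the objectives above, the core operations can be sketched in Databricks SQL. The `events` table, column name, and S3 path are illustrative assumptions, not part of the course materials:

```sql
-- Convert an existing Parquet dataset on S3 to Delta in place (path is illustrative)
CONVERT TO DELTA parquet.`s3://example-bucket/events`;

-- Time travel: query the table as of an earlier version or timestamp
SELECT * FROM events VERSION AS OF 5;
SELECT * FROM events TIMESTAMP AS OF '2024-01-01';

-- Roll back a bad write by restoring a previous version
RESTORE TABLE events TO VERSION AS OF 5;

-- Schema evolution: add a new column without rewriting existing data
ALTER TABLE events ADD COLUMNS (device_type STRING);
```

Because every write to a Delta table is an atomic commit to its transaction log, each of these statements either fully succeeds or leaves the table untouched, which is the "ACID on S3" property the course emphasizes.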

Who Should Attend

Data Engineers, Analytics Engineers, and Data Architects responsible for building data platforms on AWS. Prior experience with core AWS services (S3, IAM, VPC) is assumed; SQL experience is helpful but not required.

Course Outline

  • The "Warehouse vs. Lake" Debate: Why Lakehouse Wins
  • Understanding DBUs: Databricks Unit Economics on AWS
  • Workspace Architecture: Control Plane vs. Data Plane on AWS
  • Demo: Creating a Databricks Workspace with AWS Quick Start  
  • ACID Transactions: Reliability Without the Data Warehouse Tax
  • Time Travel: Query Historical Data and Roll Back Mistakes
  • Schema Evolution: Handling Changing Data Without Breaking Pipelines
  • Lab: Converting Parquet to Delta and Implementing Time Travel Queries 
  • Bronze Layer: Raw Ingestion with Auto Loader and Kafka Connect
  • Silver Layer: Cleansing, Deduplication, and Data Quality Rules
  • Gold Layer: Business Aggregations and Dimensional Modeling
  • Demo: Building a Complete Medallion Pipeline with Delta Live Tables  
  • Experiment Tracking: Reproducible ML with Automatic Logging
  • Model Registry: Staging, Production, and Approval Workflows
  • Feature Store: Centralized Feature Engineering and Serving
  • Model Serving: Real-Time Inference Endpoints on AWS
  • Demo: Training and Deploying a Model with MLflow  
  • Vertical-Specific Labs (choose one):
    • Fintech: Streaming Fraud Detection Pipeline
    • Healthcare: Clinical Analytics with Unity Catalog Security
    • Media & Entertainment: Content Recommendation Model Deployment
  • Unity Catalog: Centralized Metastore Across Workspaces
  • Fine-Grained Access Control: Row and Column-Level Security
  • Data Lineage: Impact Analysis and Compliance Reporting
  • Cost Management: Cluster Policies and Usage Monitoring
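The Unity Catalog governance topics that close the outline can be previewed with a short Databricks SQL sketch. The catalog, schema, table, column, and group names below are illustrative assumptions only:

```sql
-- Grant table access to an account-level group (all names are illustrative)
GRANT SELECT ON TABLE main.clinical.patients TO `analysts`;

-- Row-level security: a filter function, then applied to the table
CREATE OR REPLACE FUNCTION main.clinical.region_filter(region STRING)
RETURN IS_ACCOUNT_GROUP_MEMBER('admins') OR region = 'us-east';

ALTER TABLE main.clinical.patients
  SET ROW FILTER main.clinical.region_filter ON (region);

-- Column-level security: mask a sensitive field for non-admin readers
CREATE OR REPLACE FUNCTION main.clinical.mask_ssn(ssn STRING)
RETURN CASE WHEN IS_ACCOUNT_GROUP_MEMBER('admins') THEN ssn
            ELSE '***-**-****' END;

ALTER TABLE main.clinical.patients
  ALTER COLUMN ssn SET MASK main.clinical.mask_ssn;
```

Because these policies live in the Unity Catalog metastore rather than in any single workspace, they apply consistently wherever the table is queried, which is what enables the cross-workspace governance and lineage reporting listed above.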

Ready to accelerate your team's innovation?