AWS Analytics on Databricks

Contact us to book this course
Delivery methods

On-Site, Virtual

Duration

1 day

This comprehensive, vertical-specific training demonstrates Databricks' strategic value for building modern data platforms on AWS, from batch ETL to real-time ML inference. Participants will learn to implement the Lakehouse architecture using Delta Lake for ACID transactions on S3, eliminating the traditional data warehouse vs. data lake trade-off. The curriculum emphasizes the Medallion Architecture (Bronze, Silver, Gold) for organized data transformation and Unity Catalog for enterprise governance. By day's end, students will be able to justify unified analytics ROI and implement production data pipelines with proper lineage tracking, access control, and cost optimization.

Learning Objectives

  • Articulate the Databricks on AWS value proposition by comparing the TCO of Lakehouse architecture vs. traditional data warehouse + data lake combinations.

  • Deploy and configure Databricks workspaces on AWS with proper VPC networking, IAM instance profiles, and PrivateLink connectivity for secure data access.

  • Master Delta Lake operations using ACID transactions, time travel, and schema evolution to build reliable data pipelines on Amazon S3.

  • Optimize compute costs by implementing auto-termination policies and spot instance strategies, and by right-sizing clusters for different workload types.

  • Implement MLOps best practices using MLflow for experiment tracking, model registry, and production serving endpoints on AWS infrastructure.

  • Accelerate data engineering by leveraging Delta Live Tables for declarative pipeline development with built-in data quality expectations.

  • Execute business-aligned analytics by completing a vertical-specific lab (Fintech, Healthcare, or Media) to solve industry-specific AWS data challenges.

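As a flavor of the Delta Lake material covered in the objectives above, the core operations can be sketched in Databricks SQL. The `events` table, column name, and S3 path are illustrative assumptions, not part of the course materials:

```sql
-- Convert an existing Parquet dataset on S3 to Delta in place (path is illustrative)
CONVERT TO DELTA parquet.`s3://example-bucket/events`;

-- Time travel: query the table as of an earlier version or timestamp
SELECT * FROM events VERSION AS OF 5;
SELECT * FROM events TIMESTAMP AS OF '2024-01-01';

-- Roll back a bad write by restoring a previous version
RESTORE TABLE events TO VERSION AS OF 5;

-- Schema evolution: add a new column without rewriting existing data
ALTER TABLE events ADD COLUMNS (device_type STRING);
```

Because every write to a Delta table is an atomic commit to its transaction log, each of these statements either fully succeeds or leaves the table untouched, which is the "ACID on S3" property the course emphasizes.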

Who Should Attend

Data Engineers, Analytics Engineers, and Data Architects responsible for building data platforms on AWS. Prior experience with core AWS services (S3, IAM, VPC) is assumed; SQL experience is helpful but not required.

Course Outline

  • The "Warehouse vs. Lake" Debate: Why Lakehouse Wins
  • Understanding DBUs: Databricks Unit Economics on AWS
  • Workspace Architecture: Control Plane vs. Data Plane on AWS
  • Demo: Creating a Databricks Workspace with AWS Quick Start  
  • ACID Transactions: Reliability Without the Data Warehouse Tax
  • Time Travel: Query Historical Data and Roll Back Mistakes
  • Schema Evolution: Handling Changing Data Without Breaking Pipelines
  • Lab: Converting Parquet to Delta and Implementing Time Travel Queries 
  • Bronze Layer: Raw Ingestion with Auto Loader and Kafka Connect
  • Silver Layer: Cleansing, Deduplication, and Data Quality Rules
  • Gold Layer: Business Aggregations and Dimensional Modeling
  • Demo: Building a Complete Medallion Pipeline with Delta Live Tables  
  • Experiment Tracking: Reproducible ML with Automatic Logging
  • Model Registry: Staging, Production, and Approval Workflows
  • Feature Store: Centralized Feature Engineering and Serving
  • Model Serving: Real-Time Inference Endpoints on AWS
  • Demo: Training and Deploying a Model with MLflow  
  • Vertical-Specific Labs (choose one):
    • Fintech: Streaming Fraud Detection Pipeline
    • Healthcare: Clinical Analytics with Unity Catalog Security
    • Media & Entertainment: Content Recommendation Model Deployment
  • Unity Catalog: Centralized Metastore Across Workspaces
  • Fine-Grained Access Control: Row and Column-Level Security
  • Data Lineage: Impact Analysis and Compliance Reporting
  • Cost Management: Cluster Policies and Usage Monitoring
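The Unity Catalog governance topics that close the outline can be previewed with a short Databricks SQL sketch. The catalog, schema, table, column, and group names below are illustrative assumptions only:

```sql
-- Grant table access to an account-level group (all names are illustrative)
GRANT SELECT ON TABLE main.clinical.patients TO `analysts`;

-- Row-level security: a filter function, then applied to the table
CREATE OR REPLACE FUNCTION main.clinical.region_filter(region STRING)
RETURN IS_ACCOUNT_GROUP_MEMBER('admins') OR region = 'us-east';

ALTER TABLE main.clinical.patients
  SET ROW FILTER main.clinical.region_filter ON (region);

-- Column-level security: mask a sensitive field for non-admin readers
CREATE OR REPLACE FUNCTION main.clinical.mask_ssn(ssn STRING)
RETURN CASE WHEN IS_ACCOUNT_GROUP_MEMBER('admins') THEN ssn
            ELSE '***-**-****' END;

ALTER TABLE main.clinical.patients
  ALTER COLUMN ssn SET MASK main.clinical.mask_ssn;
```

Because these policies live in the Unity Catalog metastore rather than in any single workspace, they apply consistently wherever the table is queried, which is what enables the cross-workspace governance and lineage reporting listed above.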

Ready to accelerate your team's innovation?