
Running Databricks on AWS

Learning Track

Data Engineering

Delivery methods

On-Site, Virtual

Duration

1 day

This course is designed for data engineers who want to master the Databricks Lakehouse platform on AWS, and for AWS cloud engineers and architects who need to integrate and manage Databricks within their cloud infrastructure. It connects data processing with cloud deployment and teaches practical skills for building and maintaining modern data solutions.

Prerequisites

  • Understanding of Databricks at the Data Engineering Associate level
  • Some experience administering and/or deploying workloads on AWS

Course outline

  • Introduction to the Databricks Lakehouse Platform
  • AWS Essentials for Databricks
  • Workspace Deployment within a VPC
  • Integrating with S3 for Storage
  • Establishing Trust with IAM Roles 
  • Exercise: Deploying a workspace to a customer-managed VPC
  • Databricks Cluster Types and EC2 Instances
  • Autoscaling and Spot Instances
  • Exercise: Deploying a cluster with Spot Instances
  • Resource Organization with Tagging
  • Secure Access with Instance Profiles
  • Exercise: Connecting to the Amazon Textract service from within a cluster
  • Databricks Integration with S3
  • How Unity Catalog grants clusters access to S3
  • Exercise: Creating a new S3 bucket and setting up an External Location
  • Deep Dive into Delta Lake
  • Understanding CDC Storage Costs
  • Monitoring with Amazon CloudWatch
  • Analyzing Databricks Logs
  • Leveraging the Spark UI
  • Using all three to debug bottlenecks
  • Exercise: Creating a performance analysis dashboard
