Migrating from Amazon Redshift to Databricks
Data Engineering
On-Site, Virtual
1 day
Migrating from Amazon Redshift to Databricks is a pivotal move toward a modern data ecosystem, enabling greater scalability, better performance, lower costs, and advanced analytics use cases such as machine learning.
This course walks you through each step of the migration process—from profiling your existing Redshift environment to migrating data, pipelines, and queries, and ultimately integrating BI and governance tools on Databricks. With hands-on labs, participants gain practical experience in each major migration phase, ensuring a smooth transition to the Databricks Lakehouse.
Course objectives
- Understand the phases and approaches for migrating from Amazon Redshift to Databricks
- Plan and execute data offloads from Redshift into Delta Lake
- Refactor and schedule data pipelines in Databricks
- Convert existing Redshift SQL queries, stored procedures, and UDFs to Databricks SQL/Spark SQL
- Integrate downstream BI tools and apply governance best practices
Audience
Data engineers, ETL developers, solution architects, and BI professionals looking to modernize their enterprise data warehouse by moving from Redshift to Databricks.
Prerequisites
- Basic understanding of data warehousing concepts (e.g., schemas, ETL, BI)
- Familiarity with SQL
- Some exposure to cloud data platforms (AWS, Databricks, etc.)
Course outline
- Recognize the differences between traditional data warehousing and Databricks Lakehouse
- Understand common drivers behind Redshift-to-Databricks migrations
- Identify key features of Databricks vs. Amazon Redshift
- Preview the 5-phase migration framework
- Compare big-bang vs. phased migrations
- Determine whether an ETL-first or BI-first approach suits your organization
- Conduct initial discovery and workload assessment
- Map Redshift features to Databricks capabilities
- Lab: Assessing Redshift usage with a profiler
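To give a feel for the assessment lab, here is a minimal profiling sketch that queries Redshift's system views with psycopg2. The connection details are placeholders, and read access to SVV_TABLE_INFO and STL_QUERY is assumed.

```python
import psycopg2

# Placeholder connection details; substitute your cluster's endpoint
conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="dev", user="admin", password="...",
)

with conn.cursor() as cur:
    # Largest tables first; SVV_TABLE_INFO reports size in 1 MB blocks
    cur.execute("""
        SELECT "schema", "table", size AS size_mb, tbl_rows
        FROM svv_table_info
        ORDER BY size DESC
        LIMIT 20;
    """)
    for schema, table, size_mb, rows in cur.fetchall():
        print(f"{schema}.{table}: {size_mb} MB, {rows} rows")

    # Query volume over the last 7 days, to spot hot workloads
    cur.execute("""
        SELECT DATE_TRUNC('day', starttime) AS day, COUNT(*) AS queries
        FROM stl_query
        WHERE starttime > DATEADD(day, -7, GETDATE())
        GROUP BY 1
        ORDER BY 1;
    """)
    for day, queries in cur.fetchall():
        print(day, queries)

conn.close()
```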
- Convert Redshift schemas (DDL) to Databricks-compatible DDL
- Choose efficient methods to extract data from Redshift (UNLOAD, connectors)
- Load data into Delta Lake and validate row counts and checksums
- Understand the medallion architecture (Bronze, Silver, Gold)
- Lab: Migrating Data from Redshift to Databricks
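A minimal sketch of the load-and-validate step this lab covers, assuming the Redshift UNLOAD has already written Parquet to S3 and the code runs in a Databricks notebook (where `spark` is predefined). Paths, table names, and the expected row count are illustrative.

```python
# Redshift side (run beforehand):
#   UNLOAD ('SELECT * FROM orders')
#   TO 's3://my-migration-bucket/unload/orders/'
#   IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-unload'  -- placeholder
#   FORMAT AS PARQUET;

src_path = "s3://my-migration-bucket/unload/orders/"  # placeholder path
target = "bronze.orders"                              # placeholder Bronze table

# Load the unloaded Parquet files into a Delta table
df = spark.read.parquet(src_path)
df.write.format("delta").mode("overwrite").saveAsTable(target)

# Validate: compare against the count captured in Redshift at UNLOAD time
redshift_count = 12_345_678  # from SELECT COUNT(*) FROM orders in Redshift
delta_count = spark.table(target).count()
assert delta_count == redshift_count, (
    f"Row count mismatch: {delta_count} != {redshift_count}"
)
```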
- Refactor Redshift-based ETL jobs to Databricks Workflows
- Understand Delta Lake performance optimizations (partitioning, Z-ordering)
- Integrate external orchestration tools (Airflow, Glue) or use Databricks Workflows
- Lab: Running a Data Pipeline in Databricks
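As a taste of the pipeline lab, a hedged sketch of one Bronze-to-Silver task that could run inside a Databricks Workflows job; table and column names are assumptions.

```python
from pyspark.sql import functions as F

# Read the raw Bronze table (hypothetical name)
bronze = spark.table("bronze.orders")

# Typical Silver-layer cleanup: dedupe, derive a date column, drop bad rows
silver = (
    bronze
    .dropDuplicates(["order_id"])
    .withColumn("order_date", F.to_date("order_ts"))
    .filter(F.col("order_status").isNotNull())
)

# Partition by the derived date column on write
(silver.write.format("delta")
    .mode("overwrite")
    .partitionBy("order_date")
    .saveAsTable("silver.orders"))

# Co-locate data on a common filter column for faster point lookups
spark.sql("OPTIMIZE silver.orders ZORDER BY (customer_id)")
```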
- Identify and resolve common SQL syntax differences between Redshift and Databricks
- Automate conversion using tools such as BladeBridge
- Perform manual conversions for edge cases
- Optimize migrated queries using Spark/Photon features
- Understand strategies for stored procedures and UDF migration
- Demo: Converting Queries from Redshift to Spark SQL
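The demo walks through conversions like the one sketched here: LISTAGG with WITHIN GROUP, historically unsupported in Spark SQL, rewritten with collect_list, plus a scalar UDF reimplemented as a Spark UDF. Table, column, and function names are hypothetical.

```python
# Redshift original:
#   SELECT customer_id,
#          LISTAGG(product_name, ', ') WITHIN GROUP (ORDER BY product_name)
#   FROM sales
#   GROUP BY customer_id;

# Spark SQL rewrite: sort, then join the collected values
converted = spark.sql("""
    SELECT customer_id,
           array_join(sort_array(collect_list(product_name)), ', ') AS products
    FROM sales
    GROUP BY customer_id
""")
converted.show(5)

# A Redshift scalar UDF can often be reimplemented as a Spark UDF
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

@F.udf(returnType=StringType())
def normalize_sku(sku):
    # hypothetical cleanup logic, standing in for the original UDF body
    return sku.strip().upper() if sku else None
```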
- Configure Databricks SQL Warehouses for BI consumption
- Repoint or “switch over” dashboards (Tableau, Power BI, Looker, etc.)
- Validate schemas and access controls for your BI layer
- Ensure concurrency and performance for high-volume BI workloads
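One way to smoke-test a repointed dashboard is to replay its query through the databricks-sql-connector package against the new SQL Warehouse and compare results with Redshift; the hostname, HTTP path, token, and table below are placeholders.

```python
from databricks import sql

with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",  # placeholder
    http_path="/sql/1.0/warehouses/abc123",                        # placeholder
    access_token="dapi...",                                        # placeholder
) as conn:
    with conn.cursor() as cur:
        # Issue the same query the dashboard runs; diff against Redshift
        cur.execute("SELECT COUNT(*) FROM gold.daily_sales")
        print(cur.fetchone())
```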
- Implement cluster management strategies (autoscaling, ephemeral job clusters, cluster types)
- Optimize Delta Lake tables (OPTIMIZE, VACUUM, ANALYZE TABLE)
- Apply governance with Unity Catalog (fine-grained permissions, lineage)
- Monitor usage and costs effectively
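A short maintenance-and-governance sketch covering the commands above, assuming a Databricks notebook with owner privileges on the table; the table and analyst group names are illustrative.

```python
# Compact small files and co-locate a frequently filtered column
spark.sql("OPTIMIZE silver.orders ZORDER BY (customer_id)")

# Remove data files no longer referenced by the table
# (default retention threshold is 7 days)
spark.sql("VACUUM silver.orders")

# Refresh statistics used by the query optimizer
spark.sql("ANALYZE TABLE silver.orders COMPUTE STATISTICS")

# Fine-grained access control with Unity Catalog
spark.sql("GRANT SELECT ON TABLE silver.orders TO `bi_analysts`")
```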
- Typical pitfalls and solutions
- Use cases: real-world examples
- Recap and Key Takeaways
- Further Learning (Databricks Academy, docs, partner tools)
- Professional Services and Partner Ecosystem
- Open Q&A Session