Migrating from Amazon Redshift to Databricks
Data Engineering
On-Site, Virtual
1 day
Migrating from Amazon Redshift to Databricks is a pivotal move toward a modern data ecosystem, enabling greater scalability, better performance, lower costs, and advanced analytics use cases such as machine learning.
This course walks you through each step of the migration process—from profiling your existing Redshift environment to migrating data, pipelines, and queries, and ultimately integrating BI and governance tools on Databricks. With hands-on labs, participants gain practical experience in each major migration phase, ensuring a smooth transition to the Databricks Lakehouse.
Course objectives
- Understand the phases and approaches for migrating from Amazon Redshift to Databricks
- Plan and execute data offloads from Redshift into Delta Lake
- Refactor and schedule data pipelines in Databricks
- Convert existing Redshift SQL queries, stored procedures, and UDFs to Databricks SQL/Spark SQL
- Integrate downstream BI tools and apply governance best practices
Audience
Data engineers, ETL developers, solution architects, and BI professionals looking to modernize their enterprise data warehouse by moving from Redshift to Databricks.
Prerequisites
- Basic understanding of data warehousing concepts (e.g., schemas, ETL, BI)
- Familiarity with SQL
- Some exposure to cloud data platforms (AWS, Databricks, etc.)
Course outline
- Recognize the differences between traditional data warehousing and Databricks Lakehouse
- Understand common drivers behind Redshift-to-Databricks migrations
- Identify key features of Databricks vs. Amazon Redshift
- Preview the 5-phase migration framework
- Compare big-bang vs. phased migrations
- Determine whether an ETL-first or BI-first approach suits your organization
- Conduct initial discovery and workload assessment
- Map Redshift features to Databricks capabilities
- Lab: Assessing Redshift usage with a profiler
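To give a feel for the assessment lab, here is a minimal profiling sketch that queries Redshift's system views with psycopg2. The connection details are placeholders, and read access to SVV_TABLE_INFO and STL_QUERY is assumed.

```python
import psycopg2

# Placeholder connection details; substitute your cluster's endpoint
conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="dev", user="admin", password="...",
)

with conn.cursor() as cur:
    # Largest tables first; SVV_TABLE_INFO reports size in 1 MB blocks
    cur.execute("""
        SELECT "schema", "table", size AS size_mb, tbl_rows
        FROM svv_table_info
        ORDER BY size DESC
        LIMIT 20;
    """)
    for schema, table, size_mb, rows in cur.fetchall():
        print(f"{schema}.{table}: {size_mb} MB, {rows} rows")

    # Query volume over the last 7 days, to spot hot workloads
    cur.execute("""
        SELECT DATE_TRUNC('day', starttime) AS day, COUNT(*) AS queries
        FROM stl_query
        WHERE starttime > DATEADD(day, -7, GETDATE())
        GROUP BY 1
        ORDER BY 1;
    """)
    for day, queries in cur.fetchall():
        print(day, queries)

conn.close()
```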
- Convert Redshift schemas (DDL) to Databricks-compatible DDL
- Choose efficient methods to extract data from Redshift (UNLOAD, connectors)
- Load data into Delta Lake and validate row counts and checksums
- Understand the medallion architecture (Bronze, Silver, Gold)
- Lab: Migrating Data from Redshift to Databricks
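A minimal sketch of the load-and-validate step this lab covers, assuming the Redshift UNLOAD has already written Parquet to S3 and the code runs in a Databricks notebook (where `spark` is predefined). Paths, table names, and the expected row count are illustrative.

```python
# Redshift side (run beforehand):
#   UNLOAD ('SELECT * FROM orders')
#   TO 's3://my-migration-bucket/unload/orders/'
#   IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-unload'  -- placeholder
#   FORMAT AS PARQUET;

src_path = "s3://my-migration-bucket/unload/orders/"  # placeholder path
target = "bronze.orders"                              # placeholder Bronze table

# Load the unloaded Parquet files into a Delta table
df = spark.read.parquet(src_path)
df.write.format("delta").mode("overwrite").saveAsTable(target)

# Validate: compare against the count captured in Redshift at UNLOAD time
redshift_count = 12_345_678  # from SELECT COUNT(*) FROM orders in Redshift
delta_count = spark.table(target).count()
assert delta_count == redshift_count, (
    f"Row count mismatch: {delta_count} != {redshift_count}"
)
```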
- Refactor Redshift-based ETL jobs to Databricks Workflows
- Understand Delta Lake performance optimizations (partitioning, Z-ordering)
- Integrate external orchestration tools (Airflow, Glue) or use Databricks Workflows
- Lab: Running a Data Pipeline in Databricks
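As a taste of the pipeline lab, a hedged sketch of one Bronze-to-Silver task that could run inside a Databricks Workflows job; table and column names are assumptions.

```python
from pyspark.sql import functions as F

# Read the raw Bronze table (hypothetical name)
bronze = spark.table("bronze.orders")

# Typical Silver-layer cleanup: dedupe, derive a date column, drop bad rows
silver = (
    bronze
    .dropDuplicates(["order_id"])
    .withColumn("order_date", F.to_date("order_ts"))
    .filter(F.col("order_status").isNotNull())
)

# Partition by the derived date column on write
(silver.write.format("delta")
    .mode("overwrite")
    .partitionBy("order_date")
    .saveAsTable("silver.orders"))

# Co-locate data on a common filter column for faster point lookups
spark.sql("OPTIMIZE silver.orders ZORDER BY (customer_id)")
```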
- Identify and resolve common SQL syntax differences between Redshift and Databricks
- Automate conversion using tools such as BladeBridge
- Perform manual conversions for edge cases
- Optimize migrated queries using Spark/Photon features
- Understand strategies for stored procedures and UDF migration
- Demo: Converting Queries from Redshift to Spark SQL
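The demo walks through conversions like the one sketched here: LISTAGG with WITHIN GROUP, historically unsupported in Spark SQL, rewritten with collect_list, plus a scalar UDF reimplemented as a Spark UDF. Table, column, and function names are hypothetical.

```python
# Redshift original:
#   SELECT customer_id,
#          LISTAGG(product_name, ', ') WITHIN GROUP (ORDER BY product_name)
#   FROM sales
#   GROUP BY customer_id;

# Spark SQL rewrite: sort, then join the collected values
converted = spark.sql("""
    SELECT customer_id,
           array_join(sort_array(collect_list(product_name)), ', ') AS products
    FROM sales
    GROUP BY customer_id
""")
converted.show(5)

# A Redshift scalar UDF can often be reimplemented as a Spark UDF
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

@F.udf(returnType=StringType())
def normalize_sku(sku):
    # hypothetical cleanup logic, standing in for the original UDF body
    return sku.strip().upper() if sku else None
```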
- Configure Databricks SQL Warehouses for BI consumption
- Repoint or “switch over” dashboards (Tableau, Power BI, Looker, etc.)
- Validate schemas and access controls for your BI layer
- Ensure concurrency and performance for high-volume BI workloads
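One way to smoke-test a repointed dashboard is to replay its query through the databricks-sql-connector package against the new SQL Warehouse and compare results with Redshift; the hostname, HTTP path, token, and table below are placeholders.

```python
from databricks import sql

with sql.connect(
    server_hostname="adb-1234567890123456.7.azuredatabricks.net",  # placeholder
    http_path="/sql/1.0/warehouses/abc123",                        # placeholder
    access_token="dapi...",                                        # placeholder
) as conn:
    with conn.cursor() as cur:
        # Issue the same query the dashboard runs; diff against Redshift
        cur.execute("SELECT COUNT(*) FROM gold.daily_sales")
        print(cur.fetchone())
```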
- Implement cluster management strategies (autoscaling, ephemeral job clusters, cluster types)
- Optimize Delta Lake tables (OPTIMIZE, VACUUM, ANALYZE TABLE)
- Apply governance with Unity Catalog (fine-grained permissions, lineage)
- Monitor usage and costs effectively
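A short maintenance-and-governance sketch covering the commands above, assuming a Databricks notebook with owner privileges on the table; the table and analyst group names are illustrative.

```python
# Compact small files and co-locate a frequently filtered column
spark.sql("OPTIMIZE silver.orders ZORDER BY (customer_id)")

# Remove data files no longer referenced by the table
# (default retention threshold is 7 days)
spark.sql("VACUUM silver.orders")

# Refresh statistics used by the query optimizer
spark.sql("ANALYZE TABLE silver.orders COMPUTE STATISTICS")

# Fine-grained access control with Unity Catalog
spark.sql("GRANT SELECT ON TABLE silver.orders TO `bi_analysts`")
```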
- Typical pitfalls and solutions
- Use cases: real-world examples
- Recap and Key Takeaways
- Further Learning (Databricks Academy, docs, partner tools)
- Professional Services and Partner Ecosystem
- Open Q&A Session