Using R on Databricks
Contact us to book this course
Learning track: Specialty
Delivery methods: On-site, Virtual
Duration: Half day
This course shows R developers how to run, scale, and automate R workloads on Databricks. Learners use sparklyr for distributed data processing, connect to Databricks from notebooks and IDEs, and explore tools such as brickster to streamline development.
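As a preview, connecting from a local IDE typically looks like the following minimal sketch. It assumes sparklyr with the pysparklyr backend is installed, that DATABRICKS_HOST and DATABRICKS_TOKEN are set in the environment, and that the cluster ID is a placeholder:

```r
library(sparklyr)

# From a local IDE, connect over Databricks Connect (requires the
# pysparklyr backend; credentials are read from DATABRICKS_HOST and
# DATABRICKS_TOKEN in the environment).
sc <- spark_connect(
  method     = "databricks_connect",
  cluster_id = "0601-182128-dcbte59m"  # placeholder cluster ID
)

# Inside a Databricks notebook the connection is simpler:
# sc <- spark_connect(method = "databricks")

spark_disconnect(sc)
```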
Learning objectives
- Understand how R integrates with Databricks
- Run R code in notebooks and IDEs
- Scale analysis with sparklyr on Databricks clusters
- Automate R jobs and workflows
- Apply best practices for performance and migration
Audience
- R developers wanting to scale analysis
- Data scientists bringing R into Databricks workflows
Prerequisites
- Basic R knowledge
- No Spark or Databricks experience needed
Course outline
- R support in Databricks
- sparklyr (preferred) vs. SparkR (deprecated)
- New tools for R users: brickster, connectors, docs
- Connecting to Databricks clusters
- Distributed transforms with familiar dplyr syntax (see the dplyr sketch after this outline)
- Handling large datasets efficiently
- Managing clusters and jobs with brickster (see the brickster sketch after this outline)
- Importing R code, using packages, and version control
- MLflow tracking with R (see the MLflow sketch after this outline)
- Scheduling R notebooks as jobs
- Orchestrating workflows with R + Python/SQL
- Monitoring and logging R jobs
- Migrating from SparkR to sparklyr
- Performance tuning tips
- Limitations and workarounds
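To give a flavor of the distributed dplyr work covered in the outline, here is a minimal sketch. It assumes a Databricks notebook session and the samples.nyctaxi.trips sample table that ships with Databricks workspaces; everything else runs on the cluster until the final collect():

```r
library(sparklyr)
library(dplyr)
library(dbplyr)  # for in_catalog()

# Inside a Databricks notebook; from an IDE, use the
# databricks_connect method shown earlier.
sc <- spark_connect(method = "databricks")

# Reference a Unity Catalog table without copying it into R;
# the verbs below are translated to Spark SQL and run remotely.
trips <- tbl(sc, in_catalog("samples", "nyctaxi", "trips"))

top_zips <- trips %>%
  group_by(pickup_zip) %>%
  summarise(
    n_trips  = n(),
    avg_fare = mean(fare_amount, na.rm = TRUE)
  ) %>%
  arrange(desc(avg_fare)) %>%
  head(10) %>%
  collect()  # only the 10-row summary comes back to R

top_zips
```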
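For cluster and job management, brickster wraps the Databricks REST API in db_*-prefixed R functions. A minimal sketch, assuming DATABRICKS_HOST and DATABRICKS_TOKEN are set and the job ID is a placeholder; check the package documentation for exact signatures:

```r
library(brickster)

# brickster reads DATABRICKS_HOST and DATABRICKS_TOKEN from the
# environment (or .Renviron) to authenticate against the workspace.

# List clusters and jobs in the workspace.
clusters <- db_cluster_list()
jobs     <- db_jobs_list()

# Trigger a run of an existing job (job ID is a placeholder).
db_jobs_run_now(job_id = 123)
```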
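Finally, for experiment tracking, the mlflow R package uses the with(mlflow_start_run(), ...) pattern; on Databricks, runs land in the workspace's MLflow tracking server. A minimal sketch, with a placeholder experiment path:

```r
library(mlflow)

# Placeholder workspace path for the experiment.
mlflow_set_experiment("/Shared/r-course-demo")

# Fit a simple model and log its parameters and metrics.
with(mlflow_start_run(), {
  fit <- lm(mpg ~ wt + hp, data = mtcars)
  mlflow_log_param("formula", "mpg ~ wt + hp")
  mlflow_log_metric("r_squared", summary(fit)$r.squared)
})
```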