Using R on Databricks
Contact us to book this course
Learning track: Specialty
Delivery methods: On-site, Virtual
Duration: Half day
This course shows R developers how to run, scale, and automate R workloads on Databricks. Learners use sparklyr for distributed data processing, connect to Databricks from notebooks and IDEs, and explore tools such as brickster to streamline development.
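As a preview, connecting from a local IDE typically looks like the following minimal sketch. It assumes sparklyr with the pysparklyr backend is installed, that DATABRICKS_HOST and DATABRICKS_TOKEN are set in the environment, and that the cluster ID is a placeholder:

```r
library(sparklyr)

# From a local IDE, connect over Databricks Connect (requires the
# pysparklyr backend; credentials are read from DATABRICKS_HOST and
# DATABRICKS_TOKEN in the environment).
sc <- spark_connect(
  method     = "databricks_connect",
  cluster_id = "0601-182128-dcbte59m"  # placeholder cluster ID
)

# Inside a Databricks notebook the connection is simpler:
# sc <- spark_connect(method = "databricks")

spark_disconnect(sc)
```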
Learning objectives
- Understand how R integrates with Databricks
- Run R code in notebooks and IDEs
- Scale analysis with sparklyr on Databricks clusters
- Automate R jobs and workflows
- Apply best practices for performance and migration
Audience
- R developers wanting to scale analysis
- Data scientists bringing R into Databricks workflows
Prerequisites
- Basic R knowledge
- No Spark or Databricks experience needed
Course outline
- R support in Databricks
- sparklyr (preferred) vs. SparkR (deprecated)
- New tools for R users: brickster, connectors, docs
- Connecting to Databricks clusters
- Distributed transforms with familiar dplyr syntax (see the dplyr sketch after this outline)
- Handling large datasets efficiently
- Managing clusters and jobs with brickster (see the brickster sketch after this outline)
- Importing R code, using packages, and version control
- MLflow tracking with R (see the MLflow sketch after this outline)
- Scheduling R notebooks as jobs
- Orchestrating workflows with R + Python/SQL
- Monitoring and logging R jobs
- Migrating from SparkR to sparklyr
- Performance tuning tips
- Limitations and workarounds
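To give a flavor of the distributed dplyr work covered in the outline, here is a minimal sketch. It assumes a Databricks notebook session and the samples.nyctaxi.trips sample table that ships with Databricks workspaces; everything else runs on the cluster until the final collect():

```r
library(sparklyr)
library(dplyr)
library(dbplyr)  # for in_catalog()

# Inside a Databricks notebook; from an IDE, use the
# databricks_connect method shown earlier.
sc <- spark_connect(method = "databricks")

# Reference a Unity Catalog table without copying it into R;
# the verbs below are translated to Spark SQL and run remotely.
trips <- tbl(sc, in_catalog("samples", "nyctaxi", "trips"))

top_zips <- trips %>%
  group_by(pickup_zip) %>%
  summarise(
    n_trips  = n(),
    avg_fare = mean(fare_amount, na.rm = TRUE)
  ) %>%
  arrange(desc(avg_fare)) %>%
  head(10) %>%
  collect()  # only the 10-row summary comes back to R

top_zips
```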
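For cluster and job management, brickster wraps the Databricks REST API in db_*-prefixed R functions. A minimal sketch, assuming DATABRICKS_HOST and DATABRICKS_TOKEN are set and the job ID is a placeholder; check the package documentation for exact signatures:

```r
library(brickster)

# brickster reads DATABRICKS_HOST and DATABRICKS_TOKEN from the
# environment (or .Renviron) to authenticate against the workspace.

# List clusters and jobs in the workspace.
clusters <- db_cluster_list()
jobs     <- db_jobs_list()

# Trigger a run of an existing job (job ID is a placeholder).
db_jobs_run_now(job_id = 123)
```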
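Finally, for experiment tracking, the mlflow R package uses the with(mlflow_start_run(), ...) pattern; on Databricks, runs land in the workspace's MLflow tracking server. A minimal sketch, with a placeholder experiment path:

```r
library(mlflow)

# Placeholder workspace path for the experiment.
mlflow_set_experiment("/Shared/r-course-demo")

# Fit a simple model and log its parameters and metrics.
with(mlflow_start_run(), {
  fit <- lm(mpg ~ wt + hp, data = mtcars)
  mlflow_log_param("formula", "mpg ~ wt + hp")
  mlflow_log_metric("r_squared", summary(fit)$r.squared)
})
```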