
Using R on Databricks

Contact us to book this course
Learning Track

Specialty

Delivery methods

On-Site, Virtual

Duration

1/2 day

This course shows R developers how to run, scale, and automate R workloads in Databricks. Learners will use sparklyr for distributed data processing, connect from notebooks or IDEs, and explore tools like brickster to streamline development.
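To give a flavor of the sparklyr workflow the course covers, here is a minimal sketch assuming an R session attached to a Databricks cluster with sparklyr installed (the table name refers to a standard Databricks sample dataset and is illustrative):

```r
library(sparklyr)
library(dplyr)

# Inside a Databricks notebook, connect to the attached cluster
sc <- spark_connect(method = "databricks")

# Illustrative table; substitute one from your own workspace catalog
trips <- tbl(sc, "samples.nyctaxi.trips")

# Familiar dplyr verbs are translated to Spark SQL and run on the cluster
trips %>%
  group_by(pickup_zip) %>%
  summarise(avg_fare = mean(fare_amount, na.rm = TRUE)) %>%
  arrange(desc(avg_fare)) %>%
  head(10) %>%
  collect()  # bring only the small aggregated result back into local R
```

The key idea, which the course explores in depth, is that the data stays on the cluster; only the final `collect()` pulls results into the local R session.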

Learning Objectives

  • Understand how R integrates with Databricks
  • Run R code in notebooks and IDEs
  • Scale analysis with sparklyr on Databricks clusters
  • Automate R jobs and workflows
  • Apply best practices for performance and migration

Audience

  • R developers wanting to scale analysis
  • Data scientists bringing R into Databricks workflows

Prerequisites

  • Basic R knowledge
  • No Spark or Databricks experience needed

Course Outline

  • R support in Databricks
  • sparklyr (preferred) vs. SparkR (deprecated)
  • New tools for R users: brickster, connectors, docs
  • Connecting to Databricks clusters
  • Distributed transforms with familiar dplyr syntax
  • Handling large datasets efficiently
  • Managing clusters and jobs with brickster
  • Importing R code, using packages, and version control
  • MLflow tracking with R
  • Scheduling R notebooks as jobs
  • Orchestrating workflows with R + Python/SQL
  • Monitoring and logging R jobs
  • Migrating from SparkR to sparklyr
  • Performance tuning tips
  • Limitations and workarounds
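As a taste of the MLflow tracking topic above, a short sketch using the mlflow R package (the parameter and metric names are illustrative; on Databricks the tracking URI is preconfigured for you):

```r
library(mlflow)

# Track a simple local model fit as an MLflow run
with(mlflow_start_run(), {
  mlflow_log_param("model", "lm")          # illustrative parameter
  fit <- lm(mpg ~ wt + hp, data = mtcars)  # small example model
  mlflow_log_metric("r_squared", summary(fit)$r.squared)
})
```

Runs logged this way appear in the Databricks experiment UI alongside runs logged from Python, which is what makes mixed-language workflows practical.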

Ready to accelerate your team's innovation?