Generative AI Application Evaluation and Governance
Learning Track: Generative AI
Delivery methods: On-Site, Virtual
Duration: 1/2 day
This course is your introduction to evaluating and governing generative AI systems. First, you'll explore the purpose of and motivation for building evaluation and governance/security systems. Next, you'll connect evaluation and governance systems to the Databricks Data Intelligence Platform. Then, you'll learn a variety of evaluation techniques for specific components and types of applications. Finally, you'll analyze entire AI systems with respect to performance and cost.
Objectives
- Explain the purpose of and motivation for building evaluation and governance/security systems.
- Explain Databricks Data Intelligence Platform features for LLM evaluation and governance.
- Describe evaluation techniques for specific components and types of applications.
- Analyze entire AI systems with respect to performance and cost.
Prerequisites
- Familiarity with natural language processing concepts
- Familiarity with prompt engineering and its best practices
- Familiarity with the Databricks Data Intelligence Platform
- Familiarity with RAG (preparing data, building a RAG architecture, concepts like embedding, vectors, vector databases, etc.)
- Experience building LLM applications with multi-stage reasoning chains and agents
Course outline
- Why Evaluate GenAI Applications
- Exploring Licensing of Datasets
- Prompts and Guardrails Basics
- Implementing and Testing Guardrails for LLMs
- AI System Security
- Implementing AI Guardrails
- Evaluation Techniques
- Benchmark Evaluation
- LLM-as-a-Judge
- Domain-Specific Evaluation
- AI System Architecture
- Custom Metrics
- Offline vs. Online Evaluation