Generative AI Application Evaluation and Governance
Learning Track: Generative AI
Delivery methods: On-Site, Virtual
Duration: 1/2 day
This course is your introduction to evaluating and governing generative AI systems. First, you'll explore the purpose of and motivation for building evaluation and governance/security systems. Next, you'll connect evaluation and governance systems to the Databricks Data Intelligence Platform. Then, you'll learn a variety of evaluation techniques for specific components and types of applications. Finally, you'll analyze entire AI systems with respect to performance and cost.
Objectives
- Explain the purpose of and motivation for building evaluation and governance/security systems.
- Explain Databricks Data Intelligence Platform features for LLM evaluation and governance.
- Describe evaluation techniques for specific components and types of applications.
- Analyze entire AI systems with respect to performance and cost.
Prerequisites
- Familiarity with natural language processing concepts
- Familiarity with prompt engineering and its best practices
- Familiarity with the Databricks Data Intelligence Platform
- Familiarity with RAG (preparing data, building a RAG architecture, concepts like embedding, vectors, vector databases, etc.)
- Experience building LLM applications with multi-stage reasoning chains and agents
Course outline
- Why Evaluate GenAI Applications
- Exploring Licensing of Datasets
- Prompts and Guardrails Basics
- Implementing and Testing Guardrails for LLMs
- AI System Security
- Implementing AI Guardrails
- Evaluation Techniques
- Benchmark Evaluation
- LLM-as-a-Judge
- Domain-Specific Evaluation
- AI System Architecture
- Custom Metrics
- Offline vs. Online Evaluation