• ROI Training

SRE Foundations

Curriculum

Cloud Computing

Delivery methods

On-Site, Virtual

Duration

1 day

Site reliability engineering (SRE) is a software engineering approach to IT infrastructure and operations that aligns incentives between development and operations and includes mission-critical production support. This course starts with an introduction to the main practices of SRE and the role IT and business leaders play in the success of SRE adoption. The course then introduces participants to how Service Level Indicators (SLIs), Service Level Objectives (SLOs), and error budgets should be used to measure a service’s reliability. Attendees will gain experience with creating these measures in practice. These concepts help create a culture where the reliability and success of a service can be objectively measured.
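As a rough illustration of how these measures relate, the following Python sketch computes an availability SLI and the remaining error budget for a request-based SLO. The function names, the 99.9% target, and the request counts are assumptions chosen for illustration; they are not taken from the course materials.

```python
def availability_sli(good_events: int, total_events: int) -> float:
    """SLI: fraction of events that were good (e.g. successful requests)."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return good_events / total_events


def error_budget(slo_target: float, total_events: int) -> float:
    """Number of events allowed to fail in the window while still meeting the SLO."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be between 0 and 1 (exclusive)")
    return (1.0 - slo_target) * total_events


def budget_remaining(slo_target: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    bad_events = total_events - good_events
    return 1.0 - bad_events / error_budget(slo_target, total_events)


if __name__ == "__main__":
    # Hypothetical 30-day window: 1,000,000 requests, 250 of which failed.
    slo = 0.999  # assumed SLO target of 99.9% successful requests
    total, good = 1_000_000, 999_750

    print(f"SLI: {availability_sli(good, total):.4%}")
    print(f"Allowed failures: {error_budget(slo, total):.0f}")
    print(f"Budget remaining: {budget_remaining(slo, good, total):.1%}")
```

In this sketch the SLO is met as long as the remaining budget is not negative; a team could use that fraction to decide whether to keep shipping changes or to prioritize reliability work.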

Learning Objectives

  • Articulate the technical and cultural fundamentals of Site Reliability Engineering (SRE) and understand the value they can provide to your IT operations in any environment
  • Learn SRE terminology
  • Understand why services need Service Level Objectives (SLOs)
  • Achieve harmony between developers and operations with error budgets
  • Choose appropriate Service Level Indicators (SLIs) based on user journeys
  • Create specific, measurable, achievable, relevant, and time-bound SLOs
  • Explore techniques to reduce toil for development, testing, and operations
  • Review containerization and microservice architecture

Who Should Attend

This course is aimed at development and operations engineers and technical managers, but can also be useful for product and business leaders wanting to learn more about what SRE is.


Activities

This course contains several design activities that provide hands-on experience with creating SLIs, SLOs, and error budgets in practice.

  • Improving IT Operations
  • Moving to an SRE Culture
  • DORA DevOps Quick Check
  • SLI Failure Gaps
  • Define SLI and SLO Targets

Course outline

  • Production Systems
  • Reduce Organizational Silos
  • Improving IT Operations
  • What Is DevOps? DevSecOps? SRE?
  • Review the Relationship Between DevOps and SRE
  • Shift Left on Security
  • Moving to an SRE Culture
  • Apply SRE in Your Organization
  • Review SRE Terminology
  • SLIs, SLOs, SLAs, Error Budgets
  • Recognize Why Services Need SLOs
  • Incentivizing Reliability Across DevOps Teams with Solid SRE Practices
  • Choose Appropriate SLIs Based on User Journeys
  • Create SMART (Specific, Measurable, Achievable, Relevant, and Time-bound) SLOs
  • Calculating and Leveraging Error Budgets
  • Define Toil
  • Recognize Toil for Developers, Testing, and Operations
  • Leverage Automated Tools, Builds, and Testing to Eliminate Toil
  • Reducing Developer Toil with Source Code Management (SCM)
  • Reducing Operations Toil with Infrastructure as Code (IaC)
  • Containerization vs. Virtual Machines
  • Advantages of Containers
  • Building Container Images: Docker
  • Monolithic vs. Microservice Applications
  • Understanding Container Orchestration: Kubernetes
