Course 450:
Learning Apache Kafka

(2 days)

 

Course Description

Apache Kafka is used to build efficient streaming data applications that integrate easily with other Big Data tools such as Apache Hadoop and Apache Spark. This course explains how to use Kafka effectively and presents practical solutions to the common problems that developers and administrators face while working with it. The course starts with an architectural overview of Apache Kafka and introduces the related concepts. Next, it introduces the Producer API, focusing on how an application uses it to publish a stream of records to one or more Kafka topics, followed by the Consumer API, which allows an application to subscribe to one or more topics. Once these basics are covered, the course moves on to advanced topics such as serializing data with Avro and partitioning data based on custom logic. Finally, it discusses Kafka best practices along with common practical issues.
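
As a preview of the two APIs at the heart of this course, the following sketch publishes a single record and then consumes it. The broker address, topic name, and consumer group id are placeholder assumptions for a local test setup:

    import org.apache.kafka.clients.consumer.*;
    import org.apache.kafka.clients.producer.*;
    import org.apache.kafka.common.serialization.*;
    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    public class QuickStart {
        public static void main(String[] args) {
            // Producer: publish one record to a topic (local broker assumed)
            Properties p = new Properties();
            p.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            p.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            p.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            try (Producer<String, String> producer = new KafkaProducer<>(p)) {
                producer.send(new ProducerRecord<>("demo-topic", "key", "hello, kafka"));
            }

            // Consumer: subscribe to the same topic and poll for records
            Properties c = new Properties();
            c.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            c.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
            c.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
            c.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
            c.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
            try (Consumer<String, String> consumer = new KafkaConsumer<>(c)) {
                consumer.subscribe(List.of("demo-topic"));
                for (ConsumerRecord<String, String> r : consumer.poll(Duration.ofSeconds(5))) {
                    System.out.printf("offset=%d key=%s value=%s%n", r.offset(), r.key(), r.value());
                }
            }
        }
    }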

Learning Objectives

After successfully completing this course, students will be able to:

  • Describe the architecture of Kafka
  • Explore Kafka producers and consumers for writing and reading messages
  • Understand publish-subscribe messaging and how it fits in the Big Data ecosystem
  • Explore how Kafka’s stream delivery capabilities make it a perfect source for stream processing systems
  • Learn various strategies for monitoring Kafka
  • Get best practices for building data pipelines and applications with Kafka
  • Learn how to run Kafka as a cluster on one or more servers that can span multiple data centers

Prerequisites

A background in at least one programming language, such as Java, Python, or Scala, is recommended. A general understanding of streaming applications and distributed computing is helpful but not required.

Who Should Attend

This course is intended for anyone wanting to understand how Apache Kafka works. This is a hands-on course; the exercises are intended to give participants first-hand experience with developing Big Data applications.

Hands-On Labs

The hands-on labs are a key learning element of this course. Each lab reinforces the material presented in lecture.


Course Outline

Chapter 1: Kafka Architecture

  • Kafka’s Origin
  • Installing Kafka
    • Installing Java and ZooKeeper
    • Installing a Kafka Broker
    • Broker Configuration
    • Hardware Selection
    • Different Versions of Kafka
  • The New Kafka Architecture (KRaft – Without ZooKeeper)
  • Migrating to the New Architecture
  • Exercise 1: Working with Multiple Producers and Consumers

Chapter 2: Producers and Consumers

  • Sending Events to Kafka – Producer API
  • Asynchronous Send (see the sketch after this outline)
  • Reading Events from Kafka – Consumer API
  • Broker Configurations
  • Exercise 2: Creating Multiple Brokers and Checking How Messages in Topics Will Be Routed to the Brokers
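
As a reference for the asynchronous send pattern covered in this chapter, here is a minimal sketch; the broker address and topic name are placeholder assumptions:

    import org.apache.kafka.clients.producer.*;
    import org.apache.kafka.common.serialization.StringSerializer;
    import java.util.Properties;

    public class AsyncSendDemo {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            try (Producer<String, String> producer = new KafkaProducer<>(props)) {
                // send() buffers the record and returns immediately; the callback
                // runs once the broker acknowledges or rejects the record.
                producer.send(new ProducerRecord<>("orders", "order-42", "created"),
                        (metadata, exception) -> {
                            if (exception != null) {
                                exception.printStackTrace();
                            } else {
                                System.out.printf("partition=%d offset=%d%n",
                                        metadata.partition(), metadata.offset());
                            }
                        });
            } // close() flushes any buffered records before returning
        }
    }

Because send() only buffers the record, the callback is where delivery failures are detected without blocking the sending thread.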

Chapter 3: Advanced Kafka

  • Kafka Producer API
  • Exercise 3: Writing a Custom Kafka Producer and Understanding What a ProducerRecord Is
  • Working with a Custom Kafka Consumer API
  • Exercise 4: Writing a Custom Kafka Consumer and Understanding What a ConsumerRecord Is
  • Consumer Poll Loop – Offset Management (see the sketch after this outline)
  • Rebalancing of Consumers
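
A sketch of the consumer poll loop with manual offset management mentioned above (broker, topic, and group names are placeholders; a production loop would also handle shutdown, e.g. via consumer.wakeup()):

    import org.apache.kafka.clients.consumer.*;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    public class PollLoopDemo {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "poll-loop-demo");          // placeholder group
            props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");         // commit offsets ourselves
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
            try (Consumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("orders"));
                while (true) {
                    // poll() also drives heartbeats and group rebalancing
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("%s -> %s%n", record.key(), record.value());
                    }
                    consumer.commitSync(); // mark everything returned by this poll as processed
                }
            }
        }
    }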

Chapter 4: Kafka Serialization with Avro

  • Exercise 5: How to Serialize Data Using Avro
  • Serializers
  • How to Implement Custom Serializers (see the sketch after this outline)
  • Serializing Using Apache Avro
  • Using Avro Records with Kafka
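
A custom serializer of the kind implemented in this chapter plugs into the producer via its value.serializer setting. A minimal sketch, assuming a hypothetical Customer type and a deliberately naive wire format (a schema-based format such as Avro is the production-grade alternative):

    import org.apache.kafka.common.serialization.Serializer;
    import java.nio.charset.StandardCharsets;

    // Hypothetical domain type, used only for this example
    record Customer(int id, String name) {}

    public class CustomerSerializer implements Serializer<Customer> {
        @Override
        public byte[] serialize(String topic, Customer data) {
            if (data == null) return null;
            // Naive wire format: "id:name" as UTF-8
            return (data.id() + ":" + data.name()).getBytes(StandardCharsets.UTF_8);
        }
    }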

Chapter 5: Understanding Internals

  • Electing Partition Leaders – Kafka Controller Component
  • Benefits of Data Partitioning Among Brokers
  • Partitioning of Topics – Implementing a Custom Partitioner
  • Writing a Custom Partitioner for Specific Partitioning (see the sketch after this outline)
  • Data Replication in Kafka
  • Append-Only Distributed Log – Storing Events in Kafka
  • Compaction Process
  • Exercise 6: Writing a Custom Partitioner and Checking How Messages Get Partitioned Based on the Custom Logic of the Partitioner
  • Customized Offset Management in Kafka
  • Exercise 7: Writing Code for Getting a Specific Offset of a Message
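
A custom partitioner like the one written in Exercise 6 might look like the following sketch. The "vip-" key convention is illustrative only, and the topic is assumed to have at least two partitions:

    import org.apache.kafka.clients.producer.Partitioner;
    import org.apache.kafka.common.Cluster;
    import java.util.Map;

    public class VipPartitioner implements Partitioner {
        @Override
        public void configure(Map<String, ?> configs) {}

        @Override
        public int partition(String topic, Object key, byte[] keyBytes,
                             Object value, byte[] valueBytes, Cluster cluster) {
            int numPartitions = cluster.partitionsForTopic(topic).size();
            if (keyBytes == null) {
                return 1; // un-keyed records go to a fixed spillover partition
            }
            if (key instanceof String s && s.startsWith("vip-")) {
                return 0; // reserve partition 0 for priority traffic
            }
            // Hash the remaining keys over partitions 1..n-1
            return 1 + Math.abs(key.hashCode() % (numPartitions - 1));
        }

        @Override
        public void close() {}
    }

The partitioner is registered through the producer's partitioner.class setting (ProducerConfig.PARTITIONER_CLASS_CONFIG).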

Chapter 6: Monitoring Kafka – Best Practices

  • Broker Health Monitoring
  • Kafka and Metrics Reporters
  • Monitor Under-Replicated Partitions
  • Monitor Events
  • Performance Tuning
  • Exercise 8: How to Check for Metrics in Kafka
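
One way to check metrics, as in Exercise 8, is through the metrics() method that every Kafka client exposes programmatically (the same values are also published via JMX). A minimal sketch with a placeholder broker address:

    import org.apache.kafka.clients.producer.*;
    import org.apache.kafka.common.Metric;
    import org.apache.kafka.common.MetricName;
    import org.apache.kafka.common.serialization.StringSerializer;
    import java.util.Map;
    import java.util.Properties;

    public class MetricsPeek {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
            try (Producer<String, String> producer = new KafkaProducer<>(props)) {
                // Dump every client metric: name, group, and current value
                for (Map.Entry<MetricName, ? extends Metric> e : producer.metrics().entrySet()) {
                    System.out.printf("%s [%s] = %s%n",
                            e.getKey().name(), e.getKey().group(),
                            e.getValue().metricValue());
                }
            }
        }
    }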

Chapter 7: Design Considerations and Best Practices of Kafka

  • Practical Use Cases
  • Practical Considerations

Appendix A: More on Kafka

  • Tuning and Optimizing Payload/Timeouts Configurations
  • Partition Sizing as Per Your Data Volume
  • Error Handling / Exactly-Once Semantics / Guaranteed Delivery
  • Handling Rebalancing

Appendix B: Exploring the Kafka Ecosystem and Its Future

  • Kafka Streams API (KTables and KStreams)
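
A minimal Streams topology sketch showing a KStream feeding a KTable word count; topic names and the application id are placeholder assumptions:

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.StreamsConfig;
    import org.apache.kafka.streams.kstream.*;
    import java.util.Arrays;
    import java.util.Properties;

    public class WordCountSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(StreamsConfig.APPLICATION_ID_CONFIG, "wordcount-sketch");  // placeholder id
            props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
            props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
            props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

            StreamsBuilder builder = new StreamsBuilder();
            // KStream: an unbounded stream of records read from a topic
            KStream<String, String> lines = builder.stream("text-input");
            // KTable: a continuously updated aggregate, keyed by word
            KTable<String, Long> counts = lines
                    .flatMapValues(line -> Arrays.asList(line.toLowerCase().split("\\s+")))
                    .groupBy((key, word) -> word)
                    .count();
            counts.toStream().to("word-counts", Produced.with(Serdes.String(), Serdes.Long()));

            new KafkaStreams(builder.build(), props).start();
        }
    }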

Please Contact Your ROI Representative to Discuss Course Tailoring!