09-26, 11:50–12:25 (Europe/Amsterdam), Nebula
Most of us start with Kafka by building a simple producer/consumer demo. It just works — until it doesn’t. Suddenly, disk space isn’t freed up after data “expires,” rebalances loop endlessly during deploys, and strange errors about missing leaders clog your logs.
In the panic, we dive into Kafka’s ocean of config options — hoping something will stick. Sound familiar?
This talk is a collection of hard-won lessons — not flashy tricks, but the kind of insights you only gain after operating Kafka in production for years. You’ll walk away with mental models that make Kafka’s internal behavior more predictable and less surprising.
We’ll cover:
- Storage internals: Why expired data doesn’t always free space — and how Kafka actually reclaims disk
- Transactions & delivery semantics: What “exactly-once” really means, and when it silently downgrades
- Consumer group rebalancing: Why rebalances loop, and how the controller’s hidden behavior affects them
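As a rough illustration of the storage point above, here is a minimal sketch of why "expired" records can stay on disk. It assumes a simplified model in which time-based retention removes whole segments, and only once the newest record in a segment is older than the retention window. The `Segment` class and both helper functions are hypothetical names invented for this sketch, not Kafka APIs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    """Hypothetical stand-in for one log segment file."""
    base_offset: int
    timestamps_ms: List[int] = field(default_factory=list)

    @property
    def largest_timestamp_ms(self) -> int:
        return max(self.timestamps_ms)


def expired_segments(segments: List[Segment], now_ms: int, retention_ms: int) -> List[Segment]:
    """Segments whose newest record is older than the retention window."""
    return [s for s in segments if now_ms - s.largest_timestamp_ms > retention_ms]


def stale_records_retained(segments: List[Segment], now_ms: int, retention_ms: int) -> int:
    """Count records past retention that survive because their segment is not yet eligible."""
    doomed = {s.base_offset for s in expired_segments(segments, now_ms, retention_ms)}
    return sum(
        1
        for s in segments
        if s.base_offset not in doomed
        for ts in s.timestamps_ms
        if now_ms - ts > retention_ms
    )


# Assumed example data: retention of 1 second, "now" at t = 10 000 ms.
segments = [
    Segment(base_offset=0, timestamps_ms=[1_000, 2_000, 3_000]),
    Segment(base_offset=3, timestamps_ms=[4_000, 5_000, 9_500]),
]
deletable = [s.base_offset for s in expired_segments(segments, 10_000, 1_000)]
stale = stale_records_retained(segments, 10_000, 1_000)
```

In this model only the first segment is deletable. The second keeps two records that are already past retention, because one recent record (at 9 500 ms) keeps the whole segment alive.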
If you’ve used Kafka — or plan to — these insights will save you hours of frustration and debugging.
A basic understanding of partitions, replication, and Kafka’s general architecture will help get the most out of this session.
This talk is a structured walkthrough of three Kafka internals that surprise engineers once systems reach production scale. These aren’t edge cases; they’re core design behaviors that catch teams off guard simply because the typical tutorial path skips them.
Drawing from real experience operating Kafka in production across several high-throughput systems, I’ll walk through these areas:
- Kafka’s Storage Architecture: Kafka stores raw bytes — and that simplicity hides surprising complexity. We’ll explore how logs are segmented and indexed, how data expiration really works, and why retention settings don’t always behave as expected.
- Transactions and Delivery Guarantees: We’ll unpack “exactly-once” semantics, including how Kafka enforces them through idempotence and transactional logs, and where the promise breaks down. This section will demystify common delivery issues in high-throughput environments.
- Consumer Groups and the Controller: We’ll examine how rebalancing works under the hood, what role the controller plays in orchestration, and why controller failover can trigger hard-to-diagnose issues in consumer group stability.
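To make the idempotence idea from the transactions bullet concrete, here is a minimal sketch of sequence-number deduplication. It assumes a heavily simplified model: one partition, one sequence number per record, and a log that remembers only the last sequence accepted from each producer ID. `PartitionLog` and its return values are hypothetical and exist only for illustration. A real broker tracks batches and covers more cases.

```python
from typing import Dict, List


class PartitionLog:
    """Hypothetical single-partition log that deduplicates producer retries."""

    def __init__(self) -> None:
        self.records: List[str] = []
        self._last_seq: Dict[int, int] = {}  # producer_id -> last accepted sequence

    def append(self, producer_id: int, seq: int, value: str) -> str:
        last = self._last_seq.get(producer_id, -1)
        if seq <= last:
            # A retry of something already written: acknowledge, do not append again.
            return "duplicate"
        if seq != last + 1:
            # A gap means an earlier record was lost; reject instead of reordering.
            return "out_of_order"
        self.records.append(value)
        self._last_seq[producer_id] = seq
        return "appended"


log = PartitionLog()
first = log.append(producer_id=7, seq=0, value="a")
retry = log.append(producer_id=7, seq=0, value="a")   # network retry of the same record
gap = log.append(producer_id=7, seq=2, value="c")     # sequence 1 never arrived
```

The retry is acknowledged without being written twice, so the log holds a single copy of `"a"`. This sketch covers duplicate suppression on the produce path only. It does not model transactions or anything on the consumer side.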
This talk is for data and platform engineers using Kafka in production who want a deeper, more practical understanding of its internal mechanics. While it’s not an exhaustive deep dive, it’s designed to provide clarity on the design decisions that shape real-world Kafka behavior.
Time Breakdown
- 0–3 min – Intro: What “just works” hides under the surface
- 3–10 min – Kafka storage architecture: Segments, indexes, and retention surprises
- 10–18 min – Transactions, idempotence, and what “exactly-once” really means
- 18–26 min – Consumer groups, the controller, and the hidden cost of failover
- 26–30 min – Takeaways and Q&A