Dima Baranetskyi PyData Amsterdam 2025

Dima Baranetskyi

Dima Baranetskyi is a Technical Lead and Senior Data Engineering Consultant with a background in software and data engineering. He currently leads Python development in the financial sector and has designed and built production-grade streaming architectures in domains like energy, e-commerce, and education. His work focuses on event-driven systems, pragmatic data tooling, and making distributed systems understandable and maintainable. Dima is certified in Apache Kafka and Kubernetes and prefers practical, right-sized solutions over theoretical complexity.

Session

09-26

11:50

35min

Kafka Internals I Wish I Knew Sooner: The Non-Boring Truths

Dima Baranetskyi

Most of us start with Kafka by building a simple producer/consumer demo. It just works — until it doesn’t. Suddenly, disk space isn’t freed up after data “expires,” rebalances loop endlessly during deploys, and strange errors about missing leaders clog your logs.
In the panic, we dive into Kafka’s ocean of config options — hoping something will stick. Sound familiar?

This talk is a collection of hard-won lessons — not flashy tricks, but the kind of insights you only gain after operating Kafka in production for years. You’ll walk away with mental models that make Kafka’s internal behavior more predictable and less surprising.

We’ll cover:
- Storage internals: Why expired data doesn’t always free space — and how Kafka actually reclaims disk
- Transactions & delivery semantics: What “exactly-once” really means, and when it silently downgrades
- Consumer group rebalancing: Why rebalances loop, and how the controller’s hidden behavior affects them

If you’ve used Kafka — or plan to — these insights will save you hours of frustration and debugging.
A basic understanding of partitions, replication, and Kafka’s general architecture will help you get the most out of this session.

Nebula
