PyData Global 2025

Shekhar Prasad Rajak

Passionate Open Source Advocate and Software Engineer at Apple.
Shekhar is a seasoned open-source developer and advocate who has contributed to SymPy, NumPy, SciPy, and Bundler, and is the author of daru and daru-view in the SciRuby ecosystem. A two-time GSoC alumnus (2016 and 2017) and former SciRuby org admin, he has mentored across multiple open-source communities. He has spoken at leading conferences, including RubyConf, PyCon, ApacheCon, and Community Over Code. Currently, he is a Software Development Engineer at Apple, driving innovation in software engineering.


Session

12-11
13:30
30min
Streaming AI Workflows in Python: Kafka Queues and Flink-Powered LLM Inference
Shekhar Prasad Rajak, bhrathjatoth

Python users working on real-time analytics—from payment processing and fraud detection to AI-driven support—rely on message queues to keep data moving reliably and efficiently. Traditional message queues, however, can struggle with large-scale, concurrent workloads, especially when you need durability and replayability.

In this session, we’ll show how Kafka 4.0 introduces robust queue semantics to distributed streaming, empowering Python applications to handle fair, concurrent, and isolated message processing at scale—using familiar Kafka Python clients and frameworks.
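To make the idea of queue semantics concrete, here is a minimal in-memory sketch of the behaviour described above: any worker can take the next available record, each record is acknowledged individually, and a released record becomes available to another worker. This is a toy model only. `ToyShareQueue` and its methods are hypothetical names invented for illustration and are not part of Kafka or any Kafka Python client.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyShareQueue:
    """In-memory stand-in for a queue-style topic (NOT a Kafka client)."""

    pending: deque = field(default_factory=deque)   # records waiting for a worker
    in_flight: dict = field(default_factory=dict)   # record id -> (worker, payload)
    done: list = field(default_factory=list)        # acknowledged records
    next_id: int = 0

    def publish(self, payload):
        self.pending.append((self.next_id, payload))
        self.next_id += 1

    def fetch(self, worker):
        """Hand the next available record to whichever worker asks."""
        if not self.pending:
            return None
        record_id, payload = self.pending.popleft()
        self.in_flight[record_id] = (worker, payload)
        return record_id, payload

    def acknowledge(self, record_id):
        """Mark one record as processed, independently of all others."""
        worker, payload = self.in_flight.pop(record_id)
        self.done.append((record_id, worker, payload))

    def release(self, record_id):
        """Return a record to the queue so another worker can retry it."""
        _, payload = self.in_flight.pop(record_id)
        self.pending.appendleft((record_id, payload))


queue = ToyShareQueue()
for payment in ["pay-1", "pay-2", "pay-3"]:
    queue.publish(payment)

rid_a, _ = queue.fetch("worker-a")
rid_b, _ = queue.fetch("worker-b")
queue.release(rid_a)        # worker-a fails; its record goes back to the queue
queue.acknowledge(rid_b)    # worker-b succeeds on its own record
rid_retry, payload_retry = queue.fetch("worker-b")  # worker-b picks up the retry
queue.acknowledge(rid_retry)
```

In a real deployment the broker tracks delivery state durably; the sketch keeps everything in process memory purely to show the per-record acknowledge and release flow.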

But the real power lies in what you can build next. We’ll demonstrate how Apache Flink can connect Kafka event streams to real-time Large Language Model (LLM) inference for tasks like sentiment analysis and summarization, all orchestrated via Python APIs and remote model endpoints.
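The per-record enrichment step in such a pipeline can be sketched in plain Python, independent of Flink. The example below assumes each event is a JSON string with a `text` field and uses a keyword-based stub in place of a remote model endpoint. `stub_sentiment_model` and `enrich_events` are hypothetical names for illustration; they are not Flink or Kafka APIs, and a real job would replace the stub with a call to the chosen inference service.

```python
import json
from typing import Callable, Iterable, Iterator


def stub_sentiment_model(text: str) -> str:
    """Hypothetical stand-in for a remote LLM endpoint call."""
    lowered = text.lower()
    if any(word in lowered for word in ("refund", "broken", "angry")):
        return "negative"
    return "neutral"


def enrich_events(
    raw_events: Iterable[str], infer: Callable[[str], str]
) -> Iterator[dict]:
    """Parse each JSON event and attach the model's sentiment label."""
    for raw in raw_events:
        event = json.loads(raw)
        event["sentiment"] = infer(event["text"])
        yield event


sample_events = [
    json.dumps({"id": 1, "text": "My order arrived broken"}),
    json.dumps({"id": 2, "text": "Where can I update my address?"}),
]
enriched = list(enrich_events(sample_events, stub_sentiment_model))
```

Passing the inference function as a parameter keeps the enrichment logic testable without network access; the same function shape could be wrapped in a streaming framework's map operator.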

To complete the picture, we’ll cover how enriched results can be stored in popular data lake solutions—such as Apache Iceberg—enabling long-term analytics, time travel, and integration with downstream data science workflows. Support for Iceberg and other lakehouse formats is optional, giving you flexibility to choose the right data backend for your needs.

Machine Learning & AI