PyData Berlin 2025

08:00
08:00
60min
Registration & Coffee
Kuppelsaal
08:00
60min
Registration & Coffee
B09
08:00
60min
Registration & Coffee
B07-B08
08:00
60min
Registration & Coffee
B05-B06
09:00
09:00
20min
Opening Session

Opening Session for PyData Berlin 2025

Kuppelsaal
09:20
09:20
50min
PyData 2077: a data science future retrospective
Andy Kitchen, Laura Summers

From: Chrono-Regulatory Commission, Temporal Enforcement Division
To: PyData Berlin Organising Committee
Subject: Citation #TMP-2077-091 - Unauthorised Spacetime Disturbance

Dear Committee,
Our temporal monitoring systems have detected an unauthorised chronological anomaly emanating from your facility (Berliner Congress Center, coordinates 52.52068°N, 13.416451°E) scheduled to manifest on September 1st at 9:20 a.m.

Education, Career & Life
Kuppelsaal
10:10
10:10
30min
Coffee Break
Kuppelsaal
10:10
30min
Coffee Break
B09
10:10
30min
Coffee Break
B07-B08
10:10
30min
Coffee Break
B05-B06
10:40
10:40
90min
A Beginner's Guide to State Space Modeling
Alexandre Andorra, Jesse Grabowski

State Space Models (SSMs) are powerful tools for time series analysis, widely used in finance, economics, ecology, and engineering. They allow researchers to encode structural behavior into time series models, including trends, seasonality, autoregression, and irregular fluctuations, to name just a few. Many workhorse time series models, including ARIMA, VAR, and ETS, are special cases of the general state-space framework.

In this practical, hands-on tutorial, attendees will learn how to leverage PyMC's new state-space modeling capabilities (pymc_extras.statespace) to build, fit, and interpret Bayesian state space models.

Starting from fundamental concepts, we'll explore several real-world use cases, demonstrating how SSMs help tackle common time series challenges, such as handling missing observations, integrating external regressors, and generating forecasts.
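The tutorial itself uses pymc_extras.statespace; purely for intuition about what a state space model does, here is a hand-rolled Kalman filter for the simplest case, a local-level model, with made-up data and noise scales:

```python
import numpy as np

def local_level_filter(y, sigma_obs=1.0, sigma_level=0.5):
    """Kalman filter for the local-level model:
       level_t = level_{t-1} + w_t,   y_t = level_t + v_t."""
    n = len(y)
    level, P = y[0], 1.0              # initial state mean and variance
    filtered = np.empty(n)
    for t in range(n):
        # Predict: the level follows a random walk, so variance grows
        P = P + sigma_level**2
        # Update: blend the prediction with the new observation
        K = P / (P + sigma_obs**2)    # Kalman gain, between 0 and 1
        level = level + K * (y[t] - level)
        P = (1 - K) * P
        filtered[t] = level
    return filtered

rng = np.random.default_rng(0)
true_level = np.cumsum(rng.normal(0, 0.5, 200))   # hidden state
y = true_level + rng.normal(0, 1.0, 200)          # noisy observations
est = local_level_filter(y)
```

Bayesian SSM libraries generalize this recursion to richer state vectors (trend, seasonality, regression effects) and infer the noise scales instead of fixing them.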

PyData & Scientific Libraries Stack
B09
10:40
30min
Beyond Linear Funnels: Visualizing Conditional User Journeys with Python
Yaseen Esmaeelpour

Optimizing user funnels is a common task for data analysts and data scientists. In the real world, funnels are not always linear: often, the next step depends on earlier responses or actions. This results in complex funnels that can be tricky to analyze. I’ll introduce an open-source Python library I developed that analyzes and visualizes non-linear, conditional funnels by utilizing Graphviz and Streamlit. It calculates conversion rates, drop-offs, and time spent on each step, and highlights bottlenecks by color. Attendees will learn how to quickly explore complex user journeys and generate insightful funnel data.
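The speaker's library handles the visualization with Graphviz and Streamlit; as a rough sketch of the underlying bookkeeping (function name and journey data below are hypothetical), a conditional funnel boils down to counting transitions between consecutive steps:

```python
from collections import Counter

def edge_conversion_rates(journeys):
    """Count transitions between consecutive steps across all journeys
    and turn them into per-edge conversion rates (a branching funnel)."""
    step_visits = Counter()
    edge_counts = Counter()
    for journey in journeys:
        step_visits.update(journey)
        edge_counts.update(zip(journey, journey[1:]))
    return {
        (src, dst): count / step_visits[src]
        for (src, dst), count in edge_counts.items()
    }

# Hypothetical journeys: the step after "signup" depends on a user choice
journeys = [
    ["visit", "signup", "survey", "purchase"],
    ["visit", "signup", "invite"],
    ["visit", "signup", "survey"],
    ["visit"],
]
rates = edge_conversion_rates(journeys)
```

Feeding the resulting edges and rates to Graphviz gives the branching diagram; low-rate edges are the bottlenecks to color-code.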

Visualisation & Jupyter
B07-B08
10:40
30min
🛰️➡️🧑‍💻: Streamlining Satellite Data for Analysis-Ready Outputs
Vinayak Nair

I will share how our team built an end-to-end system to transform raw satellite imagery into analysis-ready datasets for use cases like vegetation monitoring, deforestation detection, and identifying third-party activity. We streamlined the entire pipeline from automated acquisition and cloud storage to preprocessing that ensures spatial, spectral, and temporal consistency. By leveraging Prefect for orchestration, Anyscale Ray for scalable processing, and the open source STAC standard for metadata indexing, we reduced processing times from days to near real-time. We addressed challenges like inconsistent metadata and diverse sensor types, building a flexible system capable of supporting large-scale geospatial analytics and AI workloads.

Data Handling & Engineering
B05-B06
11:20
11:20
30min
Democratizing Digital Maps: How Protomaps Changes the Game
Veit Schiele

Digital mapping has long been dominated by commercial providers, creating barriers of cost, complexity, and privacy concerns. This talk introduces Protomaps, an open-source project that reimagines how web maps are delivered and consumed. Using the innovative PMTiles format – a single-file approach to vector tiles – Protomaps eliminates complex server infrastructure while reducing bandwidth usage and improving performance. We'll explore how this technology democratizes cartography by making self-hosted maps accessible without API keys, usage quotas, or recurring costs. The presentation will demonstrate implementations with Leaflet and MapLibre, showcase customization options, and highlight cases where Protomaps enables privacy-conscious, offline-capable mapping solutions. Discover how this technology puts mapping control back in the hands of developers while maintaining the rich experiences modern applications demand.

Visualisation & Jupyter
B07-B08
11:20
30min
Exploring Millions of High-dimensional Datapoints in the Browser for Early Drug Discovery
Tim Tenckhoff, Matthias Orlowski

The visual exploration of large, high-dimensional datasets presents significant challenges in data processing, transfer, and rendering for engineering in various industries. This talk will explore innovative approaches to harnessing massive datasets for early drug discovery, with a focus on interactive visualizations. We will demonstrate how our team at Bayer utilizes a modern tech stack to efficiently navigate and analyze millions of data points in a high-dimensional embedding space. Attendees will gain insights into overcoming performance challenges, optimizing data rendering, and developing user-friendly tools for effective data exploration. We aim to demonstrate how these technologies can transform the way we interact with complex datasets in engineering applications and eventually allow us to find the needle in a multidimensional haystack.

Data Handling & Engineering
B05-B06
12:00
12:00
30min
Accessible Data Visualizations
Maris Nieuwenhuis

Data visualizations often exclude users with visual impairments and temporary or situational constraints. Many regulations (European Accessibility Act, American Disabilities Act) now mandate inclusive digital content. Our research provides practical solutions — optimized color palettes, supplementary patterns, and alternative formats — implemented in popular libraries like Bokeh and Vega-Altair. These techniques, available through our open-source cusy Design System, create visualizations that reach broader audiences while meeting compliance requirements and improving comprehension for all users.

Visualisation & Jupyter
B07-B08
12:00
30min
Democratizing Experimentation: How GetYourGuide Built a Flexible and Scalable A/B Testing Platform
Konrad Richter

At GetYourGuide, we transformed experimentation from a centralized, closed system into a democratized, self-service platform accessible to all analysts, engineers, and product teams. In this talk, we'll share our journey to empower individuals across the company to define metrics, create dimensions, and easily extend statistical methods. We'll discuss how we built a Python-based Analyzer toolkit enabling standardized, reusable calculations, and how our experimentation platform provides ad-hoc analytical capabilities through a flexible API. Attendees will gain practical insights into creating scalable, maintainable, and user-friendly experimentation infrastructure, along with access to our open-source sequential testing implementation.

Data Handling & Engineering
B05-B06
12:30
12:30
60min
Lunch Break
Kuppelsaal
12:30
60min
Lunch Break
B07-B08
12:30
60min
Lunch Break
B05-B06
12:30
60min
PyLadies & Empowered in Tech Lunch

Join PyLadies & Empowered in Tech for a special lunch event aimed at fostering community. Enjoy meaningful conversations and networking opportunities.

Community & Diversity
B09
13:40
13:40
30min
Automating Content Creation with LLMs: A Journey from Manual to AI-Driven Excellence
Marco Vene

In the fast-paced realm of travel experiences, GetYourGuide encountered the challenge of maintaining consistent, high-quality content across its global marketplace. Manual content creation by suppliers often resulted in inconsistencies and errors, negatively impacting conversion rates. To address this, we leveraged large language models (LLMs) to automate content generation, ensuring uniformity and accuracy. This talk will explore our innovative approach, including the development of fine-tuned models for generating key text sections and the use of Function Calling GPT API for structured data. A pivotal aspect of our solution was the creation of an LLM evaluator to detect and correct hallucinations, thereby improving factual accuracy. Through A/B testing, we demonstrated that AI-driven content led to fewer defects and increased bookings. Attendees will gain insights into training data refinement, prompt engineering, and deploying AI at scale, offering valuable lessons for automating content creation across industries.

Generative AI
B07-B08
13:40
90min
More than DataFrames: Data Pipelines with the Swiss Army Knife DuckDB
Mehdi Ouazza

Most Python developers reach for Pandas or Polars when working with tabular data—but DuckDB offers a powerful alternative that’s more than just another DataFrame library. In this tutorial, you’ll learn how to use DuckDB as an in-process analytical database: building data pipelines, caching datasets, and running complex queries with SQL—all without leaving Python. We’ll cover common use cases like ETL, lightweight data orchestration, and interactive analytics workflows. You’ll leave with a solid mental model for using DuckDB effectively as the “SQLite for analytics.”

Data Handling & Engineering
B09
13:40
30min
The EU AI Act: Unveiling Lesser-Known Aspects, Implementation Entities, and Exemptions
Adrin Jalali

The EU AI Act is already partly in effect, prohibiting certain AI systems. After going through the basics, we cover some of the less-talked-about aspects of the Act, introducing the entities involved in its implementation and showing how many high-risk government and law enforcement use cases are excluded!

Ethics & Privacy
B05-B06
14:20
14:20
30min
Benchmarking 2000+ Cloud Servers for GBM Model Training and LLM Inference Speed
Gergely Daroczi

Spare Cores is a Python-based, open-source, and vendor-independent ecosystem collecting, generating, and standardizing comprehensive data on cloud server pricing and performance. In our latest project, we started 2000+ server types across five cloud vendors to evaluate their suitability for serving Large Language Models from 135M to 70B parameters. We tested how efficiently models can be loaded into memory or VRAM, and measured inference speed across varying token lengths for prompt processing and text generation. The published data can help you find the optimal instance type for your LLM serving needs; we will also share our experiences and challenges with data collection, along with insights into general patterns.

Infrastructure - Hardware & Cloud
B07-B08
14:20
30min
What’s Really Going On in Your Model? A Python Guide to Explainable AI
Yashasvi Misra

As machine learning models become more complex, understanding why they make certain predictions is becoming just as important as the predictions themselves. Whether you're dealing with business stakeholders, regulators, or just debugging unexpected results, the ability to explain your model is no longer optional: it's essential.

In this talk, we'll walk through practical tools in the Python ecosystem that help bring transparency to your models, including SHAP, LIME, and Captum. Through hands-on examples, you'll learn how to apply these libraries to real-world models, from decision trees to deep neural networks, and make sense of what's happening under the hood.

If you've ever struggled to explain your model’s output or justify its decisions, this session will give you a toolkit to build more trustworthy, interpretable systems without sacrificing performance.

Ethics & Privacy
B05-B06
15:10
15:10
30min
Coffee Break
Kuppelsaal
15:10
30min
Coffee Break
B09
15:10
30min
Coffee Break
B07-B08
15:10
30min
Coffee Break
B05-B06
15:40
15:40
90min
AI-Ready Data in Action: Powering Smarter Agents
Violetta Mishechkina, Chang She

This hands-on workshop focuses on what AI engineers do most often: making data AI-ready and turning it into production-useful applications. Together with dltHub and LanceDB, you’ll walk through an end-to-end workflow: collecting and preparing real-world data with best practices, managing it in LanceDB, and powering AI applications with search, filters, hybrid retrieval, and lightweight agents. By the end, you’ll know how to move from raw data to functional, production-ready AI setups without the usual friction. We will also touch on multi-modal data and on taking this end-to-end use case to production.

Data Handling & Engineering
B09
15:40
30min
Building Bridges, Not Silos: Lessons from Running a Data & ML/AI Engineering Guild at Vattenfall
Anastasia Karavdina

In large organizations, data and AI talent often work in fragmented teams, making cross-pollination of ideas, tools, and best practices a challenge. At Vattenfall, we addressed this by founding the “Data & ML/AI Engineering Guild”: a cross-functional community dedicated to sharing knowledge, aligning on technical standards, and accelerating innovation across business units.

Community & Diversity
B05-B06
15:40
30min
Scaling Python: An End-to-End ML Pipeline for ISS Anomaly Detection with Kubeflow and MLFlow
Christian Geier

Building and deploying scalable, reproducible machine learning pipelines can be challenging, especially when working with orchestration tools like Slurm or Kubernetes. In this talk, we demonstrate how to create an end-to-end ML pipeline for anomaly detection in International Space Station (ISS) telemetry data using only Python code.

We show how Kubeflow Pipelines, MLFlow, and other open-source tools enable the seamless orchestration of critical steps: distributed preprocessing with Dask, hyperparameter optimization with Katib, distributed training with PyTorch Operator, experiment tracking and monitoring with MLFlow, and scalable model serving with KServe. All these steps are integrated into a holistic Kubeflow pipeline.

By leveraging Kubeflow's Python SDK, we simplify the complexities of Kubernetes configurations while achieving scalable, maintainable, and reproducible pipelines. This session provides practical insights, real-world challenges, and best practices, demonstrating how Python-first workflows empower data scientists to focus on machine learning development rather than infrastructure.

Infrastructure - Hardware & Cloud
B07-B08
16:20
16:20
30min
Beyond the Black Box: Interpreting ML models with SHAP
Avik Basu

As machine learning models become more accurate and complex, explainability remains essential. Explainability helps not just with trust and transparency but also with generating actionable insights and guiding decision-making. One way of interpreting the model outputs is using SHapley Additive exPlanations (SHAP). In this talk, I will go through the concept of Shapley values and its mathematical intuition and then walk through a few real-world examples for different ML models. Attendees will gain a practical understanding of SHAP's strengths and limitations and how to use it to explain model predictions in their projects effectively.
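The shap library computes these values efficiently with model-specific approximations; for intuition, the brute-force definition can be written directly (the toy model and numbers are made up, and this enumeration is only feasible for a handful of features):

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values by enumerating all feature coalitions.
    Features outside the coalition are set to their baseline value.
    Costs O(2^n) model evaluations, so only for tiny n."""
    n = len(x)
    def eval_coalition(S):
        z = [x[i] if i in S else baseline[i] for i in range(n)]
        return f(z)
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contrib = 0.0
        for k in range(n):
            for S in combinations(others, k):
                # Weight of a coalition of size k in the Shapley formula
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                contrib += w * (eval_coalition(set(S) | {i})
                                - eval_coalition(set(S)))
        phi.append(contrib)
    return phi

# Toy model with an interaction term; baseline of all zeros
f = lambda z: 2 * z[0] + z[1] * z[2]
phi = shapley_values(f, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
```

By construction the values sum to the difference between f(x) and f(baseline), and the interaction term is split evenly between the two features involved.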

Visualisation & Jupyter
B07-B08
16:20
30min
Consumer Choice Models with PyMC Marketing
Nathaniel Forde

Consumer choice models are an important part of product innovation and market strategy. In this talk we'll see how they can be used to learn about substitution goods and market shares in competitive markets using PyMC marketing's new consumer choice module.

PyData & Scientific Libraries Stack
B05-B06
17:00
17:00
30min
Building an A/B Testing Framework with NiceGUI
Wessel van de Goor

NiceGUI is a Python-based web UI framework that enables developers to build interactive web applications without using JavaScript. In this talk, I’ll share how my team used NiceGUI to create an internal A/B testing platform entirely in Python. I’ll discuss the key requirements for the platform, why we chose NiceGUI, and how it helped us design the UI, display results, and integrate with the backend. This session will demonstrate how NiceGUI simplifies development, reduces frontend complexity, and speeds up internal tool creation for Python developers.

Visualisation & Jupyter
B07-B08
17:00
30min
Risk Budget Optimization for Causal Mix Models
Carlos Trujillo

Traditional budget planners chase the highest predicted return and hope for the best. Bayesian models take the opposite route: they quantify uncertainty first, then let us optimize budgets with that uncertainty fully on display. In this talk we’ll show how posterior distributions become a set of possible futures, and how risk‑aware loss functions convert those probabilities into spend decisions that balance upside with resilience. Whether you lead marketing, finance, or product, you’ll learn a principled workflow for turning probabilistic insight into capital allocation that’s both aggressive and defensible—no black‑box magic, just transparent Bayesian reasoning and disciplined risk management.

PyData & Scientific Libraries Stack
B05-B06
08:30
08:30
30min
Registration & Coffee
Kuppelsaal
08:30
30min
Registration & Coffee
B09
08:30
30min
Registration & Coffee
B07-B08
08:30
30min
Registration & Coffee
B05-B06
09:00
09:00
10min
Opening notes
Kuppelsaal
09:10
09:10
60min
Narwhals: enabling universal dataframe support
Marco Gorelli

Ever tried passing a Polars DataFrame to a data science library and found that it...just works? No errors, no panics, no noticeable overhead, just...results? This is becoming increasingly common in 2025, yet only 2 years ago, it was mostly unheard of. So, what changed? A large part of the answer is: Narwhals.

Narwhals is a lightweight compatibility layer between dataframe libraries which lets your code work seamlessly across Polars, pandas, PySpark, DuckDB, and more! And it's not just a theoretical possibility: with ~30 million monthly downloads and set as a required dependency of Altair, Bokeh, Marimo, Plotly, Shiny, and more, it's clear that it's reshaping the data science landscape. By the end of the talk, you'll understand why writing generic dataframe code was such a headache (and why it isn't anymore), how Narwhals works and how its community operates, and how you can use it in your projects today. The talk will be technical yet accessible and light-hearted.

PyData & Scientific Libraries Stack
Kuppelsaal
10:10
10:10
30min
Coffee Break
Kuppelsaal
10:10
30min
Coffee Break
B09
10:10
30min
Coffee Break
B07-B08
10:10
30min
Coffee Break
B05-B06
10:40
10:40
90min
Probably Fun: Games to teach Machine Learning
Dr. Kristian Rother, Shreyaasri Prakash

In this tutorial, you will play several games that can be used to teach machine learning concepts. Each game can be played in big and small groups. Some involve hands-on material such as cards; others involve an electronic app. All games contain one or more concepts from Machine Learning.

You will take away multiple ideas that make complex topics more understandable and enjoyable. In doing so, we would like to demonstrate that Machine Learning does not require computers: the core ideas can be illustrated in a clear and memorable way without them. We would also like to show that gamification is not limited to online quiz questions, but offers ways for learners to bond.

We will bring a set of carefully selected games that have been proven in a big classroom setting and contain useful abstractions of linear models, decision trees, LLMs and several other Machine Learning concepts. We also believe that it is probably fun to participate in this tutorial.

Education, Career & Life
B09
10:40
30min
The Importance and Elegance of Polars Expressions
Jeroen Janssens

Polars is known for its speed, but its elegance comes from its use of expressions. In this talk, we’ll explore how Polars expressions work and why they are key to efficient and elegant data manipulation. Through real-world examples, you’ll learn how to create, expand, and combine expressions in Polars to wrangle data more effectively.

PyData & Scientific Libraries Stack
B05-B06
10:40
30min
Training Specialized Language Models with Less Data: An End-to-End Practical Guide
Jacek Golebiowski

Small Language Models (SLMs) offer an efficient and cost-effective alternative to LLMs—especially when latency, privacy, inference costs or deployment constraints matter. However, training them typically requires large labeled datasets and is time-consuming, even if it isn't your first rodeo.

This talk presents an end-to-end approach for curating high-quality synthetic data using LLMs to train domain-specific SLMs. Using a real-world use case, we’ll demonstrate how to reduce manual labeling time, cut costs, and maintain performance—making SLMs viable for production applications.

Whether you are a seasoned Machine Learning Engineer or just getting started with building AI features, you will come away with the inspiration to build more performant, secure, and environmentally friendly AI systems.

Natural Language Processing & Audio (incl. Generative AI NLP)
B07-B08
11:20
11:20
30min
Causal Inference in Network Structures: Lessons learned From Financial Services
Danial Senejohnny

Causal inference techniques are crucial to understanding the impact of actions on outcomes. This talk shares lessons learned from applying these techniques in real-world scenarios where standard methods do not immediately apply. Our key question is: What is the causal impact of wealth planning services on a network of individuals' investments and securities? We'll examine the challenges posed by practical constraints and show how to deal with them before applying standard approaches like staggered difference-in-differences.

This self-contained talk is prepared for general data scientists who want to add causal inference techniques to their toolbox and learn from real-world data challenges.
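As a reminder of the baseline the talk builds on, the canonical two-period difference-in-differences estimate (not the staggered variant, and with made-up numbers) is just:

```python
import numpy as np

# Hypothetical outcomes for treated and control groups, before and
# after the treated group received wealth planning services
treated_pre  = np.array([10.2,  9.8, 10.5])
treated_post = np.array([14.1, 13.6, 14.3])
control_pre  = np.array([ 9.1,  8.9,  9.0])
control_post = np.array([10.6, 10.3, 10.6])

# Difference-in-differences: the treated group's change minus the
# control group's change removes time trends shared by both groups
did = (treated_post.mean() - treated_pre.mean()) - (
       control_post.mean() - control_pre.mean())
```

The talk's setting is harder precisely because network spillovers and staggered treatment timing break the clean two-group, two-period picture above.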

PyData & Scientific Libraries Stack
B05-B06
11:20
30min
Most AI Agents Are Useless. Let’s Fix That
Bilge Yücel

AI agents are having a moment, but most of them are little more than fragile prototypes that break under pressure. Together, we’ll explore why so many agentic systems fail in practice, and how to fix that with real engineering principles. In this talk, you’ll learn how to build agents that are modular, observable, and ready for production. If you’re tired of LLM demos that don’t deliver, this talk is your blueprint for building agents that actually work.

Natural Language Processing & Audio (incl. Generative AI NLP)
B07-B08
12:00
12:00
30min
Building Reactive Data Apps with Shinylive and WebAssembly
Christoph Scheuch

WebAssembly is reshaping how Python applications can be delivered - allowing fully interactive apps that run directly in the browser, without a traditional backend server. In this talk, I’ll demonstrate how to build reactive, data-driven web apps using Shinylive for Python, combining efficient local storage with Parquet and extending functionality with optional FastAPI cloud services. We’ll explore the benefits and limitations of this architecture, share practical design patterns, and discuss when browser-based Python is the right choice. Attendees will leave with hands-on techniques for creating modern, lightweight, and highly responsive Python data applications.

PyData & Scientific Libraries Stack
B05-B06
12:00
30min
One API to Rule Them All? LiteLLM in Production
Alina Dallmann

Using LiteLLM in a Real-World RAG System: What Worked and What Didn’t

LiteLLM provides a unified interface to work with multiple LLM providers—but how well does it hold up in practice? In this talk, I’ll share how we used LiteLLM in a production system to simplify model access and handle token budgets. I’ll outline the benefits, the hidden trade-offs, and the situations where the abstraction helped—or got in the way. This is a practical, developer-focused session on integrating LiteLLM into real workflows, including lessons learned around deployment, limitations, and decision points. If you’re considering LiteLLM, this talk offers a grounded look at using it beyond simple prototypes.

Generative AI
B07-B08
12:30
12:30
60min
Lunch Break
Kuppelsaal
12:30
60min
Lunch Break
B09
12:30
60min
Lunch Break
B07-B08
12:30
60min
Lunch Break
B05-B06
13:40
13:40
30min
Data science in containers: the good, the bad, and the ugly
Jérôme Petazzoni

If we want to run data science workloads (e.g. using Tensorflow, PyTorch, and others) in containers (for local development or production on Kubernetes), we need to build container images. Doing that with a Dockerfile is fairly straightforward, but is it the best method?
In this talk, we'll take a well-known speech-to-text model (Whisper) and show various ways to run it in containers, comparing the outcomes in terms of image size and build time.

Infrastructure - Hardware & Cloud
B05-B06
13:40
90min
Deep Dive into the Synthetic Data SDK
Tobias Hann

The Synthetic Data SDK was introduced in January and is quickly gaining traction as the standard open-source library for creating privacy-preserving synthetic data. In this hands-on tutorial we go beyond the basics and look at many of the SDK's advanced features, including differential privacy, conditional generation, multi-table support, and fair synthetic data.

Data Handling & Engineering
B09
13:40
30min
Scaling Probabilistic Models with Variational Inference
Dr. Juan Orduz

This talk presents variational inference as a tool to scale probabilistic models. We describe practical examples with NumPyro and PyMC to demonstrate this method, going through the main concepts and diagnostics. Instead of going heavy into the math, we focus on the code and practical tips to make this work in real industry applications.
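The talk works with NumPyro and PyMC; as a self-contained toy sketch of the core mechanic, here is mean-field variational inference with the reparameterization trick for a conjugate normal model, chosen because the exact posterior is known and can be compared against:

```python
import numpy as np

rng = np.random.default_rng(42)
y = rng.normal(2.0, 1.0, size=50)
n = len(y)

# Conjugate model: theta ~ N(0, 1), y_i ~ N(theta, 1)
# Exact posterior: N(sum(y) / (n + 1), 1 / (n + 1))
post_mean = y.sum() / (n + 1)
post_std = (1.0 / (n + 1)) ** 0.5

def grad_log_joint(theta):
    # d/dtheta of log p(theta) + sum_i log p(y_i | theta)
    return y.sum() - (n + 1) * theta

# Variational family q(theta) = N(mu, sigma^2); maximize the ELBO
# with stochastic gradients via the reparameterization trick
mu, log_sigma = 0.0, 0.0
lr = 0.002
for _ in range(4000):
    eps = rng.normal(size=64)             # Monte Carlo samples
    sigma = np.exp(log_sigma)
    theta = mu + sigma * eps              # reparameterized draws
    g = grad_log_joint(theta)
    mu += lr * g.mean()
    log_sigma += lr * ((g * sigma * eps).mean() + 1.0)  # +1: entropy term
```

After optimization, mu and exp(log_sigma) should land close to the exact posterior mean and standard deviation; NumPyro and PyMC automate the gradients and scale the same recipe to models with no closed form.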

PyData & Scientific Libraries Stack
B07-B08
14:20
14:20
30min
Beyond Benchmarks: Practical Evaluation Strategies for Compound AI Systems
Iryna Kondrashchenko, Oleh Kostromin

Evaluating large language models (LLMs) in real-world applications goes far beyond standard benchmarks. When LLMs are embedded in complex pipelines, choosing the right models, prompts, and parameters becomes an ongoing challenge.

In this talk, we will present a practical, human-in-the-loop evaluation framework that enables systematic improvement of LLM-powered systems based on expert feedback. By combining domain expert insights and automated evaluation methods, it is possible to iteratively refine these systems while building transparency and trust.

This talk will be valuable for anyone who wants to ensure their LLM applications can handle real-world complexity - not just perform well on generic benchmarks.

Natural Language Processing & Audio (incl. Generative AI NLP)
B05-B06
14:20
30min
How We Automate Chaos: Agentic AI and Community Ops at PyCon DE & PyData
Alexander CS Hendorf

Using AI agents and automation, PyCon DE & PyData volunteers have transformed chaos into streamlined conference ops. From YAML files to LLM-powered assistants, they automate speaker logistics, FAQs, video processing, and more while keeping humans focused on creativity. This case study reveals practical lessons on making AI work in real-world scenarios: structured workflows, validation, and clear context beat hype. Live demos and open-source tools included.

Data Handling & Engineering
B07-B08
15:00
15:00
30min
Navigating healthcare scientific knowledge: building AI agents for accurate biomedical data retrieval
Laura Dumont

With a focus on healthcare applications where accuracy is non-negotiable, this talk highlights challenges and delivers practical insights on building AI agents that query complex biological and scientific data to answer sophisticated questions. Drawing from our experience developing Owkin-K Navigator, a free-to-use AI co-pilot for biological research, I'll share hard-won lessons about combining natural language processing with SQL querying and vector database retrieval to navigate large biomedical knowledge sources, addressing the challenges of preventing hallucinations and ensuring proper source attribution.
This session is ideal for data scientists, ML engineers, and anyone interested in applying the Python and LLM ecosystems to the healthcare domain.

Generative AI
B05-B06
15:30
15:30
30min
Coffee Break
Kuppelsaal
15:30
30min
Coffee Break
B09
15:30
30min
Coffee Break
B07-B08
15:30
30min
Coffee Break
B05-B06
16:00
16:00
45min
Forget the Cloud: Building Lean Batch Pipelines from TCP Streams with Python and DuckDB
Orell Garten

Many industrial and legacy systems still push critical data over TCP streams. Instead of reaching for heavyweight cloud platforms, you can build fast, lean batch pipelines on-prem using Python and DuckDB.

In this talk, you'll learn how to turn raw TCP streams into structured data sets, ready for analysis, all running on-premise. We'll cover key patterns for batch processing, practical architecture examples, and real-world lessons from industrial projects.

If you work with sensor data, logs, or telemetry, and you value simplicity, speed, and control, this talk is for you.
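One recurring pattern when batching a TCP stream: TCP delivers bytes, not messages, so records arrive split across reads and each batch must carry its trailing partial record forward. A stdlib-only sketch, with a made-up newline-delimited wire format:

```python
def parse_stream(buffer: bytes):
    """Split a raw TCP buffer into complete newline-delimited records,
    returning parsed rows plus the trailing partial record to keep.
    Assumes a hypothetical wire format: 'sensor_id,unix_ts,value\\n'."""
    *lines, rest = buffer.split(b"\n")
    rows = []
    for line in lines:
        sensor, ts, value = line.decode("ascii").split(",")
        rows.append((sensor, int(ts), float(value)))
    return rows, rest

# A chunk as recv() might return it: the last record is incomplete
chunk = b"s1,1725180000,21.5\ns2,1725180001,19.0\ns1,17251800"
rows, leftover = parse_stream(chunk)
```

The `leftover` bytes are prepended to the next chunk before parsing again; the completed rows can then be appended to a DuckDB table batch by batch.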

Data Handling & Engineering
B09
16:00
45min
From Manual to LLMs: Scaling Product Categorization
Giampaolo Casolla, Ansgar Grüne

How to use LLMs to categorize hundreds of thousands of products into 1,000 categories at scale? Learn about our journey from manual/rule-based methods, via fine-tuned semantic models, to a robust multi-step process which uses embeddings and LLMs via the OpenAI APIs. This talk offers data scientists and AI practitioners learnings and best practices for putting such a complex LLM-based system into production. This includes prompt development, balancing cost vs. accuracy via model selection, testing multi-case vs. single-case prompts, and saving costs by using the OpenAI Batch API and a smart early-stopping approach. We also describe our automation and monitoring in a PySpark environment.

Generative AI
B05-B06
16:00
45min
Template-based web app and deployment pipeline at an enterprise-ready level on Azure
Johannes Schöck

A practical deep-dive into Azure DevOps pipelines, the Azure CLI, and how to combine pipeline, Bicep, and Python templates to build a fully automated web app deployment system. Deploying a new proof-of-concept app within an actual enterprise environment has never been faster.

Infrastructure - Hardware & Cloud
B07-B08
17:00
17:00
45min
Lightning Talks

Lightning Talks are short, 5-minute presentations open to all attendees. They’re a fun and fast-paced way to share ideas, showcase projects, spark discussions, or raise awareness about topics you care about — whether technical, community-related, or just inspiring.

No slides are required, and talks can be spontaneous or prepared. It’s a great chance to speak up and connect with the community!

Lightning Talks
Kuppelsaal
18:00
18:00
60min
PyLadies & Empowered in Tech Social Event @Hofbräu Wirtshaus

Social event organized by PyLadies & Empowered in Tech

Location: Hofbräu Wirtshaus, Karl-Liebknecht-Str. 30, 10178 Berlin

We’ll meet outside the BCC at 18:00.

Community & Diversity
Kuppelsaal
08:30
08:30
30min
Registration & Coffee
Kuppelsaal
08:30
30min
Registration & Coffee
B09
08:30
30min
Registration & Coffee
B07-B08
08:30
30min
Registration & Coffee
B05-B06
09:00
09:00
10min
Opening notes
Kuppelsaal
09:10
09:10
60min
Maintainers of the Future: Code, Culture, and Everything After
Jessica Greene

How we sustain what we build — and why the future of tech depends on care, not only code.

The last five years have reshaped tech — through a pandemic, economic uncertainty, shifting politics, and the rapid rise of AI. While these changes have opened new opportunities, they’ve also exposed the limits — and harms — of a “move fast and break things” mindset.

Education, Career & Life
Kuppelsaal
10:10
10:10
30min
Coffee Break
Kuppelsaal
10:10
30min
Coffee Break
B09
10:10
30min
Coffee Break
B07-B08
10:10
30min
Coffee Break
B05-B06
10:40
10:40
90min
Building an AI Agent for Natural Language to SQL Query Execution on Live Databases
Cainã Max Couto da Silva

This hands-on tutorial will guide participants through building an end-to-end AI agent that translates natural language questions into SQL queries, validates and executes them on live databases, and returns accurate responses. Participants will build a system that intelligently routes between a specialized SQL agent and a ReAct chat agent, implementing RAG for query similarity matching, comprehensive safety validation, and human-in-the-loop confirmation. By the end of this session, attendees will have created a powerful and extensible system they can adapt to their own data sources.

Generative AI
B09
10:40
30min
Bye-Bye Query Spaghetti: Write Queries You'll Actually Understand Using Pipelined SQL Syntax
Tobias Lampert

Are your SQL queries becoming tangled webs that are difficult to decipher, debug, and maintain? This talk explores how to write shorter, more debuggable, and extensible SQL code using Pipelined SQL, an alternative syntax where queries are written as a series of orthogonal, understandable steps. We'll survey which databases and query engines currently support pipelined SQL natively or through extensions, and how it can be used on any platform by compiling pipelined SQL to any SQL dialect using open-source tools. A series of real-world examples, comparing traditional and pipelined SQL syntax side by side for a variety of use cases, will show you how to simplify existing code and make complex data transformations intuitive and manageable.

Data Handling & Engineering
B07-B08
10:40
30min
Edge of Intelligence: The State of AI in Browsers
Johannes Kolbe

API calls suck! Okay, not all of them. But building your AI features reliant on third party APIs can bring a lot of trouble. In this talk you'll learn how to use web technologies to become more independent.

Infrastructure - Hardware & Cloud
B05-B06
11:20
11:20
30min
Docling: Get your documents ready for gen AI
Michele Dolfi, Christoph Auer

Docling, an open source package, is rapidly becoming the de facto standard for document parsing and export in the Python community. Having earned close to 30,000 GitHub stars in less than one year, and now part of the Linux AI & Data Foundation, Docling is redefining document AI with its ease and speed of use. In this session, we’ll introduce Docling and its features, including usage with various generative AI frameworks and protocols (e.g. MCP).

Data Handling & Engineering
B07-B08
11:20
30min
How Digital David Wins Against Data Goliaths
Pawel Herman

This talk introduces a new and innovative business model supported by a network of digital activists that form a collective force for protecting humanity, enabling digitally aware users to reclaim control over their data.

Education, Career & Life
B05-B06
12:00
12:00
30min
Better docs, happier users: What we learned applying Diataxis to HoloViz libraries
Maxime Liquet

Clear documentation is crucial for the success of open-source libraries, but it’s often hard to get right. In this talk, I’ll share our experience applying the Diataxis documentation framework to improve two HoloViz ecosystem libraries, hvPlot and Panel. Attendees will come away with practical insights on applying Diataxis and strengthening documentation for their own projects.

Community & Diversity
B07-B08
12:00
30min
Flying Beyond Keywords: Our Aviation Semantic Search Journey
Dat Tran, Dennis Schmidt

In aviation, search isn’t simple—people use abbreviations, slang, and technical terms that make exact matching tricky. We started with just Postgres, aiming for something that worked. Over time, we upgraded to semantic embeddings and reranking, and tackled filter complexity, slow index builds, embedding updates, and much more. Along the way, we learned a lot about making AI search fast, accurate, and actually usable for our users. It’s been a journey—full of turbulence, but worth the landing.
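As a toy illustration of the embedding-search idea behind such systems (the two-dimensional vectors and document names below are made up, not real aviation data or any particular model's embeddings), ranking by cosine similarity looks like this:

```python
import numpy as np

# Toy corpus: each document maps to a (made-up) embedding vector.
docs = {
    "de-ice wing procedure": np.array([0.9, 0.1]),
    "cabin wifi setup":      np.array([0.1, 0.9]),
}

def cosine(a, b):
    # Cosine similarity: dot product of the normalized vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def search(query_vec):
    # Return the document whose embedding is closest to the query.
    return max(docs, key=lambda name: cosine(query_vec, docs[name]))

best = search(np.array([1.0, 0.0]))  # nearest neighbour in embedding space
```

Real systems replace the toy vectors with model-generated embeddings, add an approximate-nearest-neighbour index, and rerank the top candidates, but the ranking primitive is the same.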

Infrastructure - Hardware & Cloud
B05-B06
12:30
12:30
60min
Lunch Break
Kuppelsaal
12:30
60min
Lunch Break
B09
12:30
60min
Lunch Break
B07-B08
12:30
60min
Lunch Break
B05-B06
13:40
13:40
90min
See only what you are allowed to see: Fine-Grained Authorization
Maria Knorps

Managing who can see or do what with your data is a fundamental challenge, especially as applications and data grow in complexity. Traditional role-based systems often lack the granularity needed for modern data platforms.
Fine-Grained Authorization (FGA) addresses this by controlling access at the individual resource level. In this 90-minute hands-on tutorial, we will explore implementing FGA using OpenFGA, an open-source authorization engine inspired by Google's Zanzibar. Attendees will learn the core concepts of Relationship-Based Access Control (ReBAC) and get practical experience defining authorization models, writing relationship tuples, and performing authorization checks using the OpenFGA Python SDK. Bring your laptop ready to code to learn how to build secure and flexible permission systems for your data applications.

Data Handling & Engineering
B09
13:40
30min
Spot the difference: 🕵️ using foundation models to monitor for change with satellite imagery 🛰️
Ferdinand Schenck

Energy infrastructure is vulnerable to damage by erosion or third-party interference, which often takes the form of unsanctioned construction. In this talk we discuss our experiences using deep learning algorithms powered by large foundation models to monitor for changes in bi-temporal, very-high-resolution satellite imagery.

Computer Vision (incl. Generative AI CV)
B07-B08
13:40
30min
When Postgres is enough: solving document storage, pub/sub and distributed queues without more tools
Eugen Geist

When a new requirement appears, whether it's document storage, pub/sub messaging, distributed queues, or even full-text search, Postgres can often handle it without introducing more infrastructure.

This talk explores how to leverage Postgres' native features like JSONB, LISTEN/NOTIFY, queueing patterns and vector extensions to build robust, scalable systems without increasing infrastructure complexity.

You'll learn practical patterns that extend Postgres just far enough, keeping systems simpler, more maintainable, and easier to operate, especially in small to medium projects or freelancing setups, where Postgres often already forms a critical part of the stack.

Postgres might not replace everything forever, but it can often get you much further than you think.

Data Handling & Engineering
B05-B06
14:20
14:20
30min
Lane detection in self-driving using only NumPy
Emma Saroyan

Are you a scientist or a developer looking to understand how to use NumPy to solve computer vision problems?
NumPy is a Python package that provides a multidimensional array object, which you can use to solve the lane-detection problem in computer vision for self-driving cars. You can apply non-machine-learning techniques with NumPy to find straight lines in street images. No other external libraries, just Python with NumPy.
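As a taste of the non-machine-learning approach, here is a minimal sketch (the function name and synthetic image are illustrative, not the talk's actual pipeline) that recovers a straight line from bright pixels with a NumPy least-squares fit:

```python
import numpy as np

def detect_line(image, threshold=128):
    """Fit y = m*x + b through all pixels brighter than `threshold`."""
    ys, xs = np.nonzero(image > threshold)  # coordinates of bright pixels
    m, b = np.polyfit(xs, ys, deg=1)        # least-squares straight-line fit
    return m, b

# Synthetic 100x100 "street image": black, with one bright diagonal marking.
img = np.zeros((100, 100), dtype=np.uint8)
idx = np.arange(100)
img[idx, idx] = 255                         # the line y = x

m, b = detect_line(img)                     # slope ~1.0, intercept ~0.0
```

A real lane detector would add grayscale conversion, edge detection, and a region-of-interest mask before the fit, but the core array operations stay this simple.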

Computer Vision (incl. Generative AI CV)
B07-B08
14:20
30min
Scraping urban mobility: analysis of Berlin carsharing
Florian König

Free-floating carsharing systems struggle to balance vehicle supply and demand, which often results in inefficient fleet distribution and reduced vehicle utilization. This talk explores how data scraping can be used to model vehicle demand and user behavior, enabling targeted incentives to encourage self-balancing vehicle flows.

Using information scraped from a major mobility provider over multiple months, the presentation provides spatiotemporal analyses and machine learning results to determine whether it's practically possible to offer low-friction discounts that lead to improved fleet balance.

B05-B06
15:10
15:10
15min
Closing Session

Closing Session

Kuppelsaal