PyData Amsterdam 2025

08:30
08:30
30min
Registration
Katherine Johnson @ TNW City
08:30
30min
Registration
Margaret Hamilton @ TNW City
09:00
09:00
90min
Meet Docling: The “Pandas” for document AI
Mingxuan Zhao, Panos Vagenas

A workshop session showing you the basics of using Docling to enhance document ingestion in your AI workflow.

Margaret Hamilton @ TNW City
09:00
90min
Next-Level Retrieval in RAG: Techniques and Tools for Enhanced Performance
Mahima Arora, Aarti Jha

Retrieval-Augmented Generation (RAG) systems rely heavily on the quality of the retrieval process to generate accurate and contextually relevant outputs. In this 90-minute tutorial, we explore practical techniques to enhance retrieval across three key stages: pre-retrieval, mid-retrieval, and post-retrieval. Participants will learn how to optimize data preparation, query strategies, reranking, and evaluation to significantly improve the performance of RAG systems. A real-world case study will guide attendees through implementing these methods in a complete retrieval workflow.
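The post-retrieval reranking stage mentioned in this abstract can be sketched in a few lines. The toy example below (document texts, vectors, and the `rerank` helper are invented for illustration, not the speakers' code) reorders retrieved chunks by cosine similarity to the query embedding:

```python
import numpy as np

def rerank(query_vec, doc_vecs, docs, top_k=2):
    """Post-retrieval step: reorder candidate chunks by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                          # cosine similarity per chunk
    order = np.argsort(scores)[::-1][:top_k]
    return [docs[i] for i in order]

docs = ["chunk about invoices", "chunk about pandas", "chunk about refunds"]
vecs = np.array([[0.9, 0.1], [0.1, 0.9], [0.8, 0.3]])  # toy embeddings
query = np.array([1.0, 0.0])
print(rerank(query, vecs, docs))  # → ['chunk about invoices', 'chunk about refunds']
```

In a real pipeline the candidate set would come from a first-stage retriever and the reranker would typically be a cross-encoder, but the interface is the same.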

Katherine Johnson @ TNW City
10:50
10:50
90min
Building AI Agents With Observability Tooling in PyCharm
Yaroslav Sokolov, Lenar Sharipov

As AI-powered agents and workflows grow in complexity, understanding their internal behavior becomes critical. In this hands-on workshop, you’ll build an agent and explore how observability tooling in PyCharm can help you trace, inspect, and debug its behavior at every stage – without having to leave the IDE.

Katherine Johnson @ TNW City
10:50
90min
Understand your data with Knowledge Graphs
Martin O'Hanlon

Graph databases give the same importance to relationships as they do to data. Knowledge graphs allow you to uncover insights in your data and efficiently explore its relationships.

Margaret Hamilton @ TNW City
12:20
12:20
60min
Lunch
Katherine Johnson @ TNW City
12:20
60min
Lunch
Margaret Hamilton @ TNW City
13:20
13:20
90min
Bridging the Gap: Building Robust, Tool-Integrated LLM Applications with the Model Context Protocol
Adam Hill, Shourya Sharma

Large Language Models (LLMs) are unlocking transformative capabilities — but integrating them into complex, real-world applications remains a major challenge. Simple prompting isn’t enough when dynamic interaction with tools, structured data, and live context is required. This workshop introduces the Model Context Protocol (MCP), an emerging open standard designed to simplify and standardise this integration. Aimed at forward-thinking developers and technologists, this hands-on session will equip participants with practical skills to build intelligent, modular, and extensible LLM-native applications using MCP.

Katherine Johnson @ TNW City
13:20
90min
Grounding LLMs on Solid Knowledge: Assessing and Improving Knowledge Graph Quality in GraphRAG Applications
Panos Alexopoulos

Graph-based Retrieval-Augmented Generation (GraphRAG) enhances large language models (LLMs) by grounding their responses in structured knowledge graphs, offering more accurate, domain-specific, and explainable outputs. However, many of the graphs used in these pipelines are automatically generated or loosely assembled, and often lack the semantic structure, consistency, and clarity required for reliable grounding. The result is misleading retrieval, vague or incomplete answers, and hallucinations that are difficult to trace or fix.

This hands-on tutorial introduces a practical approach to evaluating and improving knowledge graph quality in GraphRAG applications. We’ll explore common failure patterns, walk through real-world examples, and share a reusable checklist of features that make a graph “AI-ready.” Participants will learn methods for identifying gaps, inconsistencies, and modeling issues that prevent knowledge graphs from effectively supporting LLMs, and apply simple fixes to improve grounding and retrieval performance in their own projects.

Margaret Hamilton @ TNW City
15:10
15:10
90min
Event-Driven AI Agent Workflows with Dapr
Dana Arsovska, Marc Duiker

As AI systems evolve, the need for robust infrastructure increases. Enter Dapr Agents: an open-source framework for creating production-grade AI agent systems. Built on top of the Dapr framework, Dapr Agents empowers developers to build intelligent agents capable of collaborating in complex workflows, leveraging Large Language Models (LLMs), durable state, built-in observability, and resilient execution patterns. This workshop will walk through the framework’s core components and, through practical examples, demonstrate how it solves real-world challenges.

Katherine Johnson @ TNW City
15:10
90min
Listen: A Practical Introduction to Data Sonification
Tomek Roszczynialski

Sonification (using sound to represent data) is a niche technique for exploring complex patterns, expanding the sensory dimensions of data analysis, and discovering musical ideas that are otherwise inaccessible.

In this hands-on session, participants will learn the ins and outs of building sonification pipelines through practical examples with data from healthcare and physics. We’ll also cover key software design considerations for creating flexible and expressive systems that map data into sound. Whether you're a developer, data scientist, researcher, educator, or artist, this session will help you listen to your data.

Margaret Hamilton @ TNW City
08:00
08:00
60min
Registration and breakfast
Apollo
08:00
60min
Registration and breakfast
Voyager
08:00
60min
Registration and breakfast
Nebula
09:00
09:00
30min
Opening notes

Apollo
09:30
09:30
50min
Evals are your moat
Demetrios Brinkmann

Standard benchmarks are kinda bullsh** and the internet knows it.

Little more than a marketing ploy, leaderboards have made us lose trust in model release claims. They rarely reflect your unique, real-world needs, leaving you without a reliable way to measure success. This talk is about why building and continuously updating your own evaluation systems is the key to creating a durable competitive moat.

We’ll explore how to craft a robust “golden dataset” and review the tooling ecosystem. I’ve learned a few tricks for making the most of your evals, from how to collect them to how to label them, and I want to share them so you end up with the best golden dataset possible.

Apollo
10:20
10:20
15min
Coffee break
Apollo
10:20
15min
Coffee break
Voyager
10:20
15min
Coffee break
Nebula
10:35
10:35
35min
Large-Scale Video Intelligence
Irene Donato, Antonino Ingargiola

The explosion of video data demands search beyond simple metadata. How do we find specific visual moments, actions, or faces within petabytes of footage? This talk dives into architecting a robust, scalable multi-modal video search system.
We will explore an architecture combining efficient batch preprocessing for feature extraction (including person detection, face/CLIP-style embeddings) with optimized vector database indexing. Attendees will learn practical strategies for managing massive datasets, optimizing ML inference (e.g., lightweight models, specialized runtimes), and bridging pre-computed indexes with real-time analysis for deeper insights. This session is for data scientists, ML engineers, and architects looking to build sophisticated video understanding capabilities.

Audience: Data Scientists, Machine Learning Engineers, Data Engineers, System Architects.

Takeaway: Attendees will learn architectural patterns and practical techniques for building scalable multi-modal video search systems, including feature extraction, vector database utilization, and ML pipeline optimization.

Background Knowledge: Familiarity with Python, core machine learning concepts (e.g., embeddings, classification), and general data processing pipelines is beneficial. Experience with video processing or computer vision is a plus but not strictly required.

Voyager
10:35
35min
Should Captain America Still Host Your Data? A Call for Open, EU-Based Data Platforms
Manuel Spierenburg

When you store data in the cloud, do you know who really controls it? In an era of increasing geopolitical tension and growing awareness around digital sovereignty, Dutch research institutes have already begun repatriating sensitive data from US servers to Dutch-controlled storage. This talk explores the hidden risks behind common cloud choices, from legal access by foreign governments to the ethical implications of supporting politically active tech giants. We’ll look at what it means to own your data, how regional storage might not be enough, and what it takes to build an EU-hosted, open-source data platform stack. If you’re a data engineer, architect, or technology leader who cares about privacy, control, and sustainable infrastructure, this talk will equip you with the insight—and motivation—to make different choices.

Apollo
10:35
35min
Uncertainty Unleashed: Wrapping Your Predictions in Honesty
Konstantinos Tsoumas

A lot of models are working in production as you’re reading this, and many of them give uncalibrated outputs without being explicit about how much you can trust the result, especially on imbalanced datasets.

What’s more, relying on biased estimates can lead to overly aggressive decisions. In this hands-on talk, we’ll demystify conformal methods using MNIST, the world’s favorite handwritten-digit playground (to make the talk more fun and interactive), with two goals in mind: to explain and demonstrate what an unbiased guarantee is and how it can be calculated, and to show why you should care. Attendees will leave equipped with an understanding of uncertainty guarantees in classification, the ability to identify common pitfalls that lead to biased uncertainty estimates, and the know-how to apply conformal methods even in difficult contexts such as imbalanced datasets (an example will be given).
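A minimal sketch of the split-conformal recipe behind the guarantees this talk covers, using synthetic softmax scores in place of an MNIST model (the `fake_scores` helper and all values are illustrative, not the speaker's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def fake_scores(n, n_classes=3):
    """Stand-in for a classifier: random softmax probabilities per example."""
    logits = rng.normal(size=(n, n_classes))
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

n_cal = 1000
cal_scores = fake_scores(n_cal)
cal_labels = rng.integers(0, 3, size=n_cal)

# Nonconformity score: 1 - probability assigned to the true class.
nonconf = 1.0 - cal_scores[np.arange(n_cal), cal_labels]

# Finite-sample-corrected quantile for a 90% coverage guarantee.
alpha = 0.1
q = np.quantile(nonconf, np.ceil((n_cal + 1) * (1 - alpha)) / n_cal,
                method="higher")

# Prediction set for a new example: every class scoring at least 1 - q.
test_scores = fake_scores(1)[0]
pred_set = np.where(test_scores >= 1 - q)[0]
print(pred_set)
```

The resulting prediction sets contain the true label with probability at least 1 - alpha, regardless of how miscalibrated the underlying scores are.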

Nebula
11:20
11:20
30min
Actionable Techniques for Finding Performance Regressions
Thijs Nieuwdorp, Jeroen Janssens

Ever been burned by a mysterious slowdown in your data pipeline? In this session, we'll reveal how a stealthy performance regression in the Polars DataFrame library was hunted down and squashed. Using git bisect, Bash scripting, and uv, we automated commit compilation and benchmarking across two repos to pinpoint a commit that degraded multi-file Parquet loading. This led to challenging assumptions and rethinking performance monitoring for the Python data science library Polars.
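The bisect automation described in this abstract can be approximated with a small driver for `git bisect run`; the threshold and benchmark filename below are hypothetical placeholders, not the actual Polars setup:

```python
# bisect_bench.py -- a sketch of a `git bisect run` driver.
import subprocess
import sys
import time

THRESHOLD_S = 2.0  # hypothetical: known-good runtime plus a safety margin

def verdict(elapsed_s: float, threshold_s: float = THRESHOLD_S) -> int:
    """Exit code for `git bisect run`: 0 marks the commit good, 1 marks it bad."""
    return 0 if elapsed_s < threshold_s else 1

def run_benchmark() -> int:
    """Time the workload in the current checkout and map it to a bisect verdict."""
    start = time.perf_counter()
    # "benchmark.py" is a placeholder for the real benchmark script;
    # a real setup would also rebuild the library at this commit first.
    subprocess.run([sys.executable, "benchmark.py"], check=True)
    return verdict(time.perf_counter() - start)

# Usage sketch:
#   git bisect start <bad-commit> <good-commit>
#   git bisect run python bisect_bench.py
```

`git bisect run` repeatedly checks out the midpoint commit and uses the script's exit code to narrow down the first bad commit automatically.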

Orbit
11:20
35min
Causal Inference Framework for Incrementality: A Case Study at Booking to Estimate Incremental CLV from App Installs
Netesh, Nazlı Alagöz

This talk dives into the challenge of measuring the causal impact of app installs on customer loyalty and value, a question at the heart of data-driven marketing. While randomized controlled trials are the gold standard, they’re rarely feasible in this context. Instead, we’ll explore how observational causal inference methods can be thoughtfully applied to estimate incremental value with careful consideration of confounding, selection, and measurement biases.
This session is designed for data scientists, marketing analysts, and applied researchers with a working knowledge of statistics and causal inference concepts. We’ll keep the tone practical and informative, focusing on real-world challenges and solutions rather than heavy mathematical derivations.

Attendees will learn:
* How to design robust observational studies for business impact
* Strategies for covariate selection and bias mitigation
* The use of multiple statistical and design-based causal inference approaches
* Methods for validating and refuting causal claims in the absence of true randomization
We’ll share actionable insights, code snippets, and a GitHub repository with example workflows so you can apply these techniques in your own organization. By the end of the talk, you’ll be equipped to design more transparent and credible causal studies, and make better decisions about where to invest your marketing dollars.

Requirements:
A basic understanding of causal inference and Python is recommended. Materials and relevant links will be shared during the session.

Nebula
11:20
35min
Counting Groceries with Computer Vision: How Picnic Tracks Inventory Automatically
Sven Arends

In this talk, we'll share how we're using computer vision to automate stock counting, right on the conveyor belt. We'll discuss the challenges we've faced with the hardware, software, and GenAI components, and we'll also review our own benchmark results for the various state-of-the-art models. Finally, we'll cover the practical aspects of GenAI deployment, including prompt optimization, preventing LLM "yapping," and creating a robust feedback loop for continuous improvement.

Apollo
11:20
35min
Potato breeding using image analysis in a production setting
Dick Abma, Rik Nuijten

The scale-up company Solynta focuses on hybrid potato breeding, which helps achieve improvements in yield, disease resistance, and climate adaptation. Scientific innovation is part of our core business. Plant selections are highly data-driven, involving, for example, drone observations and genetic data. Minimal time-to-production for new ideas is essential, which is facilitated by our custom AWS DevOps platform. This platform focuses on automation and accessible data storage.

In this talk, we introduce how computer vision (YOLO and SAM modelling) enables monitoring traits of plants in the field, and how we operate these models. This further entails:
• Our experience from training and evaluating models on drone images
• Trade-offs selecting AWS services, Terraform modules and Python packages for automation and robustness
• Our team setup that allows IT specialists and biologists to work together effectively

The talk will provide practical insights for both data scientists and DevOps engineers. The main takeaways are that object detection and segmentation from drone maps, at scale, are achievable for a small team. Furthermore, with the right approach, you can standardise a DevOps platform to let operations and developers work together.

Voyager
12:05
12:05
35min
Formula 1 goes Bayesian: Time Series Decomposition with PyMC
Wesley Boelrijk

Forecasting time series can be messy, data is often missing, noisy, or full of structural changes like holidays, outliers, or evolving patterns. This talk shows how to build interpretable time series decomposition models using PyMC, a modern probabilistic programming library.

We’ll break time series into trend, seasonality, and noise components using engineered time features (e.g., Fourier and Radial Basis Functions). You’ll also learn how to model correlated series using hierarchical priors, letting multiple time series "learn from each other." As a case study, we’ll analyze Formula 1 lap time data to compare drivers and explore performance consistency using Bayesian posteriors.

This is a hands-on, code-first talk for data scientists, ML engineers, and researchers curious about Bayesian modeling (or Formula 1). Familiarity with Python and basic statistics is helpful, but no deep knowledge of Bayes is required.
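The engineered Fourier time features mentioned in this abstract are commonly built as below before being given priors in a PyMC model; this numpy-only sketch (function name ours) shows the basis construction:

```python
import numpy as np

def fourier_features(t: np.ndarray, period: float, order: int) -> np.ndarray:
    """Sin/cos basis for seasonality with the given period (order pairs of terms)."""
    k = np.arange(1, order + 1)
    angles = 2 * np.pi * np.outer(t, k) / period
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

t = np.arange(365.0)                 # one year of daily timestamps
X_season = fourier_features(t, period=365.25, order=3)
print(X_season.shape)                # → (365, 6): 3 sine + 3 cosine columns
```

In the Bayesian model, each column gets a regression coefficient with a prior, so the seasonal component becomes a smooth, interpretable sum of a few sinusoids rather than one dummy per calendar period.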

Voyager
12:05
35min
From pixel to predictions: A journey through our CT image pipeline in pig breeding using POSIT
Lisette van der Zande

How do you turn a CT scan of a pig into usable data for large-scale genetic research? At Topigs Norsvin, we scan 10,000 male pigs each year using high-resolution CT imaging. This allows us to look inside the animals and assess carcass quality, muscle composition, and indicators of health. We use this data to inform selection decisions and improve the accuracy of our breeding program. In this talk, I'll walk you through the journey of CT data: from scan acquisition and processing to how we extract traits and integrate them into the breeding program. A key part of this process is POSIT, a lightweight project structure that helps us manage complexity, ensure reproducibility and scale our pipelines effectively. While the biological context is specific, the data challenges are familiar to any data professional.

Nebula
12:05
35min
GenAI governance in practice: patterns, pitfalls & strategies across tools and industries
Maarten de Ruiter

Governing generative AI systems presents unique challenges, particularly for teams dealing with diverse GenAI subdomains and rapidly changing technological landscapes. In this talk, Maarten de Ruiter, Data Scientist at Xomnia, shares practical insights drawn from real-world GenAI use-cases. He will highlight essential governance patterns, address common pitfalls, and provide actionable strategies for teams utilizing both open-source tools and commercial solutions. Attendees will gain concrete recommendations that work in practice, informed by successes (and failures!) across multiple industries

Apollo
12:40
12:40
60min
Lunch break
Apollo
12:40
60min
Lunch break
Voyager
12:40
60min
Lunch break
Nebula
13:40
13:40
50min
Context is King: Evaluating Long Context vs. RAG for Data Grounding
Bauke Brenninkmeijer

Grounding Large Language Models in your specific data is crucial, but notoriously challenging. Retrieval-Augmented Generation (RAG) is the common pattern, yet practical implementations are often brittle, suffering from poor retrieval, ineffective chunking, and context limitations, leading to inaccurate or irrelevant answers. The emergence of massive context windows (1M+ tokens) seems to offer a simpler path – just put all your data in the prompt! But does it truly solve the "needle in a haystack" problem, or introduce new challenges like prohibitive costs and information getting lost in the middle? This talk dives deep into the engineering realities. We'll dissect common RAG failure modes, explore techniques for building robust RAG systems (advanced retrieval, re-ranking, query transformations), and critically evaluate the practical viability, costs, and limitations of leveraging long context windows for complex data tasks in Python. Leave understanding the real trade-offs to make informed architectural decisions for building reliable, data-grounded GenAI applications.

Apollo
13:40
50min
Designing tests for ML libraries – lessons from the wild
Sayak Paul, Benjamin Bossan

In this talk, we will cover how to write effective test cases for machine learning (ML) libraries that are used by hundreds of thousands of users on a regular basis. Tests, despite their well-established need for trust and foolproofing, often get less prioritized. Later, this can wreak havoc on massive codebases, with a high likelihood of introducing breaking changes and other unpleasant situations. This talk deals with our approach to testing our ML libraries, which serve a wide user base. We will cover a wide variety of topics, including the mindset and the necessity of minimal-yet-sufficient testing, all the way up to sharing some practical examples of end-to-end test suites.

Voyager
13:40
50min
Streamlining data pipeline development with Ordeq
Niels Neerhoff

In this talk, we will introduce Ordeq, a cutting-edge data pipeline development framework used by data engineers, scientists and analysts across ING. Ordeq helps you modularise pipeline logic and abstract IO, elevating projects from proof-of-concepts to maintainable production-level applications. We will demonstrate how Ordeq integrates seamlessly with popular data processing tools like Spark, Polars, Matplotlib, DSPy, and orchestration tools such as Airflow. Additionally, we will showcase how you can leverage Ordeq on public cloud offerings such as GCP. Ordeq has zero dependencies and is available under the MIT license.

Nebula
14:40
14:40
35min
Continuous monitoring of model drift in the financial sector
Denis Gaitan, Agustin Iniguez

In today’s financial sector, the continuous accuracy and reliability of machine learning models are crucial for operational efficiency and effective risk management. With the rise of MLOps (Machine Learning Operations), automating monitoring mechanisms has become essential to ensure model performance and compliance with regulations. This presentation introduces a method for continuous monitoring of model drift, highlighting the benefits of automation within the MLOps framework. This topic is particularly interesting because it addresses a common challenge in maintaining model performance over time and demonstrates a practical solution that has been successfully implemented in the bank.

This talk is aimed at data scientists, machine learning engineers, and MLOps practitioners who are interested in automating the monitoring of machine learning models. Attendees will be guided on how to continuously monitor model drift within the MLOps framework. They will understand the benefits of automation in this context and gain insights into MLOps best practices. A basic understanding of MLOps principles and statistical techniques for model evaluation will be helpful but not strictly needed.

The presentation will be an informative talk with a focus on design and implementation. It will include some mathematical concepts but will primarily demonstrate real-world applications and best practices. At the end, we encourage you to actively monitor model drift and automate your monitoring processes to enhance model accuracy, scalability, and compliance in your organization.

Nebula
14:40
35min
Gotta catch ‘em all - Hunting Fraudsters with Minimal Labels and Maximum ML
Jaap Stefels, Itzel Belderbos

Card testing is one of the largest growing fraud problems within the payments landscape, with fraudsters launching millions of attempts globally each month. These attacks can cost companies thousands of euros in lost revenue and lead to the distribution of private card details. Detecting this type of fraud is extremely difficult without confirmed labels to train standard supervised ML classifiers. In this talk, we’ll describe how we built a production-ready ML model that now processes hundreds of transactions per second and share the key takeaways from our journey.

Apollo
14:40
30min
Microlog: Explain Your Python Applications with Logs, Graphs, and AI
Chris Laffra

Microlog is a lightweight continuous profiler and logger for Python that helps developers understand their applications through interactive visualizations and AI-powered insights. With extremely low overhead and a 100% Python stack, it makes it easy to trace performance issues, debug unexpected behavior, and gain visibility into production systems.

Orbit
14:40
35min
What Works: Practical Lessons in Applying Privacy-Enhancing Technologies (PET) in Data Science
Yuliya Sapega, Joanna Pasiarska

Privacy-Enhancing Technologies (PETs) promise to bridge the gap between data utility and privacy — but how do they perform in practice? In this talk, we’ll share real-world insights from our hands-on experience testing and implementing leading PET solutions across various data science use cases.
We explored tools such as differential privacy libraries, homomorphic encryption frameworks, federated learning, multi-party computation, etc. Some lived up to their promise — others revealed critical limitations.
You’ll walk away with a clear understanding of which PET solutions work best for which types of data and analysis, what trade-offs to expect, and how to set realistic goals when integrating PETs into your workflows. This session is ideal for data professionals and decision-makers who are navigating privacy risks while still wanting to innovate responsibly.

Voyager
15:20
15:20
35min
Measure twice, deploy once: Evaluation of retrieval systems
Paul Verhaar, Maarten Koopmans

Improving retrieval systems—especially in RAG pipelines—requires a clear understanding of what’s working and what isn’t. The only scalable way to do that is through meaningful metrics. In this talk, we share insights from building a platform-agnostic search and retrieval product, and how we balance performance against cost. Bigger models often give better results… but at what price? We explain how to assess what’s “good enough” and why the choice of benchmark really matters.

Voyager
15:20
35min
Quiet on Set: Building an On-Air Sign with Open Source Technologies
Danica Fine

Using a Raspberry Pi and a powerful trio of open-source technologies—Apache Kafka, Apache Flink, and Apache Iceberg—learn how to build a custom on-air sign to signal when you're on a call and discover how this same scaffolding can be scaled for millions of users.

Nebula
15:20
35min
The Gentle Monorepo: Ship Faster and Collaborate Better
Gerben Dekker

Monorepos promise faster development and smoother cross-team collaboration, but they often seem intimidating, requiring major tooling, buy-in, and process changes. This talk shows how Dexter gradually introduced a Python monorepo by combining a few lightweight tools with a pragmatic, trust-based approach to adoption. The result is that we can effectively reuse components across our various energy forecasting and trade optimization products. We iterate quicker on bringing our research to production, which benefits our customers and supports the renewable energy transition. After this talk, you’ll walk away with a practical blueprint for introducing a monorepo in your context, without requiring heavy up-front work.

Apollo
16:00
16:00
20min
Snack break
Apollo
16:00
20min
Snack break
Voyager
16:00
20min
Snack break
Nebula
16:20
16:20
50min
Ethics is Not a Feature: Rethinking AI from the Ground Up

Ethics is often treated like a product feature—something to be added at the end, polished for compliance, or marketed for trust. But what if that mindset is exactly what’s holding us back?
In this keynote, we’ll challenge the idea that ethics is optional or external to the development process. We’ll explore how ethical blind spots in AI systems—from biased models to black-box decisions to unsustainable compute—aren’t just philosophical dilemmas, but human failures with real-world consequences.
You’ll learn how to spot ethical risks before they become failures, and discover practical tools and mindsets to build AI that earns trust—without compromising on innovation. From responsible data practices to transparency techniques and green AI strategies, we’ll connect the dots between values and code.
This isn’t just a lecture—it’s a call to rethink how we build the future of AI—together.

Apollo
17:10
17:10
10min
Closing notes

Apollo
17:20
17:20
55min
Social Event
Apollo
17:20
55min
Social Event
Voyager
17:20
55min
Social Event
Nebula
08:30
08:30
30min
Registration and breakfast
Apollo
08:30
30min
Registration and breakfast
Voyager
08:30
30min
Registration and breakfast
Nebula
09:00
09:00
50min
Image processing, artificial intelligence, and autonomous systems
Judith Dijk

In this talk, an overview is given of the field of image processing and the impact of artificial intelligence on it. Starting from the different tasks that can be performed with image processing, solutions using different AI technologies are shown, including the use of generative AI. Finally, the effect of AI on autonomous systems is discussed, along with the challenges that are faced.

Apollo
09:50
09:50
15min
Coffee break
Apollo
09:50
15min
Coffee break
Voyager
09:50
15min
Coffee break
Nebula
10:05
10:05
50min
Model Context Protocol: Principles and Practice
Fabio Lipreri, Gabriele Orlandi

Large‑language‑model agents are only as useful as the context and tools they can reach.

Anthropic’s Model Context Protocol (MCP) proposes a universal, bidirectional interface that turns every external system—SQL databases, Slack, Git, web browsers, even your local file‑system—into first‑class “context providers.”

In just 30 minutes we’ll step from high‑level buzzwords to hands‑on engineering details:

  • How MCP’s JSON‑RPC message format, streaming channels, and version‑negotiation work under the hood.
  • Why per-tool sandboxing via isolated client processes hardens security (and what happens when an LLM tries rm -rf /).
  • Techniques for hierarchical context retrieval that stretch a model’s effective window beyond token limits.
  • Real‑world patterns for accessing multiple tools—Postgres, Slack, GitHub—and plugging MCP into GenAI applications.
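As a taste of the JSON-RPC message format the first bullet refers to, here is a sketch of a tool-invocation request (the tool name and arguments are invented for illustration; consult the MCP specification for the authoritative schema):

```python
import json

# A hypothetical MCP-style JSON-RPC 2.0 request asking a server to run a tool.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "query_database",           # illustrative tool name
        "arguments": {"sql": "SELECT 1"},   # illustrative tool input
    },
}
wire = json.dumps(request)                  # what actually crosses the transport
print(json.loads(wire)["method"])           # → tools/call
```

The server replies with a JSON-RPC response carrying the same `id`, which is how clients match results to in-flight requests over a streaming channel.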

Expect code snippets and lessons from early adoption.

You’ll leave ready to wire your own services into any MCP‑aware model and level‑up your GenAI applications—without the N×M integration nightmare.

Voyager
10:05
50min
Optimize the Right Thing: Cost-Sensitive Classification in Practice
Shimanto Rahman

Not all mistakes in machine learning are equal—a false negative in fraud detection or medical diagnosis can be far costlier than a false positive. Cost-sensitive learning helps navigate these trade-offs by incorporating error costs into the training process, leading to smarter decision-making. This talk introduces Empulse, an open-source Python package that brings cost-sensitive learning into scikit-learn. Attendees will learn why standard models fall short in cost-sensitive scenarios and how to build better classifiers with Scikit-Learn and Empulse.
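The core decision rule that cost-sensitive learning encodes can be illustrated without any library (the costs and helper below are hypothetical, not the Empulse API): flag an example when the expected cost of ignoring it exceeds the expected cost of flagging it.

```python
import numpy as np

# Hypothetical per-example costs: missing fraud (FN) is 10x a false alarm (FP).
COST_FN, COST_FP = 50.0, 5.0

def cost_sensitive_predict(p_fraud: np.ndarray) -> np.ndarray:
    """Flag when the expected cost of ignoring exceeds that of flagging."""
    # Expected cost of "legit" = p * COST_FN; of "fraud" = (1 - p) * COST_FP.
    return (p_fraud * COST_FN > (1 - p_fraud) * COST_FP).astype(int)

p = np.array([0.02, 0.05, 0.12, 0.6])
print(cost_sensitive_predict(p))  # → [0 0 1 1]
```

Note how the implied decision threshold (here p > 1/11) ends up far below the default 0.5: that gap is exactly why cost-blind classifiers fall short in these scenarios.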

Nebula
10:05
50min
Untitled13.ipynb
Vincent Warmerdam

For well over a decade, Python notebooks revolutionized our field. They gave us so much creative freedom and dramatically lowered the entry barrier for newcomers. Yet despite all this ... it has been a decade! And the notebook is still in roughly the same form factor.

So what if we allow ourselves to rethink notebooks ... really rethink it! What features might we come up with? Can we make the notebook understand datasources? What about LLMs? Can we generate widgets on the fly? What if we make changes to Python itself?

This presentation will be a stream of demos that help paint a picture of what the future might hold. I will share my latest work in the anywidget/marimo ecosystem as well as some new hardware integrations.

The main theme that I will work towards: if you want better notebooks, reactive Python might very well be the future.

Apollo
11:05
11:05
35min
Declarative Feature Engineering: Bridging Spark and Flink with a Unified DSL
Miguel Leite, Vitalii Zhebrakovskyi

Building ML features at scale shouldn’t require every ML Scientist to become an expert in Spark or Flink. At Adyen, the Feature Platform team built a Python-based DSL that lets data scientists define features declaratively — while automatically generating the necessary batch or real-time pipelines behind the scenes.

Nebula
11:05
35min
Detection of Unattended Objects in Public Spaces using AI
Evertjan Peer

This talk presents an end-to-end solution for detecting unattended objects in public transport hubs to enhance social security. The project, developed in a three-week challenge, focuses on proactively identifying unattended items using existing camera infrastructure. We will cover the entire pipeline, from data anonymization and preprocessing to building a data labeling platform, object detection with YOLO, and tracking objects over time. The presentation will also discuss the evaluation of the system.

Voyager
11:05
35min
Scaling Trust: A practical guide on evaluating LLMs and Agents
George Chouliaras, Antonio Castelli

Recently, the integration of Generative AI (GenAI) technologies into both our personal and professional lives has surged. In most organizations, the deployment of GenAI applications is on the rise, and this trend is expected to continue in the foreseeable future. Evaluating GenAI systems presents unique challenges not present in traditional ML. The main peculiarity is the absence of ground truth for textual metrics such as text clarity, location extraction accuracy, and factual accuracy. Nevertheless, the non-negligible model-serving cost demands an even more thorough evaluation of any system deployed to production.

Defining metric ground truth is a costly and time-consuming process requiring human annotation. To address this, we will present how to evaluate LLM-based applications by leveraging LLMs themselves as evaluators. Moreover, we will outline the complexities and evaluation methods for LLM-based agents, which operate with autonomy and present further evaluation challenges. Lastly, we will explore the critical role of evaluation in the GenAI lifecycle and outline the steps taken to integrate these processes seamlessly.

Whether you are an AI practitioner, user or enthusiast, join us to gain insights into the future of GenAI evaluation and its impact on enhancing application performance.

Apollo
11:50
11:50
30min
Composable Pipelines for ML: Automating Feature Engineering with Hopsworks’ Brewer
Javier de la Rúa Martínez

Operationalizing ML isn’t just about models — it’s about moving and engineering data. At Hopsworks, we built a composable AI pipeline builder (Brewer) based on two principles: Tasks and Data Sources. This lets users define workflows that automatically analyse, clean, create and update feature groups, without glue code or brittle scheduling logic.

In this talk, we’ll show how Brewer drives the automation of feature engineering, enabling reproducible, declarative pipelines that respond to changes in upstream data. We’ll explore how this fits into broader ML workflows, from ingestion to feature materialization, and how it integrates with warehouses, streams, and file-based systems.

Orbit
11:50
35min
How to Keep Your LLM Chatbots Real: A Metrics Survival Guide
Maria Bader

In this brave new world of vibe coding and YOLO-to-prod mentality, let’s take a step back and keep things grounded (pun intended). None of us would ever deploy a classical ML model to production without clearly defined metrics and proper evaluation, so let's talk about methodologies for measuring the performance of LLM-powered chatbots. Think of retriever recall, answer relevancy, correctness, faithfulness, and hallucination rates. With the wild west of metric standards still in full swing, I’ll guide you through the challenges of curating a synthetic test set and selecting suitable metrics and open-source packages that help evaluate your use case. Everything is possible, from simple LLM-as-a-judge approaches built into many packages such as MLflow, up to complex multi-step quantification approaches with Ragas. If you work in the GenAI space or with LLM-powered chatbots, this session is for you! Prior background knowledge is an advantage, but not required.
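To make one of these metrics concrete, here is a hedged, stdlib-only sketch of retriever recall@k over a tiny hand-made test set (the document ids and test cases are illustrative, not from any particular package):

```python
# Illustrative sketch of retriever recall@k over a tiny synthetic test set.
# Each case lists the document ids the retriever *should* return (relevant)
# and the ids it actually returned, in rank order.

def recall_at_k(relevant: set, retrieved: list, k: int) -> float:
    """Fraction of relevant documents found among the top-k retrieved ones."""
    if not relevant:
        return 1.0  # nothing to find counts as full recall
    hits = relevant & set(retrieved[:k])
    return len(hits) / len(relevant)

test_set = [
    {"relevant": {"doc1", "doc4"}, "retrieved": ["doc4", "doc2", "doc1"]},
    {"relevant": {"doc7"},         "retrieved": ["doc3", "doc5", "doc9"]},
]

scores = [recall_at_k(case["relevant"], case["retrieved"], k=3) for case in test_set]
print(sum(scores) / len(scores))  # mean recall@3 over the test set: 0.5
```

The hard part in practice is not this arithmetic but building a test set whose "relevant" labels you actually trust, which is where synthetic test-set curation comes in.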

Apollo
11:50
35min
Kafka Internals I Wish I Knew Sooner: The Non-Boring Truths
Dima Baranetskyi

Most of us start with Kafka by building a simple producer/consumer demo. It just works — until it doesn’t. Suddenly, disk space isn’t freed up after data “expires,” rebalances loop endlessly during deploys, and strange errors about missing leaders clog your logs.
In the panic, we dive into Kafka’s ocean of config options — hoping something will stick. Sound familiar?

This talk is a collection of hard-won lessons — not flashy tricks, but the kind of insights you only gain after operating Kafka in production for years. You’ll walk away with mental models that make Kafka’s internal behavior more predictable and less surprising.

We’ll cover:
- Storage internals: Why expired data doesn’t always free space — and how Kafka actually reclaims disk
- Transactions & delivery semantics: What “exactly-once” really means, and when it silently downgrades
- Consumer group rebalancing: Why rebalances loop, and how the controller’s hidden behavior affects them

If you’ve used Kafka — or plan to — these insights will save you hours of frustration and debugging.
A basic understanding of partitions, replication, and Kafka’s general architecture will help get the most out of this session.

Nebula
11:50
35min
Optimal Observability: Partitioning Data into Time-Series for Enhanced Anomaly Detection and Improved Monitoring Coverage
Vitalie Spinu

This talk presents a principled methodology for partitioning item-level data into homogeneous time-series, with the objective of maximizing monitoring coverage and improving the detection of anomalies and drifts. We discuss the theoretical underpinnings of clustering algorithms for this task and describe practical algorithms enabling efficient search for optimal partitioning. We exemplify our approach with a real-world application in large-scale monitoring environments from the online payment domain.

Voyager
12:25
12:25
60min
Lunch
Apollo
12:25
60min
Lunch
Voyager
12:25
60min
Lunch
Nebula
13:25
13:25
35min
Kickstart Your Probabilistic Forecasting with Level Set and Quantile Regression Forests
Inge van den Ende

Probabilistic forecasting is essential, but choosing the right method is tricky. This talk introduces two lesser-known models — Level Set Forecaster and Quantile Regression Forest — that help you kickstart probabilistic forecasting without unnecessary complexity.
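To preview the idea behind one of the two models (a toy pure-Python sketch, not a production implementation): a Quantile Regression Forest keeps the raw training targets that land in each leaf and reads off empirical quantiles, instead of collapsing each leaf to a mean. Here each "tree" is simulated as a simple bucketing of x; a real forest learns its splits.

```python
# Toy illustration of the Quantile Regression Forest idea: store the training
# targets per leaf, then predict a quantile from the pooled leaf targets.
# Each "tree" here is just a bucketing of x at a different granularity.

def fit_toy_forest(xs, ys, bucket_sizes):
    """One 'tree' per bucket size: leaf id = int(x // size) -> target list."""
    forest = []
    for size in bucket_sizes:
        leaves = {}
        for x, y in zip(xs, ys):
            leaves.setdefault(int(x // size), []).append(y)
        forest.append((size, leaves))
    return forest

def predict_quantile(forest, x, tau):
    """Pool targets from the leaf x falls into in every tree, take quantile tau."""
    pooled = []
    for size, leaves in forest:
        pooled.extend(leaves.get(int(x // size), []))
    pooled.sort()
    idx = min(int(tau * len(pooled)), len(pooled) - 1)
    return pooled[idx]

xs = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
forest = fit_toy_forest(xs, ys, bucket_sizes=[2, 3])
print(predict_quantile(forest, 1.0, 0.5))  # 2.0: median of pooled targets [1, 1, 2, 2, 3]
```

Because the same fitted forest answers any quantile, you get a full predictive distribution from one model, which is exactly what makes this a low-complexity entry point into probabilistic forecasting.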

Apollo
13:25
35min
Searching for My Next Chart
Muhammad Chenariyan Nakhaee

As a data visualization practitioner, I frequently draw inspiration from the diverse and rapidly expanding community, particularly through challenges like #TidyTuesday. However, the sheer volume of remarkable visualizations quickly overwhelmed my manual curation methods—from Pinterest boards to Notion pages. This created a significant bottleneck in my workflow, as I found myself spending more time cataloging charts than actively creating them.

In this talk, I will present a RAG (Retrieval-Augmented Generation) based retrieval system that I designed specifically for data visualizations. I will detail the methodology behind this system, illustrating how I addressed my own workflow inefficiencies by transforming a dispersed collection of charts into a semantically searchable knowledge base. This project serves as a practical example of applying advanced AI techniques to enhance creative technical work, demonstrating how a specialized retrieval system can significantly improve the efficiency and quality of the data visualization creation process.

Voyager
13:25
35min
🍯 Sweet Language Model Python Applications
Jamie Coombes

It's easy to get into sticky situations when building language model powered applications. This talk helps you transform sticky messes into sweet successes.

This talk delivers practical insights from creating production-ready language model solutions with Python. Moving beyond theoretical possibilities, we'll explore the real architectural decisions, code patterns, and tradeoffs that determine success or failure. Through a progressive case study, we'll demonstrate how to effectively leverage Pydantic structured outputs, validation techniques, and thoughtful API integrations while avoiding common pitfalls. Perfect for Pythonic product teams looking to build applications that result in a sweet treat everyone will love.

Nebula
14:10
14:10
35min
Help! There Are Humans in My Data!
Marysia Winkels, Isabelle Donatz-Fest

Good quality data is the basis for high quality models and valuable data insights. But isn't it annoying how often your data is riddled with those pesky humans? Human involvement in data creation often introduces errors, misunderstandings, and biases that can compromise data integrity. This talk will explore how human factors influence the data creation process and what we as data professionals can do to account for this in our data interpretation and usage.

Apollo
14:10
35min
Is Prompt Engineering Dead? How Auto-Optimization is Changing the Game
Iryna Kondrashchenko, Oleh Kostromin

The rise of LLMs has elevated prompt engineering as a critical skill in the AI industry, but manual prompt tuning is often inefficient and model-specific. This talk explores various automatic prompt optimization approaches, ranging from simple ones like bootstrapped few-shot to more complex techniques such as MIPRO and TextGrad, and showcases their practical applications through frameworks like DSPy and AdalFlow. By exploring the benefits, challenges, and trade-offs of these approaches, the attendees will be able to answer the question: is prompt engineering dead, or has it just evolved?

Voyager
14:10
35min
Orchestrating success: How Vinted standardizes large-scale, decentralized data pipelines
Rodrigo Loredo, Oscar Ligthart

At Vinted, Europe’s largest second-hand marketplace, over 20 decentralized data teams generate, transform, and build products on petabytes of data. Each team utilizes their own tools, workflows, and expertise. Coordinating data pipeline creation across such diverse teams presents significant challenges. These include complex inter-team dependencies, inconsistent scheduling solutions, and rapidly evolving requirements.

This talk is aimed at data engineers, platform engineers, and technical leads with experience in workflow orchestration and will demonstrate how we empower teams at Vinted to define data pipelines quickly and reliably. We will present our user-friendly abstraction layer built on top of Apache Airflow, enhanced by a Python code generator. This abstraction simplifies upgrades and migrations, removes scheduler complexity, and supports Vinted’s rapid growth. Attendees will learn how Python abstractions and code generation can standardize pipeline development across diverse teams, reduce operational complexity, and enable greater flexibility and control in large-scale data organizations. Through practical lessons and real-world examples of our abstraction interface, we will offer insights into designing scheduler-agnostic architectures for successful data pipeline orchestration.

Nebula
14:55
14:55
35min
Real-Time Context Engineering for LLMs
Manu Joseph

Context engineering has replaced prompt engineering as the main challenge in building agents and LLM applications. Context engineering involves providing LLMs with relevant and timely context data from various data sources, which allows them to make context-aware decisions. The context data provided to the LLM must be produced in real time to enable it to react intelligently at human-perceivable latencies (a second or two at most). If the application takes longer to react, humans will perceive it as laggy and unintelligent.
In this talk, we will introduce context engineering and make the case for real-time context engineering in interactive applications. We will also demonstrate how to integrate real-time context data from applications into Python agents using the Hopsworks feature store and corresponding application IDs. Application IDs are the key to unlocking application context data for agents and LLMs. We will walk through an example of an interactive application (a TikTok clone) that we make AI-enabled with Hopsworks.

Apollo
14:55
35min
Resource Monitoring and Optimization with Metaflow
Gergely Daroczi

Metaflow is a powerful workflow management framework for data science, but optimizing its cloud resource usage still involves guesswork. We have extended Metaflow with a lightweight resource tracking tool that automatically monitors CPU, memory, GPU, and more, then recommends the most cost-effective cloud instance type for future runs. A single line of code can save you from overprovisioned costs or painful job failures!

Nebula
14:55
35min
Sieves: Plug-and-Play NLP Pipelines With Zero-Shot Models
Raphael Mitsch

Generative models are dominating the spotlight lately - and rightly so. Their flexibility and zero-shot capabilities make it incredibly fast to prototype NLP applications. However, one-shotting complex NLP problems often isn't the best long-term strategy. Decomposing problems into modular, pipelined tasks leads to better debuggability, greater interpretability, and more reliable performance.

This modular pipeline approach pairs naturally with zero- and few-shot (ZFS) models, enabling rapid yet robust prototyping without requiring large datasets or fine-tuning. Crucially, many real-world applications need structured data outputs—not free-form text. Generative models often struggle to consistently produce structured results, which is why enforcing structured outputs is now a core feature across contemporary NLP tools (like Outlines, DSPy, LangChain, Ollama, vLLM, and others).

For engineers building NLP pipelines today, the landscape is fragmented. There’s no single standard for structured generation yet, and switching between tools can be costly and frustrating. The NLP tooling landscape lacks a flexible, model-agnostic solution that minimizes setup overhead, supports structured outputs, and accelerates iteration.

Introducing Sieves: a modular toolkit for building robust NLP document processing pipelines using ZFS models.
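To illustrate why structured outputs matter for pipelines like these (a hedged, stdlib-only sketch; the class and function names are hypothetical and not the Sieves API): a task that feeds downstream steps can validate a model's raw text against an expected schema instead of trusting free-form output.

```python
# Hypothetical sketch (not the Sieves API): validating a model's raw text
# output against an expected schema so downstream pipeline tasks receive
# typed, machine-readable results instead of free-form text.

import json
from dataclasses import dataclass

@dataclass
class ClassificationResult:
    label: str
    confidence: float

def parse_result(raw_model_output: str) -> ClassificationResult:
    """Parse and validate model output; raise ValueError if it breaks the schema."""
    data = json.loads(raw_model_output)
    if not isinstance(data.get("label"), str):
        raise ValueError("missing or non-string 'label'")
    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence out of range")
    return ClassificationResult(label=data["label"], confidence=confidence)

ok = parse_result('{"label": "invoice", "confidence": 0.93}')
print(ok)  # ClassificationResult(label='invoice', confidence=0.93)
```

Tools such as Outlines or Pydantic push this further by constraining generation itself, but even post-hoc validation like this makes failures explicit and debuggable instead of silently corrupting later pipeline stages.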

Voyager
15:30
15:30
40min
Snack break
Voyager
15:30
40min
Snack break
Nebula
15:35
15:35
35min
Lightning Talks

Apollo
16:10
16:10
50min
Minus Three Tier: Data Architecture Turned Upside Down
Hannes Mühleisen

Every data architecture diagram out there makes it abundantly clear who's in charge: at the bottom sits the analyst, above that is an API server, and on the very top sits the mighty data warehouse. This pattern is so ingrained we never question its necessity, despite problems like slow data response times, multi-level scaling issues, and massive cost.

But there is another way: disconnecting storage from compute enables query processing to move closer to people, leading to much snappier responses, natural scaling through client-side query processing, and much reduced cost.

In this talk, we will discuss how modern data engineering paradigms like decomposition of storage, single-node query processing, and lakehouse formats enable a radical departure from the tired three-tier architecture. By inverting the architecture, we can put users' needs first and rely on commoditised components like object stores to enable fast, scalable, and cost-effective solutions.

Apollo
17:00
17:00
20min
Conference closing notes

Apollo
17:20
17:20
40min
Techie vs Comic: The sequel
Arda Kaygan

A data scientist by day and a standup comedian by night. This was how Arda described himself prior to his critically acclaimed performance about his two identities during PyData 2024, where they merged.

Now he doesn't even know.

After another year of stage performances, awkward LinkedIn interactions, and mysterious cloud errors, Arda is back for another tale of absurdity. In this closing talk, he will illustrate the hilarity of his life as a data scientist in the age of LLMs and his non-existent comfort zone, provided good sequels can exist.

Apollo