PyData Amsterdam 2025

08:30
08:30
30min
Registration
Katherine Johnson @ TNW City
08:30
30min
Registration
Margaret Hamilton @ TNW City
09:00
09:00
90min
Meet Docling: The “Pandas” for document AI
Mingxuan Zhao, Panos Vagenas

A workshop session showing you the basics of using Docling to enhance document ingestion in your AI workflow.

Margaret Hamilton @ TNW City
09:00
90min
Next-Level Retrieval in RAG: Techniques and Tools for Enhanced Performance
Mahima Arora, Aarti Jha

Retrieval-Augmented Generation (RAG) systems rely heavily on the quality of the retrieval process to generate accurate and contextually relevant outputs. In this 90-minute tutorial, we explore practical techniques to enhance retrieval across three key stages: pre-retrieval, mid-retrieval, and post-retrieval. Participants will learn how to optimize data preparation, query strategies, reranking, and evaluation to significantly improve the performance of RAG systems. A real-world case study will guide attendees through implementing these methods in a complete retrieval workflow.
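The post-retrieval reranking stage mentioned in this abstract can be sketched in a few lines. The toy example below (document texts, vectors, and the `rerank` helper are invented for illustration, not the speakers' code) reorders retrieved chunks by cosine similarity to the query embedding:

```python
import numpy as np

def rerank(query_vec, doc_vecs, docs, top_k=2):
    """Post-retrieval step: reorder candidate chunks by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                          # cosine similarity per chunk
    order = np.argsort(scores)[::-1][:top_k]
    return [docs[i] for i in order]

docs = ["chunk about invoices", "chunk about pandas", "chunk about refunds"]
vecs = np.array([[0.9, 0.1], [0.1, 0.9], [0.8, 0.3]])  # toy embeddings
query = np.array([1.0, 0.0])
print(rerank(query, vecs, docs))  # → ['chunk about invoices', 'chunk about refunds']
```

In a real pipeline the candidate set would come from a first-stage retriever and the reranker would typically be a cross-encoder, but the interface is the same.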

Katherine Johnson @ TNW City
10:50
10:50
90min
Building AI Agents With Observability Tooling in PyCharm
Yaroslav Sokolov, Lenar Sharipov

As AI-powered agents and workflows grow in complexity, understanding their internal behavior becomes critical. In this hands-on workshop, you’ll build an agent and explore how observability tooling in PyCharm can help you trace, inspect, and debug its behavior at every stage – without having to leave the IDE.

Katherine Johnson @ TNW City
10:50
90min
Understand your data with Knowledge Graphs
Martin O'Hanlon

Graph databases give the same importance to relationships as they do to data. Knowledge graphs allow you to uncover insights in your data and efficiently explore its relationships.

Margaret Hamilton @ TNW City
12:20
12:20
60min
Lunch
Katherine Johnson @ TNW City
12:20
60min
Lunch
Margaret Hamilton @ TNW City
13:20
13:20
90min
Bridging the Gap: Building Robust, Tool-Integrated LLM Applications with the Model Context Protocol
Adam Hill, Shourya Sharma

Large Language Models (LLMs) are unlocking transformative capabilities — but integrating them into complex, real-world applications remains a major challenge. Simple prompting isn’t enough when dynamic interaction with tools, structured data, and live context is required. This workshop introduces the Model Context Protocol (MCP), an emerging open standard designed to simplify and standardise this integration. Aimed at forward-thinking developers and technologists, this hands-on session will equip participants with practical skills to build intelligent, modular, and extensible LLM-native applications using MCP.

Katherine Johnson @ TNW City
13:20
90min
Grounding LLMs on Solid Knowledge: Assessing and Improving Knowledge Graph Quality in GraphRAG Applications
Panos Alexopoulos

Graph-based Retrieval-Augmented Generation (GraphRAG) enhances large language models (LLMs) by grounding their responses in structured knowledge graphs, offering more accurate, domain-specific, and explainable outputs. However, many of the graphs used in these pipelines are automatically generated or loosely assembled, and often lack the semantic structure, consistency, and clarity required for reliable grounding. The result is misleading retrieval, vague or incomplete answers, and hallucinations that are difficult to trace or fix.

This hands-on tutorial introduces a practical approach to evaluating and improving knowledge graph quality in GraphRAG applications. We’ll explore common failure patterns, walk through real-world examples, and share a reusable checklist of features that make a graph “AI-ready.” Participants will learn methods for identifying gaps, inconsistencies, and modeling issues that prevent knowledge graphs from effectively supporting LLMs, and apply simple fixes to improve grounding and retrieval performance in their own projects.

Margaret Hamilton @ TNW City
15:10
15:10
90min
Event-Driven AI Agent Workflows with Dapr
Dana Arsovska, Marc Duiker

As AI systems evolve, the need for robust infrastructure increases. Enter Dapr Agents: an open-source framework for creating production-grade AI agent systems. Built on top of the Dapr framework, Dapr Agents empowers developers to build intelligent agents capable of collaborating in complex workflows, leveraging Large Language Models (LLMs), durable state, built-in observability, and resilient execution patterns. This workshop will walk through the framework’s core components and, through practical examples, demonstrate how it solves real-world challenges.

Katherine Johnson @ TNW City
15:10
90min
Listen: A Practical Introduction to Data Sonification
Tomek Roszczynialski

Sonification (using sound to represent data) is a niche technique for exploring complex patterns, expanding the sensory dimensions of data analysis, and discovering musical ideas that are otherwise inaccessible.

In this hands-on session, participants will learn the ins and outs of building sonification pipelines through practical examples with data from healthcare and physics. We’ll also cover key software design considerations for creating flexible and expressive systems that map data into sound. Whether you're a developer, data scientist, researcher, educator, or artist, this session will help you listen to your data.

Margaret Hamilton @ TNW City
08:00
08:00
60min
Registration and breakfast
Apollo
08:00
60min
Registration and breakfast
Voyager
08:00
60min
Registration and breakfast
Nebula
09:00
09:00
30min
Opening notes

Apollo
09:30
09:30
50min
Evals are your moat
Demetrios Brinkmann

Standard benchmarks are kinda bullsh** and the internet knows it.

Little more than a marketing ploy, leaderboards have made us lose trust in model release claims. They rarely reflect your unique, real-world needs, leaving you without a reliable way to measure success. This talk is about why building and continuously updating your own evaluation systems is the key to creating a durable competitive moat.

We’ll explore how to craft a robust “golden dataset” and review the tooling ecosystem. I’ve learned a few tricks for making the most of your evals, from how to collect them to how to label them, and I want to share them so you end up with the best golden dataset possible.

Apollo
10:20
10:20
15min
Coffee break
Apollo
10:20
15min
Coffee break
Voyager
10:20
15min
Coffee break
Nebula
10:35
10:35
35min
Large-Scale Video Intelligence
Irene Donato, Antonino Ingargiola

The explosion of video data demands search beyond simple metadata. How do we find specific visual moments, actions, or faces within petabytes of footage? This talk dives into architecting a robust, scalable multi-modal video search system.
We will explore an architecture combining efficient batch preprocessing for feature extraction (including person detection, face/CLIP-style embeddings) with optimized vector database indexing. Attendees will learn practical strategies for managing massive datasets, optimizing ML inference (e.g., lightweight models, specialized runtimes), and bridging pre-computed indexes with real-time analysis for deeper insights. This session is for data scientists, ML engineers, and architects looking to build sophisticated video understanding capabilities.

Audience: Data Scientists, Machine Learning Engineers, Data Engineers, System Architects.

Takeaway: Attendees will learn architectural patterns and practical techniques for building scalable multi-modal video search systems, including feature extraction, vector database utilization, and ML pipeline optimization.

Background Knowledge: Familiarity with Python, core machine learning concepts (e.g., embeddings, classification), and general data processing pipelines is beneficial. Experience with video processing or computer vision is a plus but not strictly required.

Voyager
10:35
35min
Should Captain America Still Host Your Data? A Call for Open, EU-Based Data Platforms
Manuel Spierenburg

When you store data in the cloud, do you know who really controls it? In an era of increasing geopolitical tension and growing awareness around digital sovereignty, Dutch research institutes have already begun repatriating sensitive data from US servers to Dutch-controlled storage. This talk explores the hidden risks behind common cloud choices, from legal access by foreign governments to the ethical implications of supporting politically active tech giants. We’ll look at what it means to own your data, how regional storage might not be enough, and what it takes to build an EU-hosted, open-source data platform stack. If you’re a data engineer, architect, or technology leader who cares about privacy, control, and sustainable infrastructure, this talk will equip you with the insight—and motivation—to make different choices.

Apollo
10:35
35min
Uncertainty Unleashed: Wrapping Your Predictions in Honesty
Konstantinos Tsoumas

A lot of models are working in production as you’re reading this, and many of them give uncalibrated outputs without being explicit about how much you can trust the result, especially on imbalanced datasets.

What’s more, relying on biased estimates can lead to overly aggressive decisions. In this hands-on talk, we’ll demystify conformal methods using MNIST, the world’s favorite handwritten-digit playground (to make the talk more fun and interactive), with two goals in mind: to explain and demonstrate what an unbiased guarantee is and how it can be calculated, and to show why you should care. Attendees will leave equipped with an understanding of uncertainty guarantees in classification, the ability to identify common pitfalls that lead to biased uncertainty estimates, and the know-how to apply conformal methods even in difficult contexts such as imbalanced datasets (an example will be given).
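A minimal sketch of the split-conformal recipe behind the guarantees this talk covers, using synthetic softmax scores in place of an MNIST model (the `fake_scores` helper and all values are illustrative, not the speaker's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def fake_scores(n, n_classes=3):
    """Stand-in for a classifier: random softmax probabilities per example."""
    logits = rng.normal(size=(n, n_classes))
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

n_cal = 1000
cal_scores = fake_scores(n_cal)
cal_labels = rng.integers(0, 3, size=n_cal)

# Nonconformity score: 1 - probability assigned to the true class.
nonconf = 1.0 - cal_scores[np.arange(n_cal), cal_labels]

# Finite-sample-corrected quantile for a 90% coverage guarantee.
alpha = 0.1
q = np.quantile(nonconf, np.ceil((n_cal + 1) * (1 - alpha)) / n_cal,
                method="higher")

# Prediction set for a new example: every class scoring at least 1 - q.
test_scores = fake_scores(1)[0]
pred_set = np.where(test_scores >= 1 - q)[0]
print(pred_set)
```

The resulting prediction sets contain the true label with probability at least 1 - alpha, regardless of how miscalibrated the underlying scores are.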

Nebula
11:20
11:20
30min
Actionable Techniques for Finding Performance Regressions
Thijs Nieuwdorp, Jeroen Janssens

Ever been burned by a mysterious slowdown in your data pipeline? In this session, we'll reveal how a stealthy performance regression in the Polars DataFrame library was hunted down and squashed. Using git bisect, Bash scripting, and uv, we automated commit compilation and benchmarking across two repos to pinpoint a commit that degraded multi-file Parquet loading. This led to challenging assumptions and rethinking performance monitoring for the Python data science library Polars.
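The bisect automation described in this abstract can be approximated with a small driver for `git bisect run`; the threshold and benchmark filename below are hypothetical placeholders, not the actual Polars setup:

```python
# bisect_bench.py -- a sketch of a `git bisect run` driver.
import subprocess
import sys
import time

THRESHOLD_S = 2.0  # hypothetical: known-good runtime plus a safety margin

def verdict(elapsed_s: float, threshold_s: float = THRESHOLD_S) -> int:
    """Exit code for `git bisect run`: 0 marks the commit good, 1 marks it bad."""
    return 0 if elapsed_s < threshold_s else 1

def run_benchmark() -> int:
    """Time the workload in the current checkout and map it to a bisect verdict."""
    start = time.perf_counter()
    # "benchmark.py" is a placeholder for the real benchmark script;
    # a real setup would also rebuild the library at this commit first.
    subprocess.run([sys.executable, "benchmark.py"], check=True)
    return verdict(time.perf_counter() - start)

# Usage sketch:
#   git bisect start <bad-commit> <good-commit>
#   git bisect run python bisect_bench.py
```

`git bisect run` repeatedly checks out the midpoint commit and uses the script's exit code to narrow down the first bad commit automatically.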

Orbit
11:20
35min
Causal Inference Framework for Incrementality: A Case Study at Booking to Estimate Incremental CLV from App Installs
Netesh, Nazlı Alagöz

This talk dives into the challenge of measuring the causal impact of app installs on customer loyalty and value, a question at the heart of data-driven marketing. While randomized controlled trials are the gold standard, they’re rarely feasible in this context. Instead, we’ll explore how observational causal inference methods can be thoughtfully applied to estimate incremental value with careful consideration of confounding, selection, and measurement biases.
This session is designed for data scientists, marketing analysts, and applied researchers with a working knowledge of statistics and causal inference concepts. We’ll keep the tone practical and informative, focusing on real-world challenges and solutions rather than heavy mathematical derivations.

Attendees will learn:
* How to design robust observational studies for business impact
* Strategies for covariate selection and bias mitigation
* The use of multiple statistical and design-based causal inference approaches
* Methods for validating and refuting causal claims in the absence of true randomization
We’ll share actionable insights, code snippets, and a GitHub repository with example workflows so you can apply these techniques in your own organization. By the end of the talk, you’ll be equipped to design more transparent and credible causal studies, and make better decisions about where to invest your marketing dollars.

Requirements:
A basic understanding of causal inference and Python is recommended. Materials and relevant links will be shared during the session.

Nebula
11:20
35min
Counting Groceries with Computer Vision: How Picnic Tracks Inventory Automatically
Sven Arends

In this talk, we'll share how we're using computer vision to automate stock counting, right on the conveyor belt. We'll discuss the challenges we've faced with the hardware, software, and GenAI components, and we'll also review our own benchmark results for the various state-of-the-art models. Finally, we'll cover the practical aspects of GenAI deployment, including prompt optimization, preventing LLM "yapping," and creating a robust feedback loop for continuous improvement.

Apollo
11:20
35min
Potato breeding using image analysis in a production setting
Dick Abma, Rik Nuijten

The scale-up company Solynta focuses on hybrid potato breeding, which helps achieve improvements in yield, disease resistance, and climate adaptation. Scientific innovation is part of our core business. Plant selections are highly data-driven, involving, for example, drone observations and genetic data. Minimal time-to-production for new ideas is essential, which is facilitated by our custom AWS DevOps platform. This platform focuses on automation and accessible data storage.

In this talk, we introduce how computer vision (YOLO and SAM modelling) enables monitoring traits of plants in the field, and how we operate these models. This further entails:
• Our experience from training and evaluating models on drone images
• Trade-offs selecting AWS services, Terraform modules and Python packages for automation and robustness
• Our team setup that allows IT specialists and biologists to work together effectively

The talk will provide practical insights for both data scientists and DevOps engineers. The main takeaways are that object detection and segmentation from drone maps, at scale, are achievable for a small team. Furthermore, with the right approach, you can standardise a DevOps platform to let operations and developers work together.

Voyager
12:05
12:05
35min
Formula 1 goes Bayesian: Time Series Decomposition with PyMC
Wesley Boelrijk

Forecasting time series can be messy, data is often missing, noisy, or full of structural changes like holidays, outliers, or evolving patterns. This talk shows how to build interpretable time series decomposition models using PyMC, a modern probabilistic programming library.

We’ll break time series into trend, seasonality, and noise components using engineered time features (e.g., Fourier and Radial Basis Functions). You’ll also learn how to model correlated series using hierarchical priors, letting multiple time series "learn from each other." As a case study, we’ll analyze Formula 1 lap time data to compare drivers and explore performance consistency using Bayesian posteriors.

This is a hands-on, code-first talk for data scientists, ML engineers, and researchers curious about Bayesian modeling (or Formula 1). Familiarity with Python and basic statistics is helpful, but no deep knowledge of Bayes is required.
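The engineered Fourier time features mentioned in this abstract are commonly built as below before being given priors in a PyMC model; this numpy-only sketch (function name ours) shows the basis construction:

```python
import numpy as np

def fourier_features(t: np.ndarray, period: float, order: int) -> np.ndarray:
    """Sin/cos basis for seasonality with the given period (order pairs of terms)."""
    k = np.arange(1, order + 1)
    angles = 2 * np.pi * np.outer(t, k) / period
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

t = np.arange(365.0)                 # one year of daily timestamps
X_season = fourier_features(t, period=365.25, order=3)
print(X_season.shape)                # → (365, 6): 3 sine + 3 cosine columns
```

In the Bayesian model, each column gets a regression coefficient with a prior, so the seasonal component becomes a smooth, interpretable sum of a few sinusoids rather than one dummy per calendar period.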

Voyager
12:05
35min
From pixel to predictions: A journey through our CT image pipeline in pig breeding using POSIT
Lisette van der Zande

How do you turn a CT scan of a pig into usable data for large-scale genetic research? At Topigs Norsvin, we scan 10,000 male pigs each year using high-resolution CT imaging. This allows us to look inside the animals and assess carcass quality, muscle composition, and indicators of health. We use this data to inform selection decisions and improve the accuracy of our breeding program. In this talk, I'll walk you through the journey of CT data: from scan acquisition and processing to how we extract traits and integrate them into the breeding program. A key part of this process is POSIT, a lightweight project structure that helps us manage complexity, ensure reproducibility and scale our pipelines effectively. While the biological context is specific, the data challenges are familiar to any data professional.

Nebula
12:05
35min
GenAI governance in practice: patterns, pitfalls & strategies across tools and industries
Maarten de Ruiter

Governing generative AI systems presents unique challenges, particularly for teams dealing with diverse GenAI subdomains and rapidly changing technological landscapes. In this talk, Maarten de Ruiter, Data Scientist at Xomnia, shares practical insights drawn from real-world GenAI use-cases. He will highlight essential governance patterns, address common pitfalls, and provide actionable strategies for teams utilizing both open-source tools and commercial solutions. Attendees will gain concrete recommendations that work in practice, informed by successes (and failures!) across multiple industries

Apollo
12:40
12:40
60min
Lunch break
Apollo
12:40
60min
Lunch break
Voyager
12:40
60min
Lunch break
Nebula
13:40
13:40
50min
Context is King: Evaluating Long Context vs. RAG for Data Grounding
Bauke Brenninkmeijer

Grounding Large Language Models in your specific data is crucial, but notoriously challenging. Retrieval-Augmented Generation (RAG) is the common pattern, yet practical implementations are often brittle, suffering from poor retrieval, ineffective chunking, and context limitations, leading to inaccurate or irrelevant answers. The emergence of massive context windows (1M+ tokens) seems to offer a simpler path – just put all your data in the prompt! But does it truly solve the "needle in a haystack" problem, or introduce new challenges like prohibitive costs and information getting lost in the middle? This talk dives deep into the engineering realities. We'll dissect common RAG failure modes, explore techniques for building robust RAG systems (advanced retrieval, re-ranking, query transformations), and critically evaluate the practical viability, costs, and limitations of leveraging long context windows for complex data tasks in Python. Leave understanding the real trade-offs to make informed architectural decisions for building reliable, data-grounded GenAI applications.

Apollo
13:40
50min
Designing tests for ML libraries – lessons from the wild
Sayak Paul, Benjamin Bossan

In this talk, we will cover how to write effective test cases for machine learning (ML) libraries that are used by hundreds of thousands of users on a regular basis. Tests, despite their well-established need for trust and foolproofing, often get less prioritized. Later, this can wreak havoc on massive codebases, with a high likelihood of introducing breaking changes and other unpleasant situations. This talk deals with our approach to testing our ML libraries, which serve a wide user base. We will cover a wide variety of topics, including the mindset and the necessity of minimal-yet-sufficient testing, all the way up to sharing some practical examples of end-to-end test suites.

Voyager
13:40
50min
Streamlining data pipeline development with Ordeq
Niels Neerhoff

In this talk, we will introduce Ordeq, a cutting-edge data pipeline development framework used by data engineers, scientists and analysts across ING. Ordeq helps you modularise pipeline logic and abstract IO, elevating projects from proof-of-concepts to maintainable production-level applications. We will demonstrate how Ordeq integrates seamlessly with popular data processing tools like Spark, Polars, Matplotlib, DSPy, and orchestration tools such as Airflow. Additionally, we will showcase how you can leverage Ordeq on public cloud offerings such as GCP. Ordeq has zero dependencies and is available under the MIT license.

Nebula
14:40
14:40
35min
Continuous monitoring of model drift in the financial sector
Denis Gaitan, Agustin Iniguez

In today’s financial sector, the continuous accuracy and reliability of machine learning models are crucial for operational efficiency and effective risk management. With the rise of MLOps (Machine Learning Operations), automating monitoring mechanisms has become essential to ensure model performance and compliance with regulations. This presentation introduces a method for continuous monitoring of model drift, highlighting the benefits of automation within the MLOps framework. This topic is particularly interesting because it addresses a common challenge in maintaining model performance over time and demonstrates a practical solution that has been successfully implemented in the bank.

This talk is aimed at data scientists, machine learning engineers, and MLOps practitioners who are interested in automating the monitoring of machine learning models. Attendees will be guided on how to continuously monitor model drift within the MLOps framework. They will understand the benefits of automation in this context and gain insights into MLOps best practices. A basic understanding of MLOps principles and statistical techniques for model evaluation will be helpful but not strictly needed.

The presentation will be an informative talk with a focus on design and implementation. It will include some mathematical concepts but will primarily demonstrate real-world applications and best practices. At the end, we encourage you to actively monitor model drift and automate your monitoring processes to enhance model accuracy, scalability, and compliance in your organization.

Nebula
14:40
35min
Gotta catch ‘em all - Hunting Fraudsters with Minimal Labels and Maximum ML
Jaap Stefels, Itzel Belderbos

Card testing is one of the largest growing fraud problems within the payments landscape, with fraudsters launching millions of attempts globally each month. These attacks can cost companies thousands of euros in lost revenue and lead to the distribution of private card details. Detecting this type of fraud is extremely difficult without confirmed labels to train standard supervised ML classifiers. In this talk, we’ll describe how we built a production-ready ML model that now processes hundreds of transactions per second and share the key takeaways from our journey.

Apollo
14:40
30min
Microlog: Explain Your Python Applications with Logs, Graphs, and AI
Chris Laffra

Microlog is a lightweight continuous profiler and logger for Python that helps developers understand their applications through interactive visualizations and AI-powered insights. With extremely low overhead and a 100% Python stack, it makes it easy to trace performance issues, debug unexpected behavior, and gain visibility into production systems.

Orbit
14:40
35min
What Works: Practical Lessons in Applying Privacy-Enhancing Technologies (PET) in Data Science
Yuliya Sapega, Joanna Pasiarska

Privacy-Enhancing Technologies (PETs) promise to bridge the gap between data utility and privacy — but how do they perform in practice? In this talk, we’ll share real-world insights from our hands-on experience testing and implementing leading PET solutions across various data science use cases.
We explored tools such as differential privacy libraries, homomorphic encryption frameworks, federated learning, multi-party computation, etc. Some lived up to their promise — others revealed critical limitations.
You’ll walk away with a clear understanding of which PET solutions work best for which types of data and analysis, what trade-offs to expect, and how to set realistic goals when integrating PETs into your workflows. This session is ideal for data professionals and decision-makers who are navigating privacy risks while still wanting to innovate responsibly.

Voyager
15:20
15:20
35min
Measure twice, deploy once: Evaluation of retrieval systems
Paul Verhaar, Maarten Koopmans

Improving retrieval systems—especially in RAG pipelines—requires a clear understanding of what’s working and what isn’t. The only scalable way to do that is through meaningful metrics. In this talk, we share insights from building a platform-agnostic search and retrieval product, and how we balance performance against cost. Bigger models often give better results… but at what price? We explain how to assess what’s “good enough” and why the choice of benchmark really matters.

Voyager
15:20
35min
Quiet on Set: Building an On-Air Sign with Open Source Technologies
Danica Fine

Using a Raspberry Pi and a powerful trio of open-source technologies—Apache Kafka, Apache Flink, and Apache Iceberg—learn how to build a custom on-air sign to signal when you're on a call and discover how this same scaffolding can be scaled for millions of users.

Nebula
15:20
35min
The Gentle Monorepo: Ship Faster and Collaborate Better
Gerben Dekker

Monorepos promise faster development and smoother cross-team collaboration, but they often seem intimidating, requiring major tooling, buy-in, and process changes. This talk shows how Dexter gradually introduced a Python monorepo by combining a few lightweight tools with a pragmatic, trust-based approach to adoption. The result is that we can effectively reuse components across our various energy forecasting and trade optimization products. We iterate quicker on bringing our research to production, which benefits our customers and supports the renewable energy transition. After this talk, you’ll walk away with a practical blueprint for introducing a monorepo in your context, without requiring heavy up-front work.

Apollo
16:00
16:00
20min
Snack break
Apollo
16:00
20min
Snack break
Voyager
16:00
20min
Snack break
Nebula
16:20
16:20
50min
Ethics is Not a Feature: Rethinking AI from the Ground Up

Ethics is often treated like a product feature—something to be added at the end, polished for compliance, or marketed for trust. But what if that mindset is exactly what’s holding us back?
In this keynote, we’ll challenge the idea that ethics is optional or external to the development process. We’ll explore how ethical blind spots in AI systems—from biased models to black-box decisions to unsustainable compute—aren’t just philosophical dilemmas, but human failures with real-world consequences.
You’ll learn how to spot ethical risks before they become failures, and discover practical tools and mindsets to build AI that earns trust—without compromising on innovation. From responsible data practices to transparency techniques and green AI strategies, we’ll connect the dots between values and code.
This isn’t just a lecture—it’s a call to rethink how we build the future of AI—together.

Apollo
17:10
17:10
10min
Closing notes

Apollo
17:20
17:20
55min
Social Event
Apollo
17:20
55min
Social Event
Voyager
17:20
55min
Social Event
Nebula
08:30
08:30
30min
Registration and breakfast
Apollo
08:30
30min
Registration and breakfast
Voyager
08:30
30min
Registration and breakfast
Nebula
09:00
09:00
50min
Image processing, artificial intelligence, and autonomous systems
Judith Dijk

In this talk, an overview is given of the field of image processing and the impact of artificial intelligence on it. Starting from the different tasks that can be performed with image processing, solutions using different AI technologies are shown, including the use of generative AI. Finally, the effect of AI on autonomous systems is discussed, along with the challenges that are faced.

Apollo
09:50
09:50
15min
Coffee break
Apollo
09:50
15min
Coffee break
Voyager
09:50
15min
Coffee break
Nebula
10:05
10:05
50min
Model Context Protocol: Principles and Practice
Fabio Lipreri, Gabriele Orlandi

Large‑language‑model agents are only as useful as the context and tools they can reach.

Anthropic’s Model Context Protocol (MCP) proposes a universal, bidirectional interface that turns every external system—SQL databases, Slack, Git, web browsers, even your local file‑system—into first‑class “context providers.”

In just 30 minutes we’ll step from high‑level buzzwords to hands‑on engineering details:

  • How MCP’s JSON‑RPC message format, streaming channels, and version‑negotiation work under the hood.
  • Why per-tool sandboxing via isolated client processes hardens security (and what happens when an LLM tries rm -rf /).
  • Techniques for hierarchical context retrieval that stretch a model’s effective window beyond token limits.
  • Real‑world patterns for accessing multiple tools—Postgres, Slack, GitHub—and plugging MCP into GenAI applications.
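As a taste of the JSON-RPC message format the first bullet refers to, here is a sketch of a tool-invocation request (the tool name and arguments are invented for illustration; consult the MCP specification for the authoritative schema):

```python
import json

# A hypothetical MCP-style JSON-RPC 2.0 request asking a server to run a tool.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "query_database",           # illustrative tool name
        "arguments": {"sql": "SELECT 1"},   # illustrative tool input
    },
}
wire = json.dumps(request)                  # what actually crosses the transport
print(json.loads(wire)["method"])           # → tools/call
```

The server replies with a JSON-RPC response carrying the same `id`, which is how clients match results to in-flight requests over a streaming channel.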

Expect code snippets and lessons from early adoption.

You’ll leave ready to wire your own services into any MCP‑aware model and level‑up your GenAI applications—without the N×M integration nightmare.

Voyager
10:05
50min
Optimize the Right Thing: Cost-Sensitive Classification in Practice
Shimanto Rahman

Not all mistakes in machine learning are equal—a false negative in fraud detection or medical diagnosis can be far costlier than a false positive. Cost-sensitive learning helps navigate these trade-offs by incorporating error costs into the training process, leading to smarter decision-making. This talk introduces Empulse, an open-source Python package that brings cost-sensitive learning into scikit-learn. Attendees will learn why standard models fall short in cost-sensitive scenarios and how to build better classifiers with Scikit-Learn and Empulse.
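The core decision rule that cost-sensitive learning encodes can be illustrated without any library (the costs and helper below are hypothetical, not the Empulse API): flag an example when the expected cost of ignoring it exceeds the expected cost of flagging it.

```python
import numpy as np

# Hypothetical per-example costs: missing fraud (FN) is 10x a false alarm (FP).
COST_FN, COST_FP = 50.0, 5.0

def cost_sensitive_predict(p_fraud: np.ndarray) -> np.ndarray:
    """Flag when the expected cost of ignoring exceeds that of flagging."""
    # Expected cost of "legit" = p * COST_FN; of "fraud" = (1 - p) * COST_FP.
    return (p_fraud * COST_FN > (1 - p_fraud) * COST_FP).astype(int)

p = np.array([0.02, 0.05, 0.12, 0.6])
print(cost_sensitive_predict(p))  # → [0 0 1 1]
```

Note how the implied decision threshold (here p > 1/11) ends up far below the default 0.5: that gap is exactly why cost-blind classifiers fall short in these scenarios.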

Nebula
10:05
50min
Untitled13.ipynb
Vincent Warmerdam

For well over a decade, Python notebooks revolutionized our field. They gave us so much creative freedom and dramatically lowered the entry barrier for newcomers. Yet despite all this ... it has been a decade! And the notebook is still in roughly the same form factor.

So what if we allow ourselves to rethink notebooks ... really rethink it! What features might we come up with? Can we make the notebook understand datasources? What about LLMs? Can we generate widgets on the fly? What if we make changes to Python itself?

This presentation will be a stream of demos that help paint a picture of what the future might hold. I will share my latest work in the anywidget/marimo ecosystem as well as some new hardware integrations.

The main theme that I will work towards: if you want better notebooks, reactive Python might very well be the future.

Apollo
11:05
11:05
35min
Declarative Feature Engineering: Bridging Spark and Flink with a Unified DSL
Miguel Leite, Vitalii Zhebrakovskyi

Building ML features at scale shouldn’t require every ML Scientist to become an expert in Spark or Flink. At Adyen, the Feature Platform team built a Python-based DSL that lets data scientists define features declaratively — while automatically generating the necessary batch or real-time pipelines behind the scenes.

Nebula
11:05
35min
Detection of Unattended Objects in Public Spaces using AI
Evertjan Peer

This talk presents an end-to-end solution for detecting unattended objects in public transport hubs to enhance social security. The project, developed in a three-week challenge, focuses on proactively identifying unattended items using existing camera infrastructure. We will cover the entire pipeline, from data anonymization and preprocessing to building a data labeling platform, object detection with YOLO, and tracking objects over time. The presentation will also discuss the evaluation of the system.

Voyager
11:05
35min
Scaling Trust: A practical guide on evaluating LLMs and Agents
George Chouliaras, Antonio Castelli

Recently, the integration of Generative AI (GenAI) technologies into both our personal and professional lives has surged. In most organizations, the deployment of GenAI applications is on the rise, and this trend is expected to continue in the foreseeable future. Evaluating GenAI systems presents unique challenges not present in traditional ML. The main peculiarity is the absence of ground truth for textual metrics such as text clarity, location extraction accuracy, and factual accuracy. Nevertheless, the non-negligible model-serving cost demands an even more thorough evaluation of any system deployed to production.

Defining metric ground truth is a costly and time-consuming process requiring human annotation. To address this, we will present how to evaluate LLM-based applications by leveraging LLMs themselves as evaluators. Moreover, we will outline the complexities and evaluation methods for LLM-based agents, which operate with autonomy and present further evaluation challenges. Lastly, we will explore the critical role of evaluation in the GenAI lifecycle and outline the steps taken to integrate these processes seamlessly.

Whether you are an AI practitioner, user or enthusiast, join us to gain insights into the future of GenAI evaluation and its impact on enhancing application performance.

Apollo
11:50
11:50
30min
Composable Pipelines for ML: Automating Feature Engineering with Hopsworks’ Brewer
Javier de la Rúa Martínez

Operationalizing ML isn’t just about models — it’s about moving and engineering data. At Hopsworks, we built a composable AI pipeline builder (Brewer) based on two principles: Tasks and Data Sources. This lets users define workflows that automatically analyse, clean, create and update feature groups, without glue code or brittle scheduling logic.

In this talk, we’ll show how Brewer drives the automation of feature engineering, enabling reproducible, declarative pipelines that respond to changes in upstream data. We’ll explore how this fits into broader ML workflows, from ingestion to feature materialization, and how it integrates with warehouses, streams, and file-based systems.

Orbit
11:50
35min
How to Keep Your LLM Chatbots Real: A Metrics Survival Guide
Maria Bader

In this brave new world of vibe coding and YOLO-to-prod mentality, let’s take a step back and keep things grounded (pun intended). None of us would ever deploy a classical ML model to production without clearly defined metrics and proper evaluation, so let's talk about methodologies for measuring the performance of LLM-powered chatbots. Think of retriever recall, answer relevancy, correctness, faithfulness, and hallucination rates. With the wild west of metric standards still in full swing, I’ll guide you through the challenges of curating a synthetic test set and selecting suitable metrics and open-source packages that help evaluate your use case. Everything is possible, from simple LLM-as-a-judge approaches built into many packages such as MLflow, up to complex multi-step quantification approaches with Ragas. If you work in the GenAI space or with LLM-powered chatbots, this session is for you! Prior background knowledge is an advantage, but not required.
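To make one of these metrics concrete, here is a hedged, stdlib-only sketch of retriever recall@k over a tiny hand-made test set (the document ids and test cases are illustrative, not from any particular package):

```python
# Illustrative sketch of retriever recall@k over a tiny synthetic test set.
# Each case lists the document ids the retriever *should* return (relevant)
# and the ids it actually returned, in rank order.

def recall_at_k(relevant: set, retrieved: list, k: int) -> float:
    """Fraction of relevant documents found among the top-k retrieved ones."""
    if not relevant:
        return 1.0  # nothing to find counts as full recall
    hits = relevant & set(retrieved[:k])
    return len(hits) / len(relevant)

test_set = [
    {"relevant": {"doc1", "doc4"}, "retrieved": ["doc4", "doc2", "doc1"]},
    {"relevant": {"doc7"},         "retrieved": ["doc3", "doc5", "doc9"]},
]

scores = [recall_at_k(case["relevant"], case["retrieved"], k=3) for case in test_set]
print(sum(scores) / len(scores))  # mean recall@3 over the test set: 0.5
```

The hard part in practice is not this arithmetic but building a test set whose "relevant" labels you actually trust, which is where synthetic test-set curation comes in.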

Apollo
11:50
35min
Kafka Internals I Wish I Knew Sooner: The Non-Boring Truths
Dima Baranetskyi

Most of us start with Kafka by building a simple producer/consumer demo. It just works — until it doesn’t. Suddenly, disk space isn’t freed up after data “expires,” rebalances loop endlessly during deploys, and strange errors about missing leaders clog your logs.
In the panic, we dive into Kafka’s ocean of config options — hoping something will stick. Sound familiar?

This talk is a collection of hard-won lessons — not flashy tricks, but the kind of insights you only gain after operating Kafka in production for years. You’ll walk away with mental models that make Kafka’s internal behavior more predictable and less surprising.

We’ll cover:
- Storage internals: Why expired data doesn’t always free space — and how Kafka actually reclaims disk
- Transactions & delivery semantics: What “exactly-once” really means, and when it silently downgrades
- Consumer group rebalancing: Why rebalances loop, and how the controller’s hidden behavior affects them

If you’ve used Kafka — or plan to — these insights will save you hours of frustration and debugging.
A basic understanding of partitions, replication, and Kafka’s general architecture will help get the most out of this session.

Nebula
11:50
35min
Optimal Observability: Partitioning Data into Time-Series for Enhanced Anomaly Detection and Improved Monitoring Coverage
Vitalie Spinu

This talk presents a principled methodology for partitioning item-level data into homogeneous time-series, with the objective of maximizing monitoring coverage and improving the detection of anomalies and drifts. We discuss the theoretical underpinnings of clustering algorithms for this task and describe practical algorithms enabling efficient search for optimal partitioning. We exemplify our approach with a real-world application in large-scale monitoring environments from the online payment domain.

Voyager
12:25
12:25
60min
Lunch
Apollo
12:25
60min
Lunch
Voyager
12:25
60min
Lunch
Nebula
13:25
13:25
35min
Kickstart Your Probabilistic Forecasting with Level Set and Quantile Regression Forests
Inge van den Ende

Probabilistic forecasting is essential, but choosing the right method is tricky. This talk introduces two lesser-known models — Level Set Forecaster and Quantile Regression Forest — that help you kickstart probabilistic forecasting without unnecessary complexity.
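To preview the idea behind one of the two models (a toy pure-Python sketch, not a production implementation): a Quantile Regression Forest keeps the raw training targets that land in each leaf and reads off empirical quantiles, instead of collapsing each leaf to a mean. Here each "tree" is simulated as a simple bucketing of x; a real forest learns its splits.

```python
# Toy illustration of the Quantile Regression Forest idea: store the training
# targets per leaf, then predict a quantile from the pooled leaf targets.
# Each "tree" here is just a bucketing of x at a different granularity.

def fit_toy_forest(xs, ys, bucket_sizes):
    """One 'tree' per bucket size: leaf id = int(x // size) -> target list."""
    forest = []
    for size in bucket_sizes:
        leaves = {}
        for x, y in zip(xs, ys):
            leaves.setdefault(int(x // size), []).append(y)
        forest.append((size, leaves))
    return forest

def predict_quantile(forest, x, tau):
    """Pool targets from the leaf x falls into in every tree, take quantile tau."""
    pooled = []
    for size, leaves in forest:
        pooled.extend(leaves.get(int(x // size), []))
    pooled.sort()
    idx = min(int(tau * len(pooled)), len(pooled) - 1)
    return pooled[idx]

xs = [0.5, 1.5, 2.5, 3.5, 4.5, 5.5]
ys = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
forest = fit_toy_forest(xs, ys, bucket_sizes=[2, 3])
print(predict_quantile(forest, 1.0, 0.5))  # 2.0: median of pooled targets [1, 1, 2, 2, 3]
```

Because the same fitted forest answers any quantile, you get a full predictive distribution from one model, which is exactly what makes this a low-complexity entry point into probabilistic forecasting.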

Apollo
13:25
35min
Searching for My Next Chart
Muhammad Chenariyan Nakhaee

As a data visualization practitioner, I frequently draw inspiration from the diverse and rapidly expanding community, particularly through challenges like #TidyTuesday. However, the sheer volume of remarkable visualizations quickly overwhelmed my manual curation methods—from Pinterest boards to Notion pages. This created a significant bottleneck in my workflow, as I found myself spending more time cataloging charts than actively creating them.

In this talk, I will present a RAG (Retrieval-Augmented Generation) based retrieval system that I designed specifically for data visualizations. I will detail the methodology behind this system, illustrating how I addressed my own workflow inefficiencies by transforming a dispersed collection of charts into a semantically searchable knowledge base. This project serves as a practical example of applying advanced AI techniques to enhance creative technical work, demonstrating how a specialized retrieval system can significantly improve the efficiency and quality of the data visualization creation process.

Voyager
13:25
35min
🍯 Sweet Language Model Python Applications
Jamie Coombes

It's easy to get into sticky situations when building language model powered applications. This talk helps you transform sticky messes into sweet successes.

This talk delivers practical insights from creating production-ready language model solutions with Python. Moving beyond theoretical possibilities, we'll explore the real architectural decisions, code patterns, and tradeoffs that determine success or failure. Through a progressive case study, we'll demonstrate how to effectively leverage Pydantic structured outputs, validation techniques, and thoughtful API integrations while avoiding common pitfalls. Perfect for Pythonic product teams looking to build applications that result in a sweet treat everyone will love.

Nebula
14:10
14:10
35min
Help! There Are Humans in My Data!
Marysia Winkels, Isabelle Donatz-Fest

Good quality data is the basis for high quality models and valuable data insights. But isn't it annoying how often your data is riddled with those pesky humans? Human involvement in data creation often introduces errors, misunderstandings, and biases that can compromise data integrity. This talk will explore how human factors influence the data creation process and what we as data professionals can do to account for this in our data interpretation and usage.

Apollo
14:10
35min
Is Prompt Engineering Dead? How Auto-Optimization is Changing the Game
Iryna Kondrashchenko, Oleh Kostromin

The rise of LLMs has elevated prompt engineering as a critical skill in the AI industry, but manual prompt tuning is often inefficient and model-specific. This talk explores various automatic prompt optimization approaches, ranging from simple ones like bootstrapped few-shot to more complex techniques such as MIPRO and TextGrad, and showcases their practical applications through frameworks like DSPy and AdalFlow. By exploring the benefits, challenges, and trade-offs of these approaches, the attendees will be able to answer the question: is prompt engineering dead, or has it just evolved?

Voyager
14:10
35min
Orchestrating success: How Vinted standardizes large-scale, decentralized data pipelines
Rodrigo Loredo, Oscar Ligthart

At Vinted, Europe’s largest second-hand marketplace, over 20 decentralized data teams generate, transform, and build products on petabytes of data. Each team utilizes their own tools, workflows, and expertise. Coordinating data pipeline creation across such diverse teams presents significant challenges. These include complex inter-team dependencies, inconsistent scheduling solutions, and rapidly evolving requirements.

This talk is aimed at data engineers, platform engineers, and technical leads with experience in workflow orchestration and will demonstrate how we empower teams at Vinted to define data pipelines quickly and reliably. We will present our user-friendly abstraction layer built on top of Apache Airflow, enhanced by a Python code generator. This abstraction simplifies upgrades and migrations, removes scheduler complexity, and supports Vinted’s rapid growth. Attendees will learn how Python abstractions and code generation can standardize pipeline development across diverse teams, reduce operational complexity, and enable greater flexibility and control in large-scale data organizations. Through practical lessons and real-world examples of our abstraction interface, we will offer insights into designing scheduler-agnostic architectures for successful data pipeline orchestration.

Nebula
14:55
14:55
35min
Real-Time Context Engineering for LLMs
Manu Joseph

Context engineering has replaced prompt engineering as the main challenge in building agents and LLM applications. Context engineering involves providing LLMs with relevant and timely context data from various data sources, which allows them to make context-aware decisions. The context data provided to the LLM must be produced in real time to enable it to react intelligently at human-perceivable latencies (a second or two at most). If the application takes longer to react, humans will perceive it as laggy and unintelligent.
In this talk, we will introduce context engineering and make the case for real-time context engineering in interactive applications. We will also demonstrate how to integrate real-time context data from applications into Python agents using the Hopsworks feature store and corresponding application IDs. Application IDs are the key to unlocking application context data for agents and LLMs. We will walk through an example of an interactive application (a TikTok clone) that we make AI-enabled with Hopsworks.

Apollo
14:55
35min
Resource Monitoring and Optimization with Metaflow
Gergely Daroczi

Metaflow is a powerful workflow management framework for data science, but optimizing its cloud resource usage still involves guesswork. We have extended Metaflow with a lightweight resource tracking tool that automatically monitors CPU, memory, GPU, and more, then recommends the most cost-effective cloud instance type for future runs. A single line of code can save you from overprovisioned costs or painful job failures!

Nebula
14:55
35min
Sieves: Plug-and-Play NLP Pipelines With Zero-Shot Models
Raphael Mitsch

Generative models are dominating the spotlight lately - and rightly so. Their flexibility and zero-shot capabilities make it incredibly fast to prototype NLP applications. However, one-shotting complex NLP problems often isn't the best long-term strategy. Decomposing problems into modular, pipelined tasks leads to better debuggability, greater interpretability, and more reliable performance.

This modular pipeline approach pairs naturally with zero- and few-shot (ZFS) models, enabling rapid yet robust prototyping without requiring large datasets or fine-tuning. Crucially, many real-world applications need structured data outputs—not free-form text. Generative models often struggle to consistently produce structured results, which is why enforcing structured outputs is now a core feature across contemporary NLP tools (like Outlines, DSPy, LangChain, Ollama, vLLM, and others).

For engineers building NLP pipelines today, the landscape is fragmented. There’s no single standard for structured generation yet, and switching between tools can be costly and frustrating. The NLP tooling landscape lacks a flexible, model-agnostic solution that minimizes setup overhead, supports structured outputs, and accelerates iteration.

Introducing Sieves: a modular toolkit for building robust NLP document processing pipelines using ZFS models.
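To illustrate why structured outputs matter for pipelines like these (a hedged, stdlib-only sketch; the class and function names are hypothetical and not the Sieves API): a task that feeds downstream steps can validate a model's raw text against an expected schema instead of trusting free-form output.

```python
# Hypothetical sketch (not the Sieves API): validating a model's raw text
# output against an expected schema so downstream pipeline tasks receive
# typed, machine-readable results instead of free-form text.

import json
from dataclasses import dataclass

@dataclass
class ClassificationResult:
    label: str
    confidence: float

def parse_result(raw_model_output: str) -> ClassificationResult:
    """Parse and validate model output; raise ValueError if it breaks the schema."""
    data = json.loads(raw_model_output)
    if not isinstance(data.get("label"), str):
        raise ValueError("missing or non-string 'label'")
    confidence = float(data["confidence"])
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence out of range")
    return ClassificationResult(label=data["label"], confidence=confidence)

ok = parse_result('{"label": "invoice", "confidence": 0.93}')
print(ok)  # ClassificationResult(label='invoice', confidence=0.93)
```

Tools such as Outlines or Pydantic push this further by constraining generation itself, but even post-hoc validation like this makes failures explicit and debuggable instead of silently corrupting later pipeline stages.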

Voyager
15:30
15:30
40min
Snack break
Voyager
15:30
40min
Snack break
Nebula
15:35
15:35
35min
Lightning Talks

Apollo
16:10
16:10
50min
Minus Three Tier: Data Architecture Turned Upside Down
Hannes Mühleisen

Every data architecture diagram out there makes it abundantly clear who's in charge: at the bottom sits the analyst, above that is an API server, and on the very top sits the mighty data warehouse. This pattern is so ingrained we never question its necessity, despite problems like slow data response times, multi-level scaling issues, and massive cost.

But there is another way: disconnecting storage from compute enables query processing to move closer to people, leading to much snappier responses, natural scaling through client-side query processing, and much reduced cost.

In this talk, we will discuss how modern data engineering paradigms like decomposition of storage, single-node query processing, and lakehouse formats enable a radical departure from the tired three-tier architecture. By inverting the architecture, we can put users' needs first and rely on commoditised components like object stores to enable fast, scalable, and cost-effective solutions.

Apollo
17:00
17:00
20min
Conference closing notes

Apollo
17:20
17:20
40min
Techie vs Comic: The sequel
Arda Kaygan

A data scientist by day and a standup comedian by night. This was how Arda described himself prior to his critically acclaimed performance about his two identities during PyData 2024, where they merged.

Now he doesn't even know.

After another year of stage performances, awkward LinkedIn interactions, and mysterious cloud errors, Arda is back for another tale of absurdity. In this closing talk, he will illustrate the hilarity of his life as a data scientist in the age of LLMs and his non-existent comfort zone, provided good sequels can exist.

Apollo