PyData Eindhoven 2025

Training one model is fun. Running thousands without everything catching fire? That’s the real challenge. In this talk, we’ll show how we, two data scientists turned accidental ML engineers, scaled anomaly detection at Vanderlande. Expect a peek into our orchestration setup, a quick code snippet, a look at our monitoring dashboard, and an explanation of how we scale to a thousand models.
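As a rough illustration of running many small models side by side, the sketch below keeps one independent z-score detector per piece of equipment. The talk does not say which anomaly-detection models Vanderlande uses; the class name, the asset IDs, the 3-standard-deviation threshold and the data layout are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class ZScoreDetector:
    """Hypothetical per-asset model: flags values far from the training mean."""
    mu: float = 0.0
    sigma: float = 1.0
    threshold: float = 3.0  # assumed cut-off, in standard deviations

    def fit(self, values):
        self.mu = mean(values)
        # Guard against constant series, which would otherwise divide by zero.
        self.sigma = pstdev(values) or 1e-9
        return self

    def is_anomaly(self, value):
        return abs(value - self.mu) / self.sigma > self.threshold


def train_all(history):
    """Fit one independent model per asset ID; each fit could run on its own worker."""
    return {asset_id: ZScoreDetector().fit(values) for asset_id, values in history.items()}


history = {
    "conveyor-001": [10.0, 10.5, 9.8, 10.2, 10.1],
    "conveyor-002": [50.0, 51.0, 49.5, 50.5, 50.2],
}
models = train_all(history)
print(models["conveyor-001"].is_anomaly(25.0))  # True: far above the usual ~10
print(models["conveyor-002"].is_anomaly(50.4))  # False: within normal variation
```

Because every model is independent, the per-asset fits are embarrassingly parallel, which is what makes scaling to a thousand models mostly an orchestration problem.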

AI/Machine Learning/GenAI

Planck-Bohr

10:00

30min

Developing a Nation-Wide Padel Rating System: A Data-Driven Approach

Max Brouwer

Padel has been one of the fastest-growing sports in the Netherlands in recent years. While it initially benefited from the rating facilities of its ‘big brother’ tennis, the KNLTB decided in 2024 to develop a dedicated, tailor-made rating system for padel, which has been in effect since 2025. The development process involved extensive analyses, simulations, and probability modeling on data from more than 300,000 padel matches, complemented by recommendations from the field.

In this presentation, the audience will be taken through the technical development process, as well as the unique characteristics of padel that were crucial in creating an effective rating system.
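The abstract does not disclose the KNLTB formula, so the sketch below uses a generic Elo-style update adapted to doubles purely to illustrate how a win-probability model turns match outcomes into rating changes. Averaging the two partners' ratings, the K-factor of 32 and the 400-point logistic scale are standard chess-Elo conventions assumed here, not parameters of the padel system.

```python
def expected_score(rating_a, rating_b):
    """Logistic win probability of side A against side B (Elo convention)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_doubles(team_a, team_b, a_won, k=32.0):
    """Assumed doubles rule: each pair is rated as the mean of its two players."""
    avg_a = sum(team_a) / 2
    avg_b = sum(team_b) / 2
    p_a = expected_score(avg_a, avg_b)
    delta = k * ((1.0 if a_won else 0.0) - p_a)
    # Both partners gain the same amount; both opponents lose it.
    new_a = tuple(r + delta for r in team_a)
    new_b = tuple(r - delta for r in team_b)
    return new_a, new_b


new_a, new_b = update_doubles((1500, 1500), (1500, 1500), a_won=True)
print(new_a, new_b)  # (1516.0, 1516.0) (1484.0, 1484.0)
```

An upset against a stronger pair yields a larger gain than a win against a weaker one, because the update is proportional to the gap between the actual and the expected result.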

Sports Analytics hosted PySport

Ernst-Curie

10:00

30min

Scaling Retail Planning at IKEA: Orchestrating Sales, Fulfillment and Capacity Assessment with Metaflow

Yannick Mariman

At IKEA, retail planning is a complex chain of processes, from sales forecasting to fulfillment and capacity assessment, that involve multiple teams. Each team builds its own predictive models independently, yet their outputs depend on one another to ensure a coherent planning chain.

In this talk, we will show how IKEA uses Metaflow, an open-source framework for building and managing real-life ML, to orchestrate and connect the forecasting pipelines for more than thirty countries. We’ll discuss how Metaflow helps align independent teams, improve readability, and enable reproducible workflows at scale.

You will leave with practical approaches for an aligned team workflow and concrete patterns for orchestrating ML/AI pipelines.

Data Engineering

Auditorium

10:30

15min

Coffee Break 15m

Auditorium

10:30

15min

Coffee Break 15m

Ernst-Curie

10:30

15min

Coffee Break 15m

Planck-Bohr

10:45

30min

AI-Powered Web Scraping: From Data Collection to Strategic Insights

Yevhenii

Companies today are hungry for external data to stay competitive, but actually getting and making sense of that data isn’t easy. Standard web scraping often produces messy or incomplete results, and modern anti-bot systems make reliable collection even tougher.

In this talk, I’ll share how pairing Python’s scraping frameworks (like Scrapy, Playwright, and Selenium) with AI/ML can turn raw, unstructured data into clear, actionable insights.

We’ll look at:

1) How to build scrapers that still work in 2025.

2) Ways to use AI to automatically clean, enrich, and classify data.

3) Real-world applications of sentiment analysis for reviews and social media.

4) Case studies showing how SMEs have used these pipelines to sharpen marketing and product strategies.

By the end, you’ll see how to design pipelines that don’t just gather data, but deliver real strategic value. The session will focus on practical Python tools, scalable deployment (Airflow, Kubernetes, cloud platforms), and key lessons learned from hands-on projects at the intersection of scraping and AI.
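A minimal sketch of the extract, clean and classify stages using only the standard library. The page layout (`<p class="review">`), the sample HTML and the function names are hypothetical, and the keyword lexicon is a deliberately simple stand-in for the ML or LLM classifier the talk describes; fetching pages with Scrapy, Playwright or Selenium is not shown.

```python
import re
from html.parser import HTMLParser


class ReviewExtractor(HTMLParser):
    """Collects the text of every <p class="review"> element (hypothetical layout)."""

    def __init__(self):
        super().__init__()
        self.in_review = False
        self.reviews = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and "review" in classes:
            self.in_review = True
            self.reviews.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_review = False

    def handle_data(self, data):
        # Text inside nested tags such as <b> is appended to the current review.
        if self.in_review:
            self.reviews[-1] += data


def clean(texts):
    """Collapse whitespace, drop empty strings and case-insensitive duplicates."""
    seen, result = set(), []
    for text in texts:
        normalized = " ".join(text.split())
        key = normalized.lower()
        if normalized and key not in seen:
            seen.add(key)
            result.append(normalized)
    return result


# Stand-in for an ML sentiment model: a tiny keyword lexicon.
POSITIVE = {"great", "fast", "love"}
NEGATIVE = {"broken", "slow", "refund"}


def naive_sentiment(text):
    words = set(re.findall(r"[a-z]+", text.lower()))
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


html = """
<div>
  <p class="review">Great product,   fast delivery!</p>
  <p class="review">great product, fast delivery!</p>
  <p class="review">Arrived <b>broken</b>, want a refund.</p>
  <p class="ad">Buy now</p>
</div>
"""
parser = ReviewExtractor()
parser.feed(html)
parser.close()
labels = [(text, naive_sentiment(text)) for text in clean(parser.reviews)]
print(labels)
```

Keeping extraction, cleaning and classification as separate functions makes it easy to swap the lexicon for a real model without touching the scraper.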

Yet Another “How to Trust AI”: Embracing Uncertainty with Probabilistic Methods

Albert

Everyone talks about “trustworthy AI,” yet few approaches go beyond good intentions. This talk takes a practical look at why AI systems often fail to earn our trust, and how probabilistic methods can fix that.

We’ll explore how to connect RxInfer, a probabilistic inference engine, with LLM agents through the Model Context Protocol (MCP). MCP provides a simple way for language models to interact with probabilistic reasoning tools, letting them move beyond confident guesses to quantified beliefs.

By embracing uncertainty rather than ignoring it, we can design AI systems that reason more transparently, admit their limits, and make decisions we can actually trust. Expect a blend of conceptual insight, Python demos, and a few honest laughs about the current “AI trust” hype.
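RxInfer itself is a Julia package, and the MCP wiring is not shown here. As a minimal Python illustration of "quantified beliefs instead of confident guesses", the sketch below keeps a conjugate Beta-Bernoulli posterior over a success probability (for example, how often a hypothetical tool call succeeds). The class name and the uniform Beta(1, 1) prior are assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class BetaBelief:
    """Beta posterior over an unknown success probability."""
    alpha: float = 1.0  # prior pseudo-count of successes (uniform prior assumed)
    beta: float = 1.0   # prior pseudo-count of failures

    def update(self, successes, failures):
        # Conjugacy: observing Bernoulli outcomes just adds to the pseudo-counts.
        return BetaBelief(self.alpha + successes, self.beta + failures)

    @property
    def mean(self):
        return self.alpha / (self.alpha + self.beta)

    @property
    def std(self):
        a, b = self.alpha, self.beta
        return math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))


few = BetaBelief().update(successes=3, failures=1)
many = BetaBelief().update(successes=300, failures=100)
print(round(few.mean, 3), round(few.std, 3))
print(round(many.mean, 3), round(many.std, 3))
```

Both beliefs have a similar mean, but the one backed by four observations has a far larger standard deviation; a system that reports that spread can admit its limits instead of presenting a single confident number.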

AI/Machine Learning/GenAI

Planck-Bohr

10:45

30min

xReceiver: a GNN approach to the evaluation of the decision-making process of passing options in football

Gabriel Masella

The process of decision-making in football is characterized by a complex interplay between spatial positioning, opponent pressure, and player intent. In this research, we introduce xReceiver, a real-time Graph Neural Network (GNN) framework designed to predict the optimal passing target by modeling on-field interactions as dynamic graphs. Each player is represented as a node with positional and contextual features, while potential passing lines form weighted edges characterized by distance, angle, and pressure metrics. We have developed a Message-Passing Neural Network (MPNN) that is trained using a combination of tracking data and event data from professional matches. Our model achieves 65.22% accuracy in identifying the actual chosen receiver and 95.65% accuracy within its top three suggestions. xReceiver further offers quantification of each option's likelihood, threat, and creativity, enabling performance analysts to evaluate over 1,000 passes in seconds.
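To make the message-passing idea concrete, here is a toy, hand-weighted sketch: one round in which each node adds the edge-weighted features of its neighbours, followed by a linear readout and a softmax over candidate receivers. This is not the xReceiver model; a real MPNN learns its message, update and readout functions, and the two features, player labels and weights below are hypothetical.

```python
import math


def message_passing(features, edges):
    """One synchronous round: every node adds the weighted features of its in-neighbours."""
    updated = {node: list(vec) for node, vec in features.items()}
    for src, dst, weight in edges:
        for i, value in enumerate(features[src]):
            updated[dst][i] += weight * value
    return updated


def receiver_probabilities(hidden, candidates, theta):
    """Linear readout per candidate, then a numerically stable softmax."""
    scores = {c: sum(t * h for t, h in zip(theta, hidden[c])) for c in candidates}
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


# Hypothetical features: [forward progress, free space]; the defender carries "pressure".
features = {
    "A": [0.2, 0.8],
    "B": [0.9, 0.3],
    "D1": [0.0, -1.0],
}
edges = [("D1", "B", 0.5)]  # defender D1 is close to teammate B
theta = [1.0, 1.0]          # hand-set readout weights; a trained MPNN learns these

hidden = message_passing(features, edges)
probs = receiver_probabilities(hidden, ["A", "B"], theta)
print(probs)
```

Without the defender edge, teammate B scores higher; the message from D1 lowers B's free-space feature and shifts the probability towards A, which is the kind of context a graph model can capture and a per-player model cannot.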

Sports Analytics hosted PySport

Ernst-Curie

11:15

5min

Coffee Break 5m

Auditorium

11:15

5min

Coffee Break 5m

Ernst-Curie

11:15

5min

Coffee Break 5m

Planck-Bohr

11:20

30min

Efficient Time-Series Forecasting with Thousands of Local Models on Databricks

Daria Mustafina

In industries like energy and retail, forecasting often calls for local models, one per time series, because each series has its own behavior. Training and managing thousands of such models, however, presents scalability and operational challenges. This talk shows how we scaled local models on Databricks by leveraging the Pandas API on Spark, and shares practical lessons on storage, reuse, and scaling to make this approach efficient when it’s truly needed.
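The core pattern behind local models is "group by series, fit one model per group". The sketch below shows that pattern in plain Python with a least-squares linear trend as a deliberately simple, hypothetical per-series model; on Databricks the same per-group function would typically be distributed with the Pandas API on Spark (for example via a grouped `apply`), which is not shown here. Series IDs and values are made up.

```python
from collections import defaultdict


def fit_linear_trend(values):
    """Ordinary least squares fit of value = a + b * t for t = 0..n-1."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    slope = sxy / sxx if sxx else 0.0  # a single point has no trend
    return y_mean - slope * t_mean, slope


def forecast_all(rows, horizon):
    """rows: (series_id, value) pairs in time order; returns one forecast list per series."""
    by_series = defaultdict(list)
    for series_id, value in rows:
        by_series[series_id].append(value)
    forecasts = {}
    for series_id, values in by_series.items():
        intercept, slope = fit_linear_trend(values)  # one local model per series
        n = len(values)
        forecasts[series_id] = [intercept + slope * (n + h) for h in range(horizon)]
    return forecasts


rows = [
    ("meter-1", 10.0), ("meter-2", 5.0),
    ("meter-1", 12.0), ("meter-2", 5.0),
    ("meter-1", 14.0), ("meter-2", 5.0),
]
print(forecast_all(rows, horizon=2))  # {'meter-1': [16.0, 18.0], 'meter-2': [5.0, 5.0]}
```

Because the fitting function only ever sees one series, it can be swapped for any per-series model and parallelised across groups without changes.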

From Data Lake Entanglement to Data Mesh Decoupling: Scaling a Self-Service Data Platform

Geert Jongen

Our data platform journey started with a classic data lake — easy to ingest, hard to evolve. As domains scaled, tight coupling across source systems, pipelines, and data products slowed everything down. In this talk, we share how we re-architected toward a domain-oriented data mesh using PySpark, Delta Lake and DQX to achieve true decoupling. Expect practical lessons on designing independent data products, managing lineage and governance, and scaling self-service without chaos.

Identifying playstyles in football through spatial networks

Annemarijn Blom

Breaking away from traditional manual video analysis, this talk introduces a data-driven approach to automatically identify football playstyles in key moments before a shot on goal, using tracking and event data. By applying network science, which studies relationships and interactions within complex systems, we objectively analyze attacking and defensive strategies. Key spatial network metrics are used to reveal diverse playstyles through clustering techniques. The session concludes with insights into the results and possible applications of these findings in football analysis.
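The abstract does not specify how the spatial networks are built, so the sketch below assumes one common choice: connect two players whenever they are within a distance threshold, then compute network density and degree centrality per moment. The 15 m threshold, player labels and coordinates are hypothetical; vectors of such metrics per key moment could then be fed to a clustering algorithm such as k-means.

```python
import math
from itertools import combinations


def proximity_network(positions, max_distance):
    """Undirected edges between players at most max_distance metres apart."""
    return {
        (a, b)
        for a, b in combinations(sorted(positions), 2)
        if math.dist(positions[a], positions[b]) <= max_distance
    }


def network_metrics(positions, edges):
    """Return (density, degree centrality per player) for an undirected graph."""
    n = len(positions)
    degree = {p: 0 for p in positions}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    density = 2 * len(edges) / (n * (n - 1))
    centrality = {p: d / (n - 1) for p, d in degree.items()}
    return density, centrality


positions = {"p1": (0.0, 0.0), "p2": (10.0, 0.0), "p3": (0.0, 10.0), "p4": (40.0, 40.0)}
edges = proximity_network(positions, max_distance=15.0)
density, centrality = network_metrics(positions, edges)
print(density, centrality)
```

In this toy frame, three players form a compact triangle while p4 is isolated, so density is 0.5 and p4's centrality is zero.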

Sports Analytics hosted PySport

Ernst-Curie

11:50

60min

Lunch

Auditorium

11:50

60min

Lunch

Ernst-Curie

11:50

60min

Lunch

Planck-Bohr

12:50

30min

Finding trash in waste

Tom Koopen

In the Netherlands, plastic waste is often gathered in dedicated containers. These are collected by trucks, which are emptied at a transfer station. There, a visual inspection is carried out to identify items that do not belong in this waste stream. We have researched automating this inspection using cameras and vision foundation models: every item in an image of the large pile is detected, segmented, and classified according to whether it belongs in this waste stream.

AI/Machine Learning/GenAI

Auditorium

12:50

30min

FootballBERT: Encoding player identity in vectors with Transformers.

Achraff ADJILEYE

FootballBERT introduces a new way of representing football players — not as static IDs or statistical aggregates that fluctuate wildly over short periods, but as contextual embeddings learned directly from match data.
Built on a Transformer architecture and trained with a Masked Player Prediction (MPP) objective, FootballBERT captures how a player’s identity emerges from teammates, opponents, and coaches’ tactical demands, much like BERT learns word meaning from sentences.
Openly released on Hugging Face, FootballBERT is a plug-and-play foundation model whose embeddings can be integrated into any downstream system, paving the way for player-aware analytics across performance modeling, recruitment and prediction.
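A Masked Player Prediction objective presumably mirrors BERT's masked-token setup: hide some players in a match context and train the model to recover them. The sketch below shows only the input-masking step, using the original BERT recipe (select about 15% of positions; replace 80% of those with a mask token, 10% with a random player, keep 10% unchanged). Whether FootballBERT uses these exact ratios is an assumption, and the player IDs are hypothetical.

```python
import random

MASK = "[MASK]"


def mask_lineup(players, vocabulary, mask_prob=0.15, rng=None):
    """Return (masked sequence, labels); labels hold the original ID at masked slots, else None."""
    rng = rng or random.Random()
    masked, labels = [], []
    for player in players:
        if rng.random() < mask_prob:
            labels.append(player)  # the model must predict this player
            roll = rng.random()
            if roll < 0.8:
                masked.append(MASK)
            elif roll < 0.9:
                masked.append(rng.choice(vocabulary))  # random replacement
            else:
                masked.append(player)  # kept unchanged
        else:
            labels.append(None)  # not part of the loss
            masked.append(player)
    return masked, labels


lineup = [f"player_{i}" for i in range(22)]
masked, labels = mask_lineup(lineup, lineup, rng=random.Random(0))
print(masked)
print(labels)
```

The loss is computed only at positions whose label is not None, so the model learns each player's representation from the surrounding context.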

Sports Analytics hosted PySport

Ernst-Curie

12:50

30min

Scaling Python to thousands of nodes with Ray

Rob de Wit-Liezenga

Python is the language of choice for anything to do with AI and ML. Writing Python code for a single machine is easy, but running workloads across clusters of thousands of nodes is much more difficult. Ray allows you to do just that. I'll demonstrate how to use this open-source tool with a few lines of code. As a demo project, I'll show how I built a retrieval-augmented generation (RAG) system for the Wheel of Time book series.

Data Engineering

Planck-Bohr

13:20

5min

Coffee Break 5m

Auditorium

13:20

5min

Coffee Break 5m

Ernst-Curie

13:20

5min

Coffee Break 5m

Planck-Bohr

13:25

30min

Extending SQL Databases with Python

Florents Tselai

What if your database could run Python code inside SQL? In this talk, we’ll explore how to extend popular databases using Python, without needing to write a line of C.

We’ll cover three systems—SQLite, DuckDB, and PostgreSQL—and show how Python can be used in each to build custom SQL functions, accelerate data workflows, and prototype analytical logic. Each database offers a unique integration path:
- SQLite and DuckDB allow you to register Python functions directly into SQL, via sqlite3's Connection.create_function and the create_function method on a DuckDB connection respectively, making it easy to inject business logic or custom transformations.
- PostgreSQL offers PL/Python, a full-featured procedural language for writing SQL functions in Python. We’ll also touch on advanced use cases, including embedding the Python interpreter directly into a PostgreSQL extension for deeper integration.

By the end of this talk, you’ll understand the capabilities, limitations, and gotchas of Python-powered extensions in each system—and how to choose the right tool depending on your use case, whether you’re analyzing data, building pipelines, or hacking on your own database.
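For the SQLite path, the standard-library `sqlite3` module is enough. The sketch below registers a small Python function as a one-argument SQL function and calls it inside a query; the function name, table and data are hypothetical examples of "business logic", not taken from the talk.

```python
import sqlite3


def normalize_country(code):
    """Example business rule: trim and upper-case country codes; pass NULL through."""
    if code is None:
        return None
    return code.strip().upper()


con = sqlite3.connect(":memory:")
# Register the Python callable as a 1-argument SQL function named normalize_country.
con.create_function("normalize_country", 1, normalize_country)
con.execute("CREATE TABLE stores (id INTEGER, country TEXT)")
con.executemany("INSERT INTO stores VALUES (?, ?)", [(1, " nl"), (2, "be "), (3, None)])
rows = con.execute("SELECT id, normalize_country(country) FROM stores ORDER BY id").fetchall()
print(rows)  # [(1, 'NL'), (2, 'BE'), (3, None)]
con.close()
```

SQL NULL arrives in Python as None and a returned None becomes NULL again, so the function must handle missing values explicitly; that is one of the gotchas worth checking in each database.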

Football is complex, but your code doesn’t have to be — meet DataBallPy and a practical deep dive into pressing

Alexander Oonk, Tygo Nikamp

DataBallPy is an open-source Python package that helps you quickly start analysing a football-related question. In this talk, we will introduce the core features and functionality of DataBallPy using code examples with compelling visualisations. The second part of the talk will showcase a practical example of how the Royal Belgian Football Association (RBFA) has used components of DataBallPy to analyse the effectiveness and efficiency of pressuring the opponent in over 200 games. Taken together, this talk will give you a clear starting point for answering your own football-related questions.
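DataBallPy has its own API, which is not reproduced here. As a standalone illustration of one common way to quantify pressing from tracking data, the sketch below counts opponents within a radius of the ball carrier and aggregates that over frames. The 5 m radius, the function names and the coordinates are hypothetical.

```python
import math


def pressure_on_carrier(carrier, defenders, radius=5.0):
    """Return (defenders within radius metres, distance to the nearest defender)."""
    distances = [math.dist(carrier, d) for d in defenders]
    within = sum(1 for d in distances if d <= radius)
    nearest = min(distances) if distances else math.inf
    return within, nearest


def share_of_frames_under_pressure(frames, radius=5.0, min_defenders=1):
    """Fraction of frames where at least min_defenders opponents are within radius."""
    pressured = sum(
        1 for carrier, defenders in frames
        if pressure_on_carrier(carrier, defenders, radius)[0] >= min_defenders
    )
    return pressured / len(frames) if frames else 0.0


# Each frame: (ball-carrier position, list of opponent positions), in metres.
frames = [
    ((50.0, 30.0), [(53.0, 33.0), (60.0, 30.0), (49.0, 27.0)]),
    ((20.0, 10.0), [(40.0, 10.0)]),
]
print(pressure_on_carrier(*frames[0]))
print(share_of_frames_under_pressure(frames))
```

Efficiency and effectiveness questions then become comparisons of such pressure indicators against outcomes, such as ball recoveries, in the frames that follow.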

Sports Analytics hosted PySport

Ernst-Curie

13:25

30min

From €1M License to In-House Success: How We Built a Real-Time Recommendation System and Saved Millions Doing It

ALI KOHAN

When we at Bol decided to personalize campaign banners, we did what many companies do: bought an expensive solution. As a software engineering team with zero data science experience, we integrated a third-party recommender system for €1 million annually, built the cloud infrastructure, and waited for results. After our first season, the data told a harsh truth—the third-party tool wasn't delivering value proportional to its cost. We faced a crossroads: accept mediocrity or build our own solution from scratch, tailored to our requirements and architecture.
We'll walk you through how we built a more intelligent and flexible recommendation system from the ground up, and how doing so saved us over a million euros per year. We will share the incremental steps that shaped the project, along with the valuable lessons we learned.

AI/Machine Learning/GenAI

Auditorium

13:55

15min

Coffee Break 15m

Auditorium

13:55

15min

Coffee Break 15m

Ernst-Curie

13:55

15min

Coffee Break 15m

Planck-Bohr

14:10

30min

GPS doesn't work! Can a model alert us before this happens?

Vincenzo Ventriglia

Have you ever used GPS and noticed that it was not working properly? The Sun and a space weather effect called Travelling Ionospheric Disturbances (TIDs) could be responsible. We will present an explainable TID forecasting model, based on CatBoost, that uses several physical drivers to make its forecasts.

AI/Machine Learning/GenAI

Auditorium

14:10

30min

Planning Hockey Careers With Python

Jaroslav Bezdek

How can data science help young athletes navigate their careers? In this talk, I’ll share my experience building a career path planner for aspiring ice hockey players. The project combines player performance data, career path patterns, and predictive modeling to suggest possible development paths and milestones. Along the way, I’ll discuss the challenges of messy sports data and communicating insights in a way that resonates with non-technical users like coaches, parents, and players.
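One simple way to turn "career path patterns" into suggested development paths is a first-order Markov chain: estimate how often players move from one level to the next, then follow the most likely transitions. The sketch below is such a baseline under stated assumptions; the level labels and careers are hypothetical, and the talk's planner also uses performance data and predictive models not described here.

```python
from collections import Counter, defaultdict


def transition_probabilities(careers):
    """Estimate P(next level | current level) from observed season-by-season paths."""
    counts = defaultdict(Counter)
    for path in careers:
        for current, nxt in zip(path, path[1:]):
            counts[current][nxt] += 1
    return {
        level: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for level, c in counts.items()
    }


def most_likely_path(probs, start, seasons):
    """Greedily follow the most probable transition for up to `seasons` steps."""
    path = [start]
    for _ in range(seasons):
        options = probs.get(path[-1])
        if not options:
            break  # no observed moves out of this level
        path.append(max(options, key=options.get))
    return path


careers = [
    ["U18", "U20", "Pro"],
    ["U18", "U20", "Amateur"],
    ["U18", "U20", "Pro"],
    ["U18", "Amateur"],
]
probs = transition_probabilities(careers)
print(probs["U18"])                        # {'U20': 0.75, 'Amateur': 0.25}
print(most_likely_path(probs, "U18", 2))   # ['U18', 'U20', 'Pro']
```

Showing the transition probabilities alongside the suggested path, rather than the path alone, is one way to communicate uncertainty to coaches, parents, and players.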

Sports Analytics hosted PySport

Ernst-Curie