PyData Eindhoven 2025

09:00
09:00
10min
Opening

Welcome to PyData Eindhoven 2025

Auditorium
09:10
09:10
45min
Opening Keynote

Keynote

Auditorium
09:55
09:55
5min
Coffee Break 5m
Auditorium
09:55
5min
Coffee Break 5m
Ernst-Curie
09:55
5min
Coffee Break 5m
Planck-Bohr
10:00
10:00
30min
Beyond One Model: Scaling, Orchestrating & Monitoring
Azucena Morales, Vi Chu

Training one model is fun. Running thousands without everything catching fire? That’s the real challenge. In this talk, we’ll show how we — two data scientists turned accidental ML engineers — scaled anomaly detection at Vanderlande. Expect a peek into our orchestration setup, a quick code snippet, a look at our monitoring dashboard and how we scale to a thousand models.

AI/Machine Learning/GenAI
Planck-Bohr
10:00
30min
Developing a Nation-Wide Padel Rating System: A Data-Driven Approach
Max Brouwer

Padel has been one of the fastest-growing sports in the Netherlands in recent years. While it initially benefited from the rating facilities of its ‘big brother’ tennis, the KNLTB decided in 2024 to develop a dedicated, tailor-made rating system for padel, which has been in effect since 2025. The development process involved extensive analyses, simulations, and probability modeling on data from more than 300,000 padel matches, complemented by recommendations from the field.

In this presentation, the audience will be taken through the technical development process, as well as the unique characteristics of padel that were crucial in creating an effective rating system.

Sports Analytics hosted PySport
Ernst-Curie
10:00
30min
Scaling Retail Planning at IKEA: Orchestrating Sales, Fulfillment and Capacity Assessment with Metaflow
Yannick Mariman

At IKEA, retail planning is a complex chain of processes, from sales forecasting to fulfillment and capacity assessment, that involve multiple teams. Each team builds their own predictive models independently, yet their outputs depend on one another to ensure a consistent planning chain.

In this talk, we will show how IKEA uses Metaflow, an open-source framework for building and managing real-life ML, to orchestrate and connect the forecasting pipelines for more than thirty countries. We’ll discuss how Metaflow helps align independent teams, improve readability, and enable reproducible workflows and scale.

You will leave with practical approaches for an aligned team workflow and concrete patterns for orchestrating ML/AI pipelines.

Data Engineering
Auditorium
10:30
10:30
15min
Coffee Break 15m
Auditorium
10:30
15min
Coffee Break 15m
Ernst-Curie
10:30
15min
Coffee Break 15m
Planck-Bohr
10:45
10:45
30min
AI-Powered Web Scraping: From Data Collection to Strategic Insights
Yevhenii

Companies today are hungry for external data to stay competitive, but actually getting and making sense of that data isn’t easy. Standard web scraping often produces messy or incomplete results, and modern anti-bot systems make reliable collection even tougher.

In this talk, I’ll share how pairing Python’s scraping frameworks (like Scrapy, Playwright, and Selenium) with AI/ML can turn raw, unstructured data into clear, actionable insights.

We’ll look at:

1) How to build scrapers that still work in 2025.

2) Ways to use AI to automatically clean, enrich, and classify data.

3) Real-world applications of sentiment analysis for reviews and social media.

4) Case studies showing how SMEs have used these pipelines to sharpen marketing and product strategies.

By the end, you’ll see how to design pipelines that don’t just gather data, but deliver real strategic value. The session will focus on practical Python tools, scalable deployment (Airflow, Kubernetes, cloud platforms), and key lessons learned from hands-on projects at the intersection of scraping and AI.

Data Engineering
Auditorium
10:45
30min
Yet Another “How to Trust AI”: Embracing Uncertainty with Probabilistic Methods
Albert

Everyone talks about “trustworthy AI,” yet few approaches go beyond good intentions. This talk takes a practical look at why AI systems often fail our trust—and how probabilistic methods can fix that.

We’ll explore how to connect RxInfer, a probabilistic inference engine, with LLM agents through the Model Context Protocol (MCP). MCP provides a simple way for language models to interact with probabilistic reasoning tools, letting them move beyond confident guesses to quantified beliefs.

By embracing uncertainty rather than ignoring it, we can design AI systems that reason more transparently, admit their limits, and make decisions we can actually trust. Expect a blend of conceptual insight, Python demos, and a few honest laughs about the current “AI trust” hype.

AI/Machine Learning/GenAI
Planck-Bohr
10:45
30min
xReceiver: a GNN approach to the evaluation of the decision-making process of passing options in football
Gabriel Masella

The process of decision-making in football is characterized by a complex interplay between spatial positioning, opponent pressure, and player intent. In this research, we introduce xReceiver, a real-time Graph Neural Network (GNN) framework designed to predict the optimal passing target by modeling on-field interactions as dynamic graphs. Each player is represented as a node with positional and contextual features, while potential passing lines form weighted edges characterized by distance, angle, and pressure metrics. We have developed a Message-Passing Neural Network (MPNN) that is trained using a combination of tracking data and event data from professional matches. Our model achieves 65.22% accuracy in identifying the actual chosen receiver and 95.65% accuracy within its top three suggestions. xReceiver further offers quantification of each option's likelihood, threat, and creativity, enabling performance analysts to evaluate over 1,000 passes in seconds.
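
As a rough illustration of the graph representation described above, the sketch below builds one weighted edge per potential receiver from player positions. It is not the xReceiver implementation: the `Player` class, the `pass_edges` helper, the coordinate convention (metres) and the pressure metric (opponents within a fixed radius of the receiver) are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Player:
    name: str
    team: str
    x: float  # pitch coordinates in metres (assumed convention)
    y: float


def pass_edges(carrier, players, pressure_radius=5.0):
    """Build one weighted edge from the ball carrier to each teammate."""
    opponents = [p for p in players if p.team != carrier.team]
    edges = []
    for mate in players:
        if mate.team != carrier.team or mate is carrier:
            continue
        dx, dy = mate.x - carrier.x, mate.y - carrier.y
        # Hypothetical pressure metric: number of opponents within
        # pressure_radius metres of the potential receiver.
        pressure = sum(
            1
            for o in opponents
            if math.hypot(o.x - mate.x, o.y - mate.y) <= pressure_radius
        )
        edges.append(
            {
                "source": carrier.name,
                "target": mate.name,
                "distance": math.hypot(dx, dy),
                "angle_deg": math.degrees(math.atan2(dy, dx)),
                "pressure": pressure,
            }
        )
    return edges


players = [
    Player("A", "home", 50.0, 34.0),  # ball carrier
    Player("B", "home", 53.0, 38.0),
    Player("C", "home", 50.0, 44.0),
    Player("X", "away", 54.0, 38.0),
]
edges = pass_edges(players[0], players)
for edge in edges:
    print(edge)
```

In a GNN setting, edge attributes like these would typically be passed to a message-passing layer together with the node features; that step is outside this sketch.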

Sports Analytics hosted PySport
Ernst-Curie
11:15
11:15
5min
Coffee Break 5m
Auditorium
11:15
5min
Coffee Break 5m
Ernst-Curie
11:15
5min
Coffee Break 5m
Planck-Bohr
11:20
11:20
30min
Efficient Time-Series Forecasting with Thousands of Local Models on Databricks
Daria Mustafina

In industries like energy and retail, forecasting often requires local models when each time series has unique behavior. However, training and managing thousands of such models presents scalability and operational challenges. This talk shows how we scaled local models on Databricks by leveraging the Pandas API on Spark, and shares practical lessons on storage, reuse, and scaling to make this approach efficient when it’s truly needed.

Data Engineering
Auditorium
11:20
30min
From Data Lake Entanglement to Data Mesh Decoupling: Scaling a Self-Service Data Platform
Geert Jongen

Our data platform journey started with a classic data lake — easy to ingest, hard to evolve. As domains scaled, tight coupling across source systems, pipelines, and data products slowed everything down. In this talk, we share how we re-architected toward a domain-oriented data mesh using PySpark, Delta Lake and DQX to achieve true decoupling. Expect practical lessons on designing independent data products, managing lineage and governance, and scaling self-service without chaos.

Data Engineering
Planck-Bohr
11:20
30min
Identifying playstyles in football through spatial networks
Annemarijn Blom

Breaking away from traditional manual video analysis, this talk introduces a data-driven approach to automatically identify football playstyles in key moments before a shot on goal, using tracking and event data. By applying network science, which studies relationships and interactions within complex systems, we objectively analyze attacking and defensive strategies. Key spatial network metrics are used to reveal diverse playstyles through clustering techniques. The session concludes with insights into the results and possible applications of these findings in football analysis.

Sports Analytics hosted PySport
Ernst-Curie
11:50
11:50
60min
Lunch
Auditorium
11:50
60min
Lunch
Ernst-Curie
11:50
60min
Lunch
Planck-Bohr
12:50
12:50
30min
Finding trash in waste
Tom Koopen

In the Netherlands, plastic waste is often gathered in dedicated containers. These are collected by trucks that are emptied at a transfer station, where a visual inspection identifies items that do not belong in this waste stream. We have researched automating this inspection using cameras and vision foundation models. All items in an image of the big pile are detected, segmented, and classified by whether they belong to this waste stream.

AI/Machine Learning/GenAI
Auditorium
12:50
30min
FootballBERT: Encoding player identity in vectors with Transformers.
Achraff ADJILEYE

FootballBERT introduces a new way of representing football players — not as static IDs or statistical aggregates that fluctuate wildly over short periods, but as contextual embeddings learned directly from match data.
Built on a Transformer architecture and trained through a Masked Player Prediction (MPP) objective, FootballBERT captures how a player’s identity emerges from teammates, opponents, and coaches’ tactical demands, much like BERT learns word meaning from sentences.
Openly released on Hugging Face, FootballBERT is a plug-and-play foundation model whose embeddings can be integrated into any downstream system, paving the way for player-aware analytics across performance modeling, recruitment and prediction.

Sports Analytics hosted PySport
Ernst-Curie
12:50
30min
Scaling Python to thousands of nodes with Ray
Rob de Wit-Liezenga

Python is the language of choice for anything to do with AI and ML. While that has made it easy to write code for one machine, it's much more difficult to run workloads across clusters of thousands of nodes. Ray allows you to do just that. I'll demonstrate how to implement this open source tool with a few lines of code. As a demo project, I'll show how I built a RAG for the Wheel of Time series.

Data Engineering
Planck-Bohr
13:20
13:20
5min
Coffee Break 5m
Auditorium
13:20
5min
Coffee Break 5m
Ernst-Curie
13:20
5min
Coffee Break 5m
Planck-Bohr
13:25
13:25
30min
Extending SQL Databases with Python
Florents Tselai

What if your database could run Python code inside SQL? In this talk, we’ll explore how to extend popular databases using Python, without needing to write a line of C.

We’ll cover three systems—SQLite, DuckDB, and PostgreSQL—and show how Python can be used in each to build custom SQL functions, accelerate data workflows, and prototype analytical logic. Each database offers a unique integration path:
- SQLite and DuckDB allow you to register Python functions directly into SQL via a create_function method on the connection (sqlite3's Connection.create_function and its DuckDB counterpart), making it easy to inject business logic or custom transformations.
- PostgreSQL offers PL/Python, a full-featured procedural language for writing SQL functions in Python. We’ll also touch on advanced use cases, including embedding the Python interpreter directly into a PostgreSQL extension for deeper integration.

By the end of this talk, you’ll understand the capabilities, limitations, and gotchas of Python-powered extensions in each system—and how to choose the right tool depending on your use case, whether you’re analyzing data, building pipelines, or hacking on your own database.
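
To make the SQLite path concrete, here is a minimal sketch using only the standard-library `sqlite3` module. The `normalize_email` function, the `users` table and the sample rows are invented for this example and are not taken from the talk.

```python
import sqlite3


def normalize_email(raw):
    """Hypothetical business rule: trim whitespace and lowercase an address."""
    if raw is None:  # SQL NULL arrives in Python as None
        return None
    return raw.strip().lower()


conn = sqlite3.connect(":memory:")
# Register the Python callable as a one-argument SQL function.
conn.create_function("normalize_email", 1, normalize_email)

conn.execute("CREATE TABLE users (email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?)",
    [("  Alice@Example.COM ",), ("bob@example.com",), (None,)],
)

# The registered function can now be called from plain SQL.
rows = conn.execute(
    "SELECT normalize_email(email) FROM users ORDER BY rowid"
).fetchall()
cleaned = [row[0] for row in rows]
print(cleaned)
conn.close()
```

The function runs inside the Python process that owns the connection, so it is called once per row; for large tables that per-row overhead is one of the trade-offs to weigh.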

Data Engineering
Planck-Bohr
13:25
30min
Football is complex, but your code doesn’t have to be — meet DataBallPy and a practical deep dive into pressing
Alexander Oonk, Tygo Nikamp

DataBallPy is an open-source Python package that gives you a quick start on analysing a football-related question. In this talk, we will introduce the core features and functionalities of DataBallPy using code examples with compelling visualisations. The second part of the talk will showcase a practical example of how the Royal Belgian Football Association (RBFA) has used components of DataBallPy to analyse the effectiveness and efficiency of pressuring the opponent in over 200 games. Taken together, this talk will give you a clear starting point for answering your own football-related questions.

Sports Analytics hosted PySport
Ernst-Curie
13:25
30min
From €1M License to In-House Success: How We Built a Real-Time Recommendation System and Saved Millions Doing It
ALI KOHAN

When we at Bol decided to personalize campaign banners, we did what many companies do: bought an expensive solution. As a software engineering team with zero data science experience, we integrated a third-party recommender system for €1 million annually, built the cloud infrastructure, and waited for results. After our first season, the data told a harsh truth—the third-party tool wasn't delivering value proportional to its cost. We faced a crossroads: accept mediocrity or build our own solution from scratch, tailored to our requirements and architecture.
We'll walk you through our journey of building a more intelligent and flexible recommendation system from the ground up, and how this journey saved us over a million euros per year. We will share the incremental steps that shaped our journey, alongside the valuable lessons learned along the way.

AI/Machine Learning/GenAI
Auditorium
13:55
13:55
15min
Coffee break
Auditorium
13:55
15min
Coffee break
Ernst-Curie
13:55
15min
Coffee break
Planck-Bohr
14:10
14:10
30min
GPS doesn't work! Can a model alert us before this happens?
Vincenzo Ventriglia

Have you ever used GPS and realised that it was not working properly? The Sun and a space weather effect called Travelling Ionospheric Disturbances (TIDs) could be responsible. We will present an explainable TID forecasting model based on CatBoost that uses several physical drivers to make forecasts.

AI/Machine Learning/GenAI
Auditorium
14:10
30min
Planning Hockey Careers With Python
Jaroslav Bezdek

How can data science help young athletes navigate their careers? In this talk, I’ll share my experience building a career path planner for aspiring ice hockey players. The project combines player performance data, career path patterns, and predictive modeling to suggest possible development paths and milestones. Along the way, I’ll discuss the challenges of messy sports data and communicating insights in a way that resonates with non-technical users like coaches, parents, and players.

Sports Analytics hosted PySport
Ernst-Curie
14:40
14:40
5min
Coffee break 5m
Auditorium
14:40
5min
Coffee break 5m
Ernst-Curie
14:40
5min
Coffee break 5m
Planck-Bohr
14:45
14:45
30min
Technical Talk from Dell
Planck-Bohr
14:45
30min
ML system design: a bridge between a model and the solution
Dmitry Levashov

Designing an ML model is one thing; designing an ML system that actually solves a business problem is another.

This talk explores how ML system design bridges the gap between a model and a real solution. Through practical examples, we’ll look at how communication with stakeholders, understanding functional and non-functional requirements, and aligning optimization and evaluation with business needs determine whether an ML initiative succeeds or stalls.

We’ll highlight key decision points — from translating vague goals into measurable objectives to balancing model performance with constraints like latency, interpretability, and maintainability.

Attendees will walk away with a sharper view of what makes an ML system truly fit for its environment — and why good design matters as much as good modeling.

AI/Machine Learning/GenAI
Auditorium
14:45
30min
Optimizing fantasy basketball decisions with Python: linear & integer programming for roster management
Pawel Kapuscinski

Fantasy basketball involves daily decisions: which players to start, who to pick up from free agency, and how to balance competing objectives across multiple statistical categories. This talk demonstrates how linear programming and integer programming can help solve those problems.

Using the Python library PuLP, we'll explore when to use linear programming versus integer programming, how to formulate constraints for roster decisions, and how to handle different league formats. Through practical examples, we'll build optimizers for start/sit decisions and free agency streaming.
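
A start/sit decision can be viewed as a small binary optimization problem: each player is either started or benched, the objective is total projected points, and the constraints are the lineup slots per position. The sketch below is a hedged illustration of that model, not the speaker's code. It deliberately does not use PuLP, so that it runs on the standard library alone, and instead enumerates all candidate lineups, which is only practical for toy rosters. The roster, projections and slot counts are invented.

```python
from itertools import combinations

# Hypothetical roster: (name, position, projected fantasy points).
roster = [
    ("Guard 1", "G", 31.0),
    ("Guard 2", "G", 27.5),
    ("Guard 3", "G", 22.0),
    ("Forward 1", "F", 29.0),
    ("Forward 2", "F", 24.5),
    ("Center 1", "C", 26.0),
    ("Center 2", "C", 18.0),
]
# Assumed league format: number of starting slots per position.
slots = {"G": 2, "F": 1, "C": 1}


def best_lineup(roster, slots):
    """Return the feasible lineup with the highest projected total."""
    size = sum(slots.values())
    best, best_points = None, float("-inf")
    for lineup in combinations(roster, size):
        # Constraint check: the lineup must fill each position slot exactly.
        counts = {}
        for _, position, _ in lineup:
            counts[position] = counts.get(position, 0) + 1
        if counts != slots:
            continue
        # Objective: total projected points of the started players.
        points = sum(projected for _, _, projected in lineup)
        if points > best_points:
            best, best_points = lineup, points
    if best is None:
        raise ValueError("no lineup satisfies the slot constraints")
    return [name for name, _, _ in best], best_points


starters, total = best_lineup(roster, slots)
print(starters, total)
```

An integer programming solver expresses the same objective and constraints declaratively and scales to rosters where enumeration is no longer feasible.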

Sports Analytics hosted PySport
Ernst-Curie
15:15
15:15
5min
Coffee Break 5m
Auditorium
15:15
5min
Coffee Break 5m
Ernst-Curie
15:15
5min
Coffee Break 5m
Planck-Bohr
15:20
15:20
30min
Technical Talk from ASML
Auditorium
15:20
30min
Technical Talk from CGI
Ernst-Curie
15:20
30min
Technical Talk from Multiverse
Planck-Bohr
15:50
15:50
5min
Coffee break 5m
Auditorium
15:50
5min
Coffee break 5m
Ernst-Curie
15:50
5min
Coffee break 5m
Planck-Bohr
15:55
15:55
30min
Closing Keynote
Unnamed user

Closing Keynote

Auditorium
16:25
16:25
15min
Closing

Closing

Auditorium
16:40
16:40
60min
Networking and Drinks
Auditorium