PyData London 2025

08:00
08:00
60min
Registration & Breakfast
Grand Hall
08:00
60min
Registration & Breakfast
Doddington Forum
08:00
60min
Registration & Breakfast
Hardwick Hub
08:00
60min
Registration & Breakfast
Library
09:00
09:00
210min
GPU Accelerated Python
Jeremy Tanner, Katrina Riehl, Jacob Tomlinson, Lawrence Mitchell

Accelerating Python using the GPU is much easier than you might think. We will explore the powerful CUDA-enabled Python ecosystem in this tutorial through hands-on examples using some of the most popular accelerated scientific computing libraries.

Topics include:
- Introduction to General Purpose GPU Computing
- GPU vs CPU - Which processor is best for which tasks
- Introduction to CUDA
- How to use CUDA with Python
- Using Numba to write kernel functions
- CuPy
- cuDF

No prior experience with GPUs is necessary, but attendees should be familiar with Python.

Grand Hall
09:00
90min
Hands-on with Apache Iceberg
Anders Bogsnes

You've probably heard the name Apache Iceberg by now. If it wasn't when Databricks reportedly spent 2 billion USD buying Tabular, it might have been when AWS announced S3 Tables built on Iceberg. But do you know what Apache Iceberg actually is? Or how you could start using it today?

In this tutorial, we will walk through an end-to-end example of writing and reading Iceberg data, while taking a few pitstops to demonstrate Iceberg's selling points.

Hardwick Hub
09:00
90min
Introduction to Bayesian Time Series Analysis with PyMC
Chris Fonnesbeck

Time series data is ubiquitous, from stock market prices and weather patterns to disease outbreaks and sports outcomes. Accurately modeling these data and generating useful predictions requires specialized techniques due to the unique characteristics of time series data. This tutorial provides a practical introduction to Bayesian time series analysis using PyMC, a powerful probabilistic programming library in Python. Participants will learn how to build, evaluate, and interpret various Bayesian time series models, including ARIMA models, dynamic linear models, and stochastic volatility models. We'll emphasize practical application, covering data preprocessing, model selection, diagnostics, and forecasting, empowering attendees to tackle real-world time series problems with confidence.

Doddington Forum
10:30
10:30
30min
Break
Doddington Forum
10:30
30min
Break
Hardwick Hub
10:30
30min
Break
Library
11:00
11:00
90min
Forecasting Weather using Time Series ML
Suyash Joshi

This hands-on workshop covers how to use open source ML models, such as LSTMs and time series LLMs, with Python to forecast weather patterns, along with best practices for data preparation and real-time predictions.

Doddington Forum
11:00
90min
Package Your Python Code as a CLI
Jeroen Janssens, Thijs Nieuwdorp

Learn how to transform your Python code into a command-line tool. Jeroen Janssens, author of Data Science at the Command Line, guides you through the process of turning your scripts into reusable, executable tools, integrating them into your data workflows and harnessing the power of the Unix command line.

Hardwick Hub
12:30
12:30
60min
Lunch Break
Grand Hall
12:30
60min
Lunch Break
Doddington Forum
12:30
60min
Lunch Break
Hardwick Hub
12:30
60min
Lunch Break
Library
13:30
13:30
90min
How To Measure And Mitigate Unfair Bias in Machine Learning Models
John Sandall

In this 90-minute workshop, machine learning engineers and data scientists will learn practical techniques for identifying and mitigating age bias in AI-driven hiring systems. We’ll explore fairness metrics like statistical parity, counterfactual fairness, and equalized odds, and demonstrate how tools such as Fairlearn, Aequitas, and AI Fairness 360 can be used to monitor and improve model fairness. Through hands-on exercises, participants will walk away with the skills to evaluate and de-bias models in high-risk areas like recruitment.
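
As a rough illustration of one of the metrics named above, statistical parity compares the rate of positive predictions between two groups. The sketch below is not taken from the workshop materials; the function names, group labels and toy data are all hypothetical, and it uses plain Python rather than Fairlearn, Aequitas or AI Fairness 360.

```python
def selection_rate(predictions):
    """Fraction of positive (1) predictions in a list of 0/1 values."""
    return sum(predictions) / len(predictions)


def statistical_parity_difference(predictions, groups, group_a, group_b):
    """Selection rate of group_a minus selection rate of group_b.

    A value of 0 means both groups receive positive predictions at the
    same rate; values far from 0 suggest a disparity worth investigating.
    """
    preds_a = [p for p, g in zip(predictions, groups) if g == group_a]
    preds_b = [p for p, g in zip(predictions, groups) if g == group_b]
    return selection_rate(preds_a) - selection_rate(preds_b)


# Hypothetical toy data: 1 = "invite to interview", 0 = "reject".
predictions = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["under_40"] * 4 + ["over_40"] * 4

spd = statistical_parity_difference(predictions, groups, "under_40", "over_40")
print(f"Statistical parity difference: {spd:.2f}")
```

Equalized odds follows the same pattern but compares true positive and false positive rates per group, so it also needs the true labels.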

Grand Hall
13:30
90min
Python Meets Quantum: Learn, Code, and Simulate
Andrea Melloncelli

This workshop is designed for Python developers eager to explore the exciting world of quantum computing. Through interactive exercises and practical coding examples, participants will learn how to program quantum computers using Python. No advanced background in quantum mechanics is required - just curiosity and a willingness to dive into cutting-edge technology.

Doddington Forum
13:30
90min
Transformers Inside Out (Parts 1 & 2)
Sam Joseph

Large Language Models like GPT4 are now a key part of the technology landscape, but how do they really work? And can you code them up at home? In this tutorial we'll create a simple GPT and train it on a simplified dataset of children's jokes. We'll work against a new set of transformer encoder flow diagrams that intuitively match the code, and look at visualisations of GPT's internal representations in order to better understand transformers inside out!

Hardwick Hub
15:00
15:00
30min
Break
Grand Hall
15:00
30min
Break
Doddington Forum
15:00
30min
Break
Hardwick Hub
15:00
30min
Break
Library
15:30
15:30
90min
Building your own vertical agent with AG2 AgentOS
Tim Santos, Chi Wang

In this tutorial, we will cover basic and advanced agentic design patterns in AG2 and we will go through practical implementations to demonstrate AI agents in action.

Grand Hall
15:30
90min
Graph Theory for Multi-Agent Integration: Showcase Clinical Use Cases
Ahmad Albarqawi

Graph theory is a well-known concept for algorithms and can be used to orchestrate the building of multi-model pipelines. By translating tasks and dependencies into a Directed Acyclic Graph, we can orchestrate diverse AI models, including NLP, vision, and recommendation capabilities. This tutorial provides a step-by-step approach to designing graph-based AI model pipelines, focusing on clinical use cases from the field.
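
A minimal sketch of the idea of running model steps in dependency order, assuming the pipeline is described as a mapping from each step to the steps it depends on. The step names and the placeholder functions are hypothetical and stand in for real models; the ordering comes from the standard library's `graphlib.TopologicalSorter` (Python 3.9+), which may or may not be what the tutorial itself uses.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key is a step, each value is the set of
# steps whose outputs it needs. Together they form a Directed Acyclic Graph.
dependencies = {
    "extract_text": set(),
    "classify_image": set(),
    "detect_entities": {"extract_text"},
    "recommend": {"detect_entities", "classify_image"},
}

# Placeholder step functions; real ones would call NLP or vision models.
steps = {
    "extract_text": lambda inputs: "patient reports chest pain",
    "classify_image": lambda inputs: "x-ray: normal",
    "detect_entities": lambda inputs: ["chest pain"],
    "recommend": lambda inputs: f"review based on {sorted(inputs)}",
}


def run_pipeline(dependencies, steps):
    """Run every step after all of its dependencies have produced output."""
    results = {}
    for name in TopologicalSorter(dependencies).static_order():
        inputs = {dep: results[dep] for dep in dependencies[name]}
        results[name] = steps[name](inputs)
    return results


results = run_pipeline(dependencies, steps)
order = list(results)  # dicts preserve insertion order, i.e. execution order
print(order)
```

`TopologicalSorter` raises `graphlib.CycleError` if the graph contains a cycle, which is a cheap way to validate a pipeline definition before running it.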

Hardwick Hub
15:30
90min
Hands-on workshop on developing Reinforcement Learning solutions with financial domain example use cases.
Ade Idowu

Reinforcement Learning (RL) has emerged as a transformative sub-field in AI/ML, driving breakthroughs in areas ranging from autonomous robotics to personalized recommendation systems. This workshop is designed to serve a broad audience—from beginners eager to grasp foundational RL concepts to practitioners seeking to deepen their technical expertise through applied projects. These projects will range from developing simple classical RL game environments to practical financial domain use cases such as using RL sequential decision making for stock trading and asset portfolio optimization scenarios.

Doddington Forum
08:00
08:00
60min
Registration & Breakfast
Grand Hall
08:00
60min
Registration & Breakfast
Doddington Forum
08:00
60min
Registration & Breakfast
Hardwick Hub
08:00
60min
Registration & Breakfast
Library
09:00
09:00
55min
Opening Notes & Keynote: Keep Calm and Data On: Being a data science practitioner in the era of AI proliferation
Leanne Fitzpatrick

Since the end of 2022, the AI space has reached unprecedented velocity, scale and proliferation. When it seems like everyone (and their dog) is talking about AI, how should those of us who've been working in Machine Learning, Data Science (and AI) as domain experts look to navigate the conversation? In this talk, Leanne will aim to shine a light on the impact the AI arms race is having on our field, the reality of what it means to be a practitioner and some principles to stick by to help traverse what may appear to be a time of panic.

Grand Hall
09:55
09:55
25min
Break
Grand Hall
09:55
25min
Break
Doddington Forum
09:55
25min
Break
Hardwick Hub
09:55
25min
Break
Library
10:20
10:20
45min
Multi-Task Learning for Fraud detection: From Trees to MLPs
Callum Court

This talk will present Monzo's exploration of multi-task deep learning to enhance our real-time fraud detection systems. I will outline the challenges of card fraud detection, and explain the limitations of traditional gradient boosted decision tree models in terms of generalisation to rare fraud subtypes. This will motivate the use of multi-task learning, which leverages shared dense representations across fraud sub-tasks. By consolidating multiple specialist learners into a single model, we observe improved performance on less prevalent fraud types, leading to better generalisability, scalability, and robustness. I will also share results from testing multi-task models within our fraud detection infrastructure.

Grand Hall
10:20
45min
Parallel PyTorch Inference with Python Free-Threading
Michał Szołucha

This talk examines multi-threaded parallel inference on PyTorch models using the new No-GIL, free-threaded version of Python. Using a simple 124M parameter GPT2 model that we train from scratch, we explore the new territory unlocked by free-threaded Python: parallel PyTorch model inference, where multiple threads, unimpeded by the Python GIL, attempt to generate text from a transformer-based model in parallel.

Hardwick Hub
10:20
135min
PyMC Code Sprint
Chris Fonnesbeck

Join the PyMC development team for a fun and engaging hackathon!

Library
10:20
45min
Why you should stop pretending your sparse data is dense
Alex Owens

Lots of data in the real world has missing values, but historically prevalent data science tools have had limited support for such data. This talk will compare traditional numerical approaches, the more modern alternative Arrow, as well as ArcticDB, the client-side Dataframe database developed at Man Group.

Quant Finance Track Sponsored by Man Group
Doddington Forum
11:05
11:05
45min
AI agents testing: How to evaluate the unpredictable
Emeli Dral

AI agents and multi-step workflows are powerful, but testing them can be tricky. This talk explores practical ways to test these complex systems — like running multi-step simulations, checking tool calls, and using LLMs for evaluation. You'll also learn how to prioritize what to test and set up session-level evaluations with open-source tools.

Grand Hall
11:05
45min
How we unified feature engineering across data and backend at Monzo
Alex Jones

Deep dive into how Monzo reduced the effort it takes to generate point-in-time correct features for model development and productionise them with realtime streaming using our event-driven architecture.

Quant Finance Track Sponsored by Man Group
Doddington Forum
11:05
45min
Sovereign Data for AI with Python
lex avstreikh

The only certainty in life is that the pendulum will always swing. Recently, the pendulum has been swinging towards repatriation. However, the infrastructure needed to build and operate AI systems using Python in a sovereign (even air-gapped) environment has changed since the shift towards the cloud. This talk will introduce the infrastructure you need to build and deploy Python applications for AI - from data processing, to model training and LLM fine-tuning at scale to inference at scale. We will focus on open-source infrastructure including:
- a Python library server (PyPI, Conda, etc.) and avoiding supply chain attacks
- a container registry that works at scale
- an S3 storage layer
- a database server with a vector index

Hardwick Hub
11:50
11:50
45min
Bringing stories to life with AI, data streaming and generative agents
Olena Kutsenko

Explore how AI-powered Generative Agents can evolve in real time using live data streams. Inspired by Stanford's 'Generative Agents' paper, this session dives into building dynamic, AI-driven worlds with Apache Kafka, Flink, and Iceberg - plus LLMs, RAG, and Python. Demos and practical examples included!

Grand Hall
11:50
45min
Cutting Edge Football Analytics using Polars, Keras and Spektral
Joris Bekkers

Football analytics has rapidly evolved over the past five years, becoming a crucial part of professional and fan discourse. While much of the cutting-edge research remains hidden behind the fences of club training grounds, a growing ecosystem of open-source tools now enables anyone to develop advanced football analytics models.

In this talk, I'll showcase key open-source libraries—Polars for high-performance data processing, Keras for deep learning, and Spektral for Graph Neural Networks (GNNs)—to analyze millions of player coordinates from publicly available high-frequency positional tracking data. I'll demonstrate how these tools can be used to build in-game prediction models and extract advanced football metrics that only the most advanced football clubs currently use.

Hardwick Hub
11:50
45min
Enhancing Fraud Detection with LLM-Generated Profiles: From Analyst Efficiency to Model Performance
Radion Bikmukhamedov

This talk explores how leveraging Large Language Models (LLMs) to generate structured customer profile summaries improved both compliance analyst workflows and fraud scoring models at a financial institution. Attendees will learn how embeddings derived from LLM-generated narratives outperformed traditional manual feature engineering and raw text embeddings, offering insights into practical applications of NLP in fraud detection.

Quant Finance Track Sponsored by Man Group
Doddington Forum
12:40
12:40
60min
Lunch Break
Grand Hall
12:40
60min
Lunch Break
Doddington Forum
12:40
60min
Lunch Break
Hardwick Hub
12:40
60min
Diversity Scholar Luncheon
Library
13:40
13:40
45min
Keynote: From Next Token Prediction to Reasoning and Beyond
Jay Alammar

Large Language Models (LLMs) have risen to prominence as some of the most popular technological artifacts of the day. This talk will provide a highly accessible and visual overview of LLM concepts relevant to today's data professionals. This includes looking at present-day Transformer architectures, tokenizers, reward models, reasoning LLMs, agentic trajectories, and the various training stages of a large language model including next-word prediction, instruction-tuning, preference-tuning, and reinforcement learning.

Grand Hall
14:25
14:25
20min
Break
Grand Hall
14:25
20min
Break
Doddington Forum
14:25
20min
Break
Hardwick Hub
14:25
20min
Break
Library
14:45
14:45
45min
Conquering PDFs: document understanding beyond plain text
Ines Montani

NLP and data science could be so easy if all of our data came as clean and plain text. But in practice, a lot of it is hidden away in PDFs, Word documents, scans and other formats that have been a nightmare to work with. In this talk, I'll present a new and modular approach for building robust document understanding systems, using state-of-the-art models and the awesome Python ecosystem. I'll show you how you can go from PDFs to structured data and even build fully custom information extraction pipelines for your specific use case.

Grand Hall
14:45
45min
PyScript - Python in the Browser
Chris Laffra

Learn how to write a web app in Python using PyScript, Pyodide, MicroPython, and WASM.

Hardwick Hub
14:45
45min
Tackling Data Challenges for Scaling Multi-Agent GenAI Apps with Python
Theo van Kraay

The use of multiple Large Language Models (LLMs) working together to perform complex tasks, known as multi-agent systems, has gained significant traction. While orchestration frameworks like LangGraph and Semantic Kernel can streamline orchestration and coordination among agents, developing large-scale, production-grade systems can bring a host of data challenges. Issues such as supporting multi-tenancy, preserving transactional integrity and state, and managing reliable asynchronous function calls while scaling efficiently can be difficult to navigate.

Leveraging insights from practical experiences in the Azure Cosmos DB engineering team, this talk will guide you through key considerations and best practices for storing, managing, and leveraging data in multi-agent applications at any scale. You’ll learn how to understand core multi-agent concepts and architectures, manage statefulness and conversation histories, personalize agents through retrieval-augmented generation (RAG), and effectively integrate APIs and function calls.

Aimed at developers, architects, and data scientists at all skill levels, this session will show you how to take your multi-agent systems from the lab to full-scale production deployments, ready to solve real-world problems. We’ll also walk through code implementations that can be quickly and easily put into practice, all in Python.

Doddington Forum
15:30
15:30
45min
Feminist AI Lounge
Ines Montani

Join our chill space, unwind, chat about Feminist AI and contribute to the PyData London DIY collage zine.

Elizabeth Board Room
15:30
45min
Media Mix Modelling - how can we save company budget?
Natalia Ziemba Jankowska

How can engineers empower marketing teams in the post-cookie era? Discover Bayesian Media Mix Modelling (MMM), a robust data science approach to evaluate multi-channel marketing effectiveness. Learn how to implement MMM and take actionable insights back to your company.

Hardwick Hub
15:30
45min
Not Another LLM Talk… Practical Lessons from Building a Real-World Adverse Media Pipeline
Adam Hill

LLMs are magical—until they aren’t. Extracting adverse media entities might sound straightforward, but throw in hallucinations, inconsistent outputs, and skyrocketing API costs, and suddenly, that sleek prototype turns into a production nightmare.

Our adverse media pipeline monitors over 1 million articles a day, sifting through vast amounts of news to identify reports of crimes linked to financial bad actors, money laundering, and other risks. Thanks to GenAI and LLMs, we can tackle this problem in new ways—but deploying these models at scale comes with its own set of challenges: ensuring accuracy, controlling costs, and staying compliant in highly regulated industries.

In this talk, we’ll take you inside our journey to production, exploring the real-world challenges we faced through the lens of key personas: Cautious Claire, the compliance officer who doesn’t trust black-box AI; Magic Mike, the sales lead who thinks LLMs can do anything; Just-Fine-Tune Jenny, the PM convinced fine-tuning will solve everything; Reinventing Ryan, the engineer reinventing the wheel; and Paranoid Pete, the security lead fearing data leaks.

Expect practical insights, cautionary tales, and real-world lessons on making LLMs reliable, scalable, and production-ready. If you've ever wondered why your pipeline works perfectly in a Jupyter notebook but falls apart in production, this talk is for you.

Grand Hall
15:30
45min
Platforms for valuable AI Products: Iteration, iteration, iteration
John Carney

In data science, experimentation is vital: the more we can experiment, the more we can learn.
However, quick iteration isn't sufficient; we also need to be able to easily promote these experiments to production to deliver value. This requires all the stability and reliability of any production system.
John will discuss building platforms that treat iteration as a first-class consideration, the role of open source libraries, and balancing trade-offs.

Doddington Forum
15:30
45min
Python Engineering Excellence Birds of a Feather
Sam Joseph

A round table discussion on how to excel at Python engineering and architecting systems using Python, what kind of sessions and activities would best help Python programmers be more effective at Python engineering, and how to achieve Python engineering excellence generally.

Library
16:15
16:15
45min
LLM Inference Arithmetics: the Theory behind Model Serving
Luca Baggi

Have you ever asked yourself how parameters for an LLM are counted, or wondered why Gemma 2B is actually closer to a 3B model? You have no clue about what a KV-Cache is? (And, before you ask: no, it's not a Redis fork.) Do you want to find out how much GPU VRAM you need to run your model smoothly?

If your answer to any of these questions was "yes", or you have another doubt about inference with LLMs - such as batching, or time-to-first-token - this talk is for you. Well, except for the Redis part.
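
For a flavour of the arithmetic involved, here is a back-of-the-envelope sketch of two common estimates: memory for the model weights and memory for the KV cache. It is not taken from the talk. The model configuration below is hypothetical, and the formulas ignore activations, framework overhead and any cache compression, so treat the results as rough lower bounds.

```python
def weight_bytes(n_params, bytes_per_param=2):
    """Memory for the weights alone (2 bytes per parameter for fp16/bf16)."""
    return n_params * bytes_per_param


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Memory for cached keys and values across all layers.

    The leading factor of 2 accounts for storing both a key and a value
    vector per token, per layer, per KV head.
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)


GIB = 1024 ** 3

# Hypothetical 7B-parameter model: 32 layers, 32 KV heads of dimension 128,
# serving one sequence of 4096 tokens in 16-bit precision.
weights = weight_bytes(7_000_000_000)
cache = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096)

print(f"Weights:  {weights / GIB:.1f} GiB")
print(f"KV cache: {cache / GIB:.1f} GiB")
```

Because the cache grows linearly with both sequence length and batch size, it can end up larger than the weights themselves under heavy batching.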

Hardwick Hub
16:15
45min
NetworkX is Fast Now: Zero Code Change Acceleration
Mridul Seth

Have you ever wondered how to find connections in your data and to gain insights from them?
Come discover how NetworkX makes this easy (and fast!).

This talk is broadly divided into two parts. First we will talk about the power of graph analytics and how you can use tools like NetworkX to extract information from your data, and then we will talk about how we made the machinery behind NetworkX work with heterogeneous backends like GraphBLAS (CPU optimized) and cuGraph (GPU optimized).

Doddington Forum
16:15
45min
Successful Projects through a bit of Rebellion
Ian Ozsvald

This talk is for leaders who want new techniques to improve their success rates. In the last 15 months I've built a private data science peer mentorship group where we discuss rebellious ideas that improve our ability to make meaningful change in organisations of all sizes.

As a leader you've no doubt had trouble defining new projects (perhaps you've been asked - "add ChatGPT!"), getting buy-in, building support, defining defensible metrics and milestones, hiring, developing your team, dealing with conflict, avoiding overload and ultimately delivering valuable projects that are adopted by the business. I'll share advice across all of these areas based on 25 years of personal experience and the topics we've discussed in my leadership community.

You'll walk away with new ideas, perspectives and references that ought to change how you work with your team and organisation.

Grand Hall
17:00
17:00
60min
PyData London 2025 Happy Hour

Join us for drinks, snacks and networking from 5-6pm.

Grand Hall
08:00
08:00
60min
Registration & Breakfast
Grand Hall
08:00
60min
Registration & Breakfast
Doddington Forum
08:00
60min
Registration & Breakfast
Hardwick Hub
08:00
60min
Registration & Breakfast
Library
09:00
09:00
45min
Lightning Talks
Grand Hall
09:45
09:45
30min
Break
Grand Hall
09:45
30min
Break
Doddington Forum
09:45
30min
Break
Hardwick Hub
09:45
30min
Break
Library
10:15
10:15
45min
AI for Everyone - Building Inclusive Machine Learning Models
Elizabeth Osanyinro

Artificial Intelligence (AI) and Machine Learning (ML) are transforming industries such as healthcare, finance, education, and entertainment. However, these advancements are not benefiting everyone equally. Biases in datasets, algorithms, and design processes often lead to AI systems that unintentionally exclude or misrepresent underrepresented communities, reinforcing societal inequalities.

This talk, "AI for Everyone: Building Inclusive Machine Learning Models," explores the critical importance of developing AI systems that are ethical, fair, and accessible to all. We will examine real-world examples of AI bias, discuss techniques for identifying and mitigating bias in data and models, and explore frameworks for responsible AI development. Attendees will leave with actionable insights to design AI solutions that promote fairness, inclusivity, and social impact.

Grand Hall
10:15
45min
Automating Porosity Detection in Additive Manufacturing with Deep Learning
Onyekachukwu Ojumah

Additive Manufacturing (AM) enables complex, high-performance components, but porosity defects can compromise structural integrity. Traditional porosity analysis in X-ray CT scans is manual, slow, and inconsistent. This talk introduces a deep learning-based approach using CNNs and segmentation models to automate porosity detection, enhancing accuracy and efficiency. Attendees will gain insights into pre-processing 3D CT scans, training AI models, and solving industry challenges.

Hardwick Hub
10:15
45min
From Trees to Transformers: Our Journey Towards Deep Learning for Ranking
Theodore Meynard, Mihail Douhaniaris

GetYourGuide, a global marketplace for travel experiences, reached diminishing returns with its XGBoost-based ranking system. We switched to a Deep Learning pipeline in just nine months, maintaining high throughput and low latency. We iterated on over 50 offline models and conducted more than 10 live A/B tests, ultimately deploying a PyTorch transformer that yielded significant gains. In this talk, we will share our phased approach—from a simple baseline to a high-impact launch—and discuss the key operational and modeling challenges we faced. Learn how to transition from tree-based methods to neural networks and unlock new possibilities for real-time ranking.

Doddington Forum
11:00
11:00
45min
Making LLMs reliable: A practical framework for production
Lena Shakurova

LLM outputs are non-deterministic, making it difficult to ensure reliability in production, especially in high-risk applications. In this talk, we’ll walk through a structured approach to making LLMs production-ready. We’ll cover setting up tests during experimentation, implementing real-time guardrails before responses reach users, and monitoring live performance for critical issues. Finally, we’ll discuss post-deployment log analysis to drive continuous improvements and build trust with stakeholders.

Doddington Forum
11:00
45min
One repo to rule them all, one repo to bind them...Control all of your projects with copier!
Tim Paine

Did you know you can control all of your projects from a central template repository? In this talk we'll learn about copier, a framework for creating project templates. A natural successor to cookiecutter and GitHub templates, copier lets your projects re-sync from the original template, with new or the same arguments. Adopt the latest and greatest tools without leaving any of your libraries behind!

Hardwick Hub
11:00
45min
Reproducibility in Embedding Benchmarks
Isaac Chung

Reproducibility in embedding benchmarks is no small feat. Prompt variability, growing computational demands, and evolving tasks make fair comparisons a challenge. The need for robust benchmarking has never been greater. In this talk, we’ll explore the quirks and complexities of benchmarking embedding models, such as prompt sensitivity, scaling issues, and emergent behaviors.

We’ll hear straight from the Massive Text Embedding Benchmark (MTEB) maintainers and show how MTEB (and its extensions like MMTEB and MIEB) simplifies reproducibility, making it easier for researchers and industry practitioners to measure progress, choose the right models, and push the boundaries of embedding performance.

Grand Hall
11:45
11:45
45min
Analysing smart meter data to uncover energy consumption patterns
Sofia Pinto

Smart meters have the potential to not only provide information to individual householders about their energy consumption, but to identify patterns of usage across the entire energy system. At Nesta, we have been analysing smart meter data to uncover information about energy consumption habits, and how household appliances, physical property characteristics and demographic factors influence energy usage - as this can help develop energy-saving initiatives.
In this talk we will present the data science techniques we used, such as clustering, share our results, discuss how we translate them for a non-data-science audience, and share what we learned from conducting data science work in a secure data lab that allows analysis of sensitive and confidential data.

Doddington Forum
11:45
45min
CUDA in Python: A New Era for GPU Acceleration
Andy Terrel

We discuss bringing Python natively to the CUDA ecosystem. From low-level bindings to domain-specific applications, CUDA is supporting Python standards and the Python ecosystem. New libraries include nvmath-python for managing optimized mathematics libraries, cccl-python for cooperative threading and device parallelism, cuda-core for managing the complete CUDA toolstack from Python with no need for C++, and finally numba-cuda for generating device-side kernels with integration of C++ device libraries and LTO IR.

Grand Hall
11:45
45min
Git Commit, MedTech Transformed: Python’s Medical Robotics Breakthrough
Lilinoe Harbottle

Code changing lives? Absolutely. We're diving into Python's power to deploy cutting-edge solutions for lung cancer diagnosis and treatment in medical and surgical robotics. Expect demos showcasing algorithms, data analysis, and real-world impact—bridging MedTech innovation and life-changing solutions. Ready to see Python revolutionize lung health? Join us. Let's code a healthier future together!

Hardwick Hub
11:45
45min
Leaders at PyData
Ian Ozsvald

A self-organised workshop for data leaders to discuss the opportunities and challenges they face with their peers. This is the 9th iteration at a PyData conference. Questions are raised and answered by attendees, and the session is facilitated by Ian Ozsvald (PyDataLondon co-founder). You are encouraged to carry on talking to fellow leaders after this session; Ian will give out badges to help with this.

The format is based on the Breakout discussions that Ian uses in his private RebelAI leadership group; you're welcome and encouraged to copy and use it in your own organisations. Typical attendance is 60+ leaders.

The 2022 session, which used a different format ("Executives at PyData" as it was known), was written up; you can read it here: https://numfocus.medium.com/executives-at-pydata-global-2022-193cbc2d3f3b

Library
12:30
12:30
60min
Lunch
Grand Hall
12:30
60min
Lunch
Doddington Forum
12:30
60min
Lunch
Hardwick Hub
12:30
60min
PyData Organizers Lunch
Library
13:30
13:30
45min
Keynote: Innovation is Dead
Tony Mears

Join us for an exciting Keynote with Tony Mears!

Grand Hall
14:15
14:15
30min
Break
Grand Hall
14:15
30min
Break
Doddington Forum
14:15
30min
Break
Hardwick Hub
14:15
30min
Break
Library
14:45
14:45
45min
Agentic Cyber Defense with External Threat Intelligence
Jyoti Yadav

This talk will detail how to integrate external threat intelligence data into an autonomous agentic AI system for proactive cybersecurity. Using real world datasets—including open-source threat feeds, security logs, or OSINT—you will learn how to build a data ingestion pipeline, train models with Python, and deploy agents that autonomously detect and mitigate cyber threats. This case study will provide practical insights into data preprocessing, feature engineering, and the challenges of adversarial conditions.

Doddington Forum
14:45
45min
Debugging Leadership: Six Errors when Moving From Code to Management
Matthew Upson

Transitioning from a hands-on Pythonista to a leadership role is a journey filled with challenges, and like debugging code, it requires identifying, isolating, and fixing problems. In this talk, I’ll share eight key lessons from my journey from Data Scientist to Co-Founder of a small software company, framed as Python errors.

From battling imposter syndrome (ValueError: self-worth not defined), to learning to delegate (DeadlockError: unable to release control), and avoiding burnout (RuntimeError: system overload), this talk offers actionable advice for anyone navigating the leap from technical contributor to technical leader.

Expect a mix of humour, relatable stories, and hard-won lessons as we explore how debugging leadership challenges is just as rewarding (and occasionally frustrating) as debugging code. Whether you’re considering a leadership role or already on the journey, this session will leave you with practical insights to navigate common pitfalls and approach a leadership transition with a clearer understanding of what to expect.

Hardwick Hub
14:45
45min
Diving into Transformer Model Internals
Matt Squire

While everybody and their dog is building applications on generative AI, the inner workings of transformers - the model architecture behind the genAI age - are a mystery for most people. In this talk, I'll walk through how transformers are implemented, using real-life Python code from the HuggingFace transformers library.

Grand Hall
14:45
90min
Humble Data Workshop
Hugh Evans

Learn Python for Data Science in this Beginners’ Day Workshop. Would you like to learn to code but don’t know where to start? Taking your first steps in programming can seem like an impossible task, so we’ve decided to put on a workshop to show beginners how it can be done and share our passion for the world of data science!

Apply to be a student https://forms.gle/2cvNyRK8c8pNnpnz5

Library
15:30
15:30
45min
Building a knowledge graph for climate policy
Harrison Pim, Fred O'Loughlin

At Climate Policy Radar, we're building an open-source knowledge graph for climate policy. In this talk, we'll share how we combine in-house expertise with scalable data infrastructure to identify key concepts in thousands of global climate policy documents. We'll also touch on ontology design, equitable evaluation, and the climate impacts of AI.

Hardwick Hub
15:30
45min
Is coding assistant as good as we thought in coding?
Cheuk Ting Ho

Nowadays coding assistants are everywhere: many IDEs offer them as plugins, and they are becoming more and more powerful. But this prompts some questions: are coding assistants as good as we want them to be? What can and can't these AI agents do? Will AI take my job?

Doddington Forum
15:30
45min
You Came to a Python Conference. Now, Go Do a PR Review!
Samiul Huque

If you or your organization are spending time and resources attending a Python conference, you will want to ensure your team gets something immediately actionable and helpful out of it. As coders, we often think about writing code as the only way to contribute. However, pull request reviews are an often overlooked, but highly actionable way to have an impact.

Giving good PR reviews is an art, with two equally important parts: the technical side and the communication side. While the technical side ensures the quality, maintainability, and efficiency of the Python code, the communication around the PR determines whether the feedback can be understood and acted upon. However, we have all seen code reviews that have been ignored or executed poorly due to poor communication.

This talk addresses both facets of PR reviews by introducing the archetypes of bad code reviewers:
1) The “Looks Good to Me” Reviewer: This peer reviewer provides little to no actionable feedback.
2) The “Technical Nitpicker”: This peer reviewer focuses on small Python-specific issues, but fails to communicate constructively.
3) The “Nit” Commenter: This peer reviewer prefaces every comment with “nit,” while offering unclear, yet technically valid suggestions.

Using these archetypes, we will explore Python-specific technical topics (such as pass by reference vs. pass by value), while delving into how to communicate and deliver feedback in a clear and actionable manner. Using real-world examples, attendees will learn how to:
a) Identify and address technical issues in Python PRs
b) Communicate feedback effectively
c) Balance technical rigor with constructive feedback
d) Communicate their peer review comments clearly
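
The pass by reference vs. pass by value topic mentioned above can be illustrated with a short sketch. Python passes references to objects (often described as pass by assignment), so mutating an argument is visible to the caller while rebinding the name is not. The function names below are hypothetical and are not taken from the talk.

```python
def append_in_place(items):
    """Mutates the caller's list: both names refer to the same object."""
    items.append("reviewed")


def rebind_locally(items):
    """Rebinds the local name only; the caller's list is left untouched."""
    items = items + ["reviewed"]
    return items


def add_tag_buggy(tag, tags=[]):
    """Common review catch: the default list is created once and shared."""
    tags.append(tag)
    return tags


def add_tag_fixed(tag, tags=None):
    """Safer variant: build a fresh list on each call when none is given."""
    if tags is None:
        tags = []
    tags.append(tag)
    return tags


original = ["draft"]
append_in_place(original)           # the caller's list gains an element
rebound = rebind_locally(original)  # returns a new list; original is unchanged here
```

A constructive review comment on `add_tag_buggy` would name the shared default list as the cause and point to the `None` default as the fix, rather than only flagging the line.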

Grand Hall
16:15
16:15
45min
Polars, DuckDB, PySpark, PyArrow, pandas, cuDF: how Narwhals has brought them all together!
Marco Gorelli

Suppose you want to write a data science tool to do feature engineering. Your experience may go like this:
- Expectation: you can focus on state-of-the-art techniques for feature engineering.
- Reality: you keep having to make your codebase more complex because a new dataframe library has come out and users are demanding support for it.

Or rather, it might have gone like that in the pre-Narwhals era. Because now, you can focus on solving the problems your tool set out to solve, and let Narwhals handle the subtle differences between different kinds of dataframe inputs!

Doddington Forum
16:15
45min
Scaling AI workloads with Ray & Airflow
Tatiana Al-Chueyr

Ray is an open-source framework for scaling Python applications, particularly machine learning and AI workloads. It provides the layer for parallel processing and distributed computing. Many large language models (LLMs), including OpenAI's GPT models, are trained using Ray.

Apache Airflow, on the other hand, is a well-established data orchestration framework downloaded more than 20 million times monthly.

This talk presents the Airflow Ray provider package, which allows users to interact with Ray from an Airflow workflow. I'll show how to use the package to create Ray clusters and how Airflow can trigger Ray pipelines in those clusters.

Grand Hall
16:15
45min
Transfer Learning: Leveraging Pretrained Models with Limited Data
Salman Khan

Transfer learning has revolutionised machine learning by enabling models trained on large datasets to generalise effectively to tasks with limited data. This talk explores strategies for adapting pretrained models to new domains, focusing on audio processing as a case study. Using YAMNet, Whisper, and wav2vec2 for laughter detection, we demonstrate how to extract meaningful representations, fine-tune models efficiently, and handle severe class imbalances. The session covers feature extraction, model fusion techniques, and best practices for optimising performance in data-scarce environments. Attendees will gain practical insights into applying transfer learning across various modalities beyond audio, maximising model effectiveness when labelled data is scarce.

Hardwick Hub