<?xml version='1.0' encoding='utf-8' ?>
<iCalendar xmlns:pentabarf='http://pentabarf.org' xmlns:xCal='urn:ietf:params:xml:ns:xcal'>
    <vcalendar>
        <version>2.0</version>
        <prodid>-//Pentabarf//Schedule//EN</prodid>
        <x-wr-caldesc></x-wr-caldesc>
        <x-wr-calname></x-wr-calname>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>HFWMHG@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-HFWMHG</pentabarf:event-slug>
            <pentabarf:title>GPU Accelerated Python</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250606T090000</dtstart>
            <dtend>20250606T123000</dtend>
            <duration>PT3H30M</duration>
            <summary>GPU Accelerated Python</summary>
            <description></description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://cfp.pydata.org/london2025/talk/HFWMHG/</url>
            <location>Grand Hall</location>
            
            <attendee>Jacob Tomlinson</attendee>
            
            <attendee>Katrina Riehl</attendee>
            
            <attendee>Jeremy Tanner</attendee>
            
            <attendee>Lawrence Mitchell</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>UTCBUH@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-UTCBUH</pentabarf:event-slug>
            <pentabarf:title>How To Measure And Mitigate Unfair Bias in Machine Learning Models</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250606T133000</dtstart>
            <dtend>20250606T150000</dtend>
            <duration>PT1H30M</duration>
            <summary>How To Measure And Mitigate Unfair Bias in Machine Learning Models</summary>
            <description>AI tools used in hiring can unintentionally perpetuate discrimination based on protected characteristics such as age, gender and ethnicity, leading to significant real-world harm. This workshop provides a practical, hands-on approach to addressing biases in machine learning models, using the example of AI-powered hiring tools. You’ll train a neural network on biased datasets, evaluate fairness metrics, and work with state-of-the-art tools like [Fairlearn](https://fairlearn.org/) and [Google’s What-If Tool](https://pair-code.github.io/what-if-tool/) to measure and mitigate bias. By the end of the session, participants will be equipped with the knowledge and tools to tackle bias in their own projects and ensure fairer AI systems.</description>
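The kind of metric the workshop evaluates can be previewed without Fairlearn; below is a hand-rolled demographic parity difference on made-up hiring predictions (Fairlearn ships a ready-made version in `fairlearn.metrics`):

```python
# Illustrative only: gap between positive-prediction rates across groups.
import numpy as np

def demographic_parity_difference(y_pred, sensitive):
    """Difference between the highest and lowest positive-prediction rate across groups."""
    y_pred, sensitive = np.asarray(y_pred), np.asarray(sensitive)
    rates = [y_pred[sensitive == g].mean() for g in np.unique(sensitive)]
    return float(max(rates) - min(rates))

# Made-up hiring predictions: 1 = "invite to interview"
y_pred = [1, 1, 0, 1, 0, 0, 0, 1]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = demographic_parity_difference(y_pred, group)
print(gap)  # 0.5: group "a" is invited at rate 0.75, group "b" at rate 0.25
```

A value of 0 would mean both groups receive positive predictions at the same rate.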
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://cfp.pydata.org/london2025/talk/UTCBUH/</url>
            <location>Grand Hall</location>
            
            <attendee>John Sandall</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>J83ZYE@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-J83ZYE</pentabarf:event-slug>
            <pentabarf:title>Building your own vertical agent with AG2 AgentOS</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250606T153000</dtstart>
            <dtend>20250606T170000</dtend>
            <duration>PT1H30M</duration>
            <summary>Building your own vertical agent with AG2 AgentOS</summary>
            <description>The majority of knowledge work today requires comprehensive, integrated research to uncover deep insights. While existing technologies have advanced, the data deluge and fragmented, complex systems mean extensive resources and specialised teams are still necessary. AG2 AgentOS changes this paradigm by seamlessly enabling multi-agent systems to solve complex tasks and aggregate diverse data sources, achieving outcomes that would usually take even experts considerable time.

In this session, we will cover:
1. Design patterns and practical implementations that demonstrate AI agents in action, such as:
- Customized GroupChat
- Code execution
- Deep Research Agent
- Swarm
- Tool using
- Async chats
- Dynamic instructions
- Realtime Agent
- GraphRAG
- Structured Output
2. The anatomy of a Vertical AI agent application and how to seamlessly integrate multiple agents powered by models from OpenAI, Anthropic, Gemini, and open-weight providers, and a diverse range of tools to build your own vertical agent. 
3. We will use the components covered to collect information from the internet, connect to a data room, and create various modelling functions to replicate the analysis done in a technical and commercial deep dive on a startup.
4. How to contribute to the thriving AI agent ecosystem.
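The orchestration patterns above can be sketched generically. The snippet below is not the AG2 API; it is a plain-Python round-robin "group chat" with stubbed agents, just to show the shape of multi-agent coordination:

```python
# Generic sketch: each round, every agent sees the transcript so far and replies.
from typing import Callable

def group_chat(agents: dict[str, Callable[[list], str]], task: str, rounds: int = 1) -> list:
    """Round-robin coordination over a shared transcript of (speaker, message) pairs."""
    transcript = [("user", task)]
    for _ in range(rounds):
        for name, agent in agents.items():
            transcript.append((name, agent(transcript)))
    return transcript

# Stub agents: a researcher restates the task, a critic reacts to the last message.
agents = {
    "researcher": lambda t: f"findings on: {t[0][1]}",
    "critic": lambda t: f"reviewing: {t[-1][1]}",
}
log = group_chat(agents, "market sizing")
for speaker, message in log:
    print(f"{speaker}: {message}")
```

In a real system each stub would be an LLM-backed agent with tools and memory; the shared-transcript loop stays the same.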

Target industries and use cases: 
1. Industries that require deep research, such as finance, healthcare, science &amp; engineering. Research/Analysis/Science use cases: Deep Research Agent, SciAgents, Financial Analysis, AutoML Agent.
2. Industries involving customer support, such as e-commerce, education, social media. Customer-oriented use cases: Travel Planner, Order Management, Realtime ToDo Assistant, Email Management, Social Media Management, Youth Helper.
3. Industries involving heavy software design &amp; development, such as gaming, web, data engineering. Software-oriented use cases: Game Design Agents, Web Agent, Software Testing Agent

At the end of the tutorial, attendees will have a better understanding of agent-oriented programming concepts and how to reach production-readiness 10x faster. Through the examples given, they will be able to construct effective multi-agent systems to solve complex tasks, and they will have reusable building blocks to customise for their own vertical agent.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://cfp.pydata.org/london2025/talk/J83ZYE/</url>
            <location>Grand Hall</location>
            
            <attendee>Tim Santos</attendee>
            
            <attendee>Chi Wang</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>T9KEHN@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-T9KEHN</pentabarf:event-slug>
            <pentabarf:title>Introduction to Bayesian Time Series Analysis with PyMC</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250606T090000</dtstart>
            <dtend>20250606T103000</dtend>
            <duration>PT1H30M</duration>
            <summary>Introduction to Bayesian Time Series Analysis with PyMC</summary>
            <description>Traditional time series methods often struggle with complex patterns, uncertainty quantification, and incorporating prior knowledge. Bayesian methods offer a robust alternative, providing a flexible framework for handling these challenges. This tutorial will equip participants with the skills to leverage the power of Bayesian time series analysis using PyMC.

This tutorial is designed for data scientists, analysts, and researchers with some familiarity with Python and basic statistical concepts. Prior experience with time series analysis is helpful but not strictly required.  A basic understanding of probability distributions and Bayesian inference will be beneficial, but we will review key concepts.  Participants should be comfortable working with Jupyter notebooks.
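As a taste of the kind of inference PyMC's samplers automate, here is a grid-approximation sketch in plain NumPy: a posterior over an AR(1) coefficient on simulated data (noise scale assumed known, flat prior; purely illustrative, not the tutorial's code):

```python
# Bayesian inference for an AR(1) coefficient phi via a simple parameter grid.
import numpy as np

rng = np.random.default_rng(42)
true_phi, n = 0.7, 200
y = np.zeros(n)
for t in range(1, n):
    y[t] = true_phi * y[t - 1] + rng.normal(scale=0.5)   # simulated AR(1) series

phis = np.linspace(-0.99, 0.99, 399)                     # grid over the parameter
# Gaussian log-likelihood of y[t] given y[t-1], for every candidate phi at once
resid = y[1:, None] - phis[None, :] * y[:-1, None]
log_lik = -0.5 * (resid / 0.5) ** 2                      # known noise scale 0.5
log_post = log_lik.sum(axis=0)                           # flat prior adds nothing
posterior = np.exp(log_post - log_post.max())
posterior /= posterior.sum()                             # normalised posterior on the grid

phi_hat = phis[posterior.argmax()]
print(round(phi_hat, 2))  # close to the true value 0.7
```

PyMC replaces the grid with MCMC, which scales to models where a grid is hopeless.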

By the end of this tutorial, participants will be able to:

- Understand the advantages of Bayesian time series analysis.
- Implement various Bayesian time series models using PyMC.
- Preprocess time series data for Bayesian modeling.
- Perform model selection and comparison.
- Evaluate model fit and diagnose potential issues.
- Generate forecasts and interpret results.
- Apply Bayesian time series methods to real-world datasets.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://cfp.pydata.org/london2025/talk/T9KEHN/</url>
            <location>Doddington Forum</location>
            
            <attendee>Chris Fonnesbeck</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>U7VZKA@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-U7VZKA</pentabarf:event-slug>
            <pentabarf:title>Forecasting Weather using Time Series ML</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250606T110000</dtstart>
            <dtend>20250606T123000</dtend>
            <duration>PT1H30M</duration>
            <summary>Forecasting Weather using Time Series ML</summary>
            <description>Weather patterns are notoriously challenging to predict, typically requiring sophisticated satellite technology and advanced modeling techniques. However, recent advancements in deep learning for time series forecasting offer powerful new methods to tackle this complexity.

In this hands-on workshop, you will learn to forecast weather conditions for the next six months using Python, Google Colab, InfluxDB and popular libraries like Neural Prophet and state-of-the-art time series LLMs. Learn the strengths, weaknesses, and common pitfalls of each approach, from classical techniques (ARIMA) to Transformers. We’ll explore data preprocessing, model training, and evaluation, with practical examples and ready-to-use notebooks. All code and instructions will be available on GitHub, ensuring you can continue exploring time series forecasting beyond the session.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://cfp.pydata.org/london2025/talk/U7VZKA/</url>
            <location>Doddington Forum</location>
            
            <attendee>Suyash Joshi</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>Z3UW79@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-Z3UW79</pentabarf:event-slug>
            <pentabarf:title>Python Meets Quantum: Learn, Code, and Simulate</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250606T133000</dtstart>
            <dtend>20250606T150000</dtend>
            <duration>PT1H30M</duration>
            <summary>Python Meets Quantum: Learn, Code, and Simulate</summary>
            <description>Workshop Highlights

1. Quantum Computing Fundamentals

    - A beginner-friendly introduction to key quantum concepts: qubits, superposition, entanglement, and quantum gates.
    - Understand why quantum computing is groundbreaking and how it differs from classical computing.

2. Getting Started with Quantum Programming in Python

    - Hands-on setup: installing and configuring Qiskit and other essential libraries.
    - Build and execute your first quantum circuits.

3. Developing Quantum Programs

    - Create and simulate quantum circuits for fundamental algorithms like the Quantum Fourier Transform and Grover’s search.
    - Learn how to test quantum programs on simulators before running them on real quantum hardware.
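To preview what a simulator does under the hood, here is a toy two-qubit state-vector simulation in plain NumPy that prepares a Bell state (the workshop itself uses Qiskit; the gates below are the standard Hadamard and CNOT matrices):

```python
# Toy state-vector simulation: Hadamard then CNOT yields an entangled Bell state.
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)     # Hadamard gate
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])                  # controlled-NOT on two qubits

state = np.array([1, 0, 0, 0], dtype=float)      # the |00> basis state
state = np.kron(H, np.eye(2)) @ state            # Hadamard on the first qubit
state = CNOT @ state                             # entangle the two qubits
probs = np.abs(state) ** 2
print(probs)  # [0.5, 0, 0, 0.5]: measuring gives 00 or 11 with equal probability
```

In Qiskit the same circuit is a Hadamard followed by a CNOT, run on the `Aer` simulator backend.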

Who Should Attend?

This workshop is ideal for Python developers, data scientists, and ML practitioners curious about quantum computing. Basic Python knowledge is recommended; no prior experience in quantum physics is needed.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://cfp.pydata.org/london2025/talk/Z3UW79/</url>
            <location>Doddington Forum</location>
            
            <attendee>Andrea Melloncelli</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>W7WYMM@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-W7WYMM</pentabarf:event-slug>
            <pentabarf:title>Hands-on workshop on developing Reinforcement Learning solutions with financial domain example use cases.</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250606T153000</dtstart>
            <dtend>20250606T170000</dtend>
            <duration>PT1H30M</duration>
            <summary>Hands-on workshop on developing Reinforcement Learning solutions with financial domain example use cases.</summary>
            <description>Over the course of this interactive session, participants will embark on a journey that begins with an introduction to the fundamental principles of RL, including Markov Decision Processes, reward structures, and the critical balance between exploration and exploitation. We will then transition into a series of hands-on coding exercises using popular frameworks such as Python’s Gymnasium (formerly known as Gym) and PyTorch, and open-source RL libraries such as Stable-Baselines3 and Machin (to name a few). These exercises will enable attendees to implement classic algorithms like Q-learning and SARSA, as well as deep learning algorithms such as actor-critic architectures and policy gradients, in controlled environments.
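As a taste of the first exercise block, here is a minimal tabular Q-learning loop on a toy corridor environment, in plain Python rather than Gymnasium (the environment and hyperparameters are made up for illustration):

```python
# Tabular Q-learning on a 1-D corridor: start at the left, reward at the right end.
import random

def train(n_states=5, episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    rng = random.Random(seed)
    goal = n_states - 1                                # rightmost state is terminal
    q = [[0.0, 0.0] for _ in range(n_states)]          # actions: 0 = left, 1 = right
    for _ in range(episodes):
        s = 0
        while s != goal:
            explore = eps > rng.random()               # epsilon-greedy action choice
            a = rng.randrange(2) if explore else max((0, 1), key=lambda act: q[s][act])
            s2 = min(max(s + (1 if a == 1 else -1), 0), goal)
            r = 1.0 if s2 == goal else 0.0             # reward only on reaching the goal
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])   # Q-learning update
            s = s2
    return q

q = train()
policy = [max((0, 1), key=lambda act: q[s][act]) for s in range(len(q) - 1)]
print(policy)  # [1, 1, 1, 1]: the learned policy always heads right, toward the goal
```

The workshop swaps this toy loop for Gymnasium environments and library implementations, but the update rule is the same.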

Real-world case studies and example use cases, ranging from simple simulated game environments to realistic decision-making systems in finance (such as stock trading and asset portfolio optimization), will illustrate how RL methodologies are applied in practice. During this workshop, participants will develop and fine-tune RL models, gaining insights into performance evaluation, model tuning, and deployment strategies. Additionally, advanced topics such as deep RL architectures and on-policy and off-policy RL algorithms will be discussed and hacked on interactively.

This workshop aims not only to impart theoretical knowledge but also to empower participants with the practical skills needed to design and deploy effective RL solutions. Join us to explore the dynamic world of reinforcement learning and to enhance your toolkit for solving complex, data-driven challenges. All the Python libraries/packages, reference papers and data used in this workshop will be open-sourced and made available in a GitHub repo, to be shared soon.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://cfp.pydata.org/london2025/talk/W7WYMM/</url>
            <location>Doddington Forum</location>
            
            <attendee>Ade Idowu</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>V3CWEM@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-V3CWEM</pentabarf:event-slug>
            <pentabarf:title>Hands-on with Apache Iceberg</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250606T090000</dtstart>
            <dtend>20250606T103000</dtend>
            <duration>PT1H30M</duration>
            <summary>Hands-on with Apache Iceberg</summary>
            <description>**This tutorial is aimed at the data engineer who&#x27;s somewhat familiar with cloud storage solutions such as S3, Azure Blob Storage or Google Cloud Storage. The tutorial will consist of fully-local components running in Docker and Jupyter notebooks. You will be able to replicate the environment locally and play around with it yourself.**

Please clone https://github.com/andersbogsnes/pydata-london-2025-hands-on-apache-iceberg and run the commands in the README.md before the workshop if possible!

The goal of this tutorial is to give you an understanding of what Apache Iceberg is and does. 

We will write data in Iceberg format to an object store, taking the opportunity to demonstrate each of Iceberg&#x27;s selling points. Finally, we will query the data using a variety of query engines to demonstrate the promises of Iceberg&#x27;s interoperability.
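One of those selling points, time travel, boils down to immutable snapshots plus a pointer to the latest one. Here is a toy in-memory analogue (not Iceberg's actual metadata layout, which lives as files on object storage):

```python
# Toy "time travel": every commit produces a new immutable snapshot of the table.
class ToyTable:
    def __init__(self):
        self.snapshots = []                            # each snapshot is a frozen row set

    def append(self, rows):
        current = self.snapshots[-1] if self.snapshots else ()
        self.snapshots.append(current + tuple(rows))   # new snapshot; old ones untouched

    def scan(self, snapshot_id=None):
        if snapshot_id is None:
            snapshot_id = len(self.snapshots) - 1      # default: read the latest snapshot
        return list(self.snapshots[snapshot_id])

t = ToyTable()
t.append([("2025-06-06", 1)])
t.append([("2025-06-07", 2)])
print(t.scan())               # both rows
print(t.scan(snapshot_id=0))  # time travel: the table as of the first commit
```

Because old snapshots are never mutated, any engine can read a consistent historical view while new commits land.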

## Outline
- Introduce some of the concepts needed to understand the why of Apache Iceberg
  - A brief history of table formats
  - A discussion of the importance of file formats
- Introducing the dataset we will be working with
- Writing data into Iceberg format - what is happening under the hood?
- Demonstrating the main selling points of Iceberg and why you should care
  - Schema Evolution
  - Hidden Partitioning
  - Time Travel
  - Data Compaction
- Querying the data
  - Duckdb
  - Polars
  - Other query engines</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://cfp.pydata.org/london2025/talk/V3CWEM/</url>
            <location>Hardwick Hub</location>
            
            <attendee>Anders Bogsnes</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>BCKCMR@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-BCKCMR</pentabarf:event-slug>
            <pentabarf:title>Package Your Python Code as a CLI</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250606T110000</dtstart>
            <dtend>20250606T123000</dtend>
            <duration>PT1H30M</duration>
            <summary>Package Your Python Code as a CLI</summary>
            <description>If you&#x27;re not sure whether this tutorial is for you, we recommend you watch Jeroen&#x27;s talk [Embrace the Unix Command Line and Supercharge Your PyData Workflow](https://www.youtube.com/watch?v=siPGvvrfylQ).


***Note: This tutorial assumes that you&#x27;re using macOS or a Linux distribution. If you&#x27;re using Windows, please [install WSL](https://learn.microsoft.com/en-us/windows/wsl/install) or [a suitable Docker image](https://jeroenjanssens.com/dsatcl/chapter-2-getting-started#docker-image).***

As your Python scripts evolve, turning them into command-line tools offers numerous benefits: reusability, testability, and greater efficiency. The Unix command line is a powerful environment, designed for combining tools, parallel execution, and working with massive data.

This hands-on tutorial will cover:

- The Unix philosophy and its relevance to data science
- How to convert Python code into a command-line tool
    - Preparing your code for reuse
    - Parsing command-line arguments
    - Reading from standard input
    - Making your tool executable and adding help options
- Best practices for designing command-line interfaces
- Upgrading from argv to argparse or Typer
- Self-contained tools with uv

Throughout the tutorial, we’ll develop an actual command-line tool, starting with Python’s standard library and later incorporating additional libraries. This tutorial is ideal for developers and researchers looking to enhance their workflows. No prior Unix knowledge is needed; essential concepts will be covered.
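The finished tool will have roughly this shape; a minimal argparse sketch that reads from a file or standard input (names like `linecount` are placeholders, not the tutorial's actual code):

```python
# Minimal CLI skeleton: argparse front end, file-or-stdin input, clean exit code.
import argparse
import sys

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="linecount", description="Count input lines.")
    parser.add_argument("file", nargs="?", type=argparse.FileType("r"),
                        default=sys.stdin, help="input file (defaults to standard input)")
    return parser

def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    print(sum(1 for _ in args.file))   # works for real files and for piped stdin alike
    return 0

if __name__ == "__main__":
    raise SystemExit(main())
```

Saved as `linecount.py`, it runs as `python linecount.py notes.txt` or `cat notes.txt | python linecount.py`, so it composes with other Unix tools.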

## Resources

- [Presentation](https://docs.google.com/presentation/d/14yhoWSaUf8RzKWQHQ426WAXbRXGAmSsMKyiFQrEACFo/edit?usp=sharing)
- [Code](https://github.com/jeroenjanssens/python-cli-tutorial)</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://cfp.pydata.org/london2025/talk/BCKCMR/</url>
            <location>Hardwick Hub</location>
            
            <attendee>Jeroen Janssens</attendee>
            
            <attendee>Thijs Nieuwdorp</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>R3UJN7@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-R3UJN7</pentabarf:event-slug>
            <pentabarf:title>Transformers Inside Out (Parts 1 &amp; 2)</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250606T133000</dtstart>
            <dtend>20250606T150000</dtend>
            <duration>PT1H30M</duration>
            <summary>Transformers Inside Out (Parts 1 &amp; 2)</summary>
            <description>In this tutorial we’ll work step by step through creating a simple GPT model in PyTorch. We&#x27;ll use simplified kids&#x27; jokes to train it and see how its internal representations evolve as it tries to tell (hopefully) funnier and funnier jokes. Intermediate Python programming skills are assumed for this tutorial, as well as a basic understanding of matrix algebra. No familiarity with PyTorch, GPT or LLMs is assumed.
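The heart of each GPT block is scaled dot-product self-attention with a causal mask; here is that single step sketched in NumPy (the tutorial builds the full model in PyTorch, and the learned query/key/value projections are omitted here for brevity):

```python
# Causal self-attention over a short token sequence, projections omitted.
import numpy as np

def causal_self_attention(x):
    """x: (seq_len, d) token embeddings; returns the attention-mixed embeddings."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                        # pairwise similarities
    mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores[mask] = -np.inf                               # no peeking at future tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over visible tokens
    return weights @ x                                   # weighted mix of earlier tokens

x = np.random.default_rng(0).normal(size=(4, 8))         # 4 tokens, 8-dim embeddings
out = causal_self_attention(x)
print(out.shape)  # (4, 8)
```

The causal mask is why the first output position can only ever attend to itself, which is what lets the model be trained on next-token prediction.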

Please clone https://github.com/karpathy/nanoGPT onto your laptop and follow the README.md instructions to install the dependencies (`pip install torch numpy transformers datasets tiktoken wandb tqdm`) before coming to the session.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://cfp.pydata.org/london2025/talk/R3UJN7/</url>
            <location>Hardwick Hub</location>
            
            <attendee>Sam Joseph</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>PRDCGC@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-PRDCGC</pentabarf:event-slug>
            <pentabarf:title>Graph Theory for Multi-Agent Integration: Showcase Clinical Use Cases</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250606T153000</dtstart>
            <dtend>20250606T170000</dtend>
            <duration>PT1H30M</duration>
            <summary>Graph Theory for Multi-Agent Integration: Showcase Clinical Use Cases</summary>
            <description>I will start by providing an introduction to orchestrating multiple models in a single workflow and explaining why conventional linear pipelines fail to meet complex tasks. Next, we’ll outline how graph theory addresses clinical tasks such as the patient document workflow, starting from doctor notes, through blood-results analysis, to discharge letters. Finally, we will discuss how to scale the concept of multi-model integration to any field.
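The graph idea can be previewed in a few lines: nodes are agent stubs, edges declare data dependencies, and a topological sort drives execution (the agent names and clinical steps below are illustrative only, not the tutorial's code):

```python
# A graph-based (rather than linear) pipeline: run agents in dependency order.
from graphlib import TopologicalSorter

def run_workflow(agents, deps, document):
    """deps maps each node to the set of nodes whose outputs it consumes."""
    outputs = {"input": document}
    for node in TopologicalSorter(deps).static_order():
        if node in agents:
            inputs = {d: outputs[d] for d in deps.get(node, ())}
            outputs[node] = agents[node](inputs)
    return outputs

# Stub agents standing in for model calls on each document type.
agents = {
    "notes": lambda i: f"summarised {i['input']}",
    "bloods": lambda i: f"flags from {i['input']}",
    "discharge": lambda i: f"letter using ({i['notes']}; {i['bloods']})",
}
deps = {"notes": {"input"}, "bloods": {"input"}, "discharge": {"notes", "bloods"}}
result = run_workflow(agents, deps, "patient record")
print(result["discharge"])
```

Because "notes" and "bloods" share no edge, a real orchestrator could run them in parallel, which is exactly what a linear pipeline cannot express.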
The tutorial will include live code demos, and I will provide a GitHub repository with the tutorial code.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://cfp.pydata.org/london2025/talk/PRDCGC/</url>
            <location>Hardwick Hub</location>
            
            <attendee>Ahmad Albarqawi</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>ZLTHE9@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-ZLTHE9</pentabarf:event-slug>
            <pentabarf:title>Opening Notes &amp; Keynote: Keep Calm and Data On: Being a data science practitioner in the era of AI proliferation</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250607T090000</dtstart>
            <dtend>20250607T095500</dtend>
            <duration>PT55M</duration>
            <summary>Opening Notes &amp; Keynote: Keep Calm and Data On: Being a data science practitioner in the era of AI proliferation</summary>
            <description></description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/ZLTHE9/</url>
            <location>Grand Hall</location>
            
            <attendee>Leanne Fitzpatrick</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>EJWBPU@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-EJWBPU</pentabarf:event-slug>
            <pentabarf:title>Multi-Task Learning for Fraud detection: From Trees to MLPs</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250607T102000</dtstart>
            <dtend>20250607T110500</dtend>
            <duration>PT45M</duration>
            <summary>Multi-Task Learning for Fraud detection: From Trees to MLPs</summary>
            <description>Fraud detection is a complex problem due to the constant evolution of fraudulent behaviour, significant data imbalance, and the requirement for real-time decision-making. Accurate detection of fraud and financial crime is crucial for protecting customers and maintaining trust in the banking system. Traditional fraud detection often relies on binary classification models using tree-based algorithms. While these models offer good predictive performance and scalability, they can struggle to capture shared information across different types of fraud. This often results in the need for multiple specialist models, each requiring individual maintenance and retraining.
Multi-task learning, a deep learning approach, offers a potential solution by exploiting the commonalities between related fraud problems to improve overall prediction accuracy. Multi-task learning is particularly relevant where multiple prediction targets share underlying patterns. In fraud, different sub-types (e.g., identity theft, account takeover, coercion) frequently exhibit overlapping characteristics. A model trained on multiple signals simultaneously may be better at identifying subtle patterns that individual models might miss. Our hypothesis is that this should lead to increased generalisation, allowing multi-task models to adapt more effectively to new fraud patterns and reduce maintenance overhead.
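The shared-trunk idea in miniature: one learned representation feeds a separate head per fraud sub-type. The NumPy forward pass below shows shapes only, with random weights; it is not the production model:

```python
# Multi-task forward pass: shared trunk, one sigmoid head per fraud sub-type.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 10))                    # 32 transactions, 10 features

W_shared = rng.normal(size=(10, 16))             # trunk weights shared by every task
h = np.tanh(x @ W_shared)                        # shared representation

# One small head per fraud sub-type, all reading the same representation.
heads = {task: rng.normal(size=(16, 1))
         for task in ("identity_theft", "account_takeover", "coercion")}
scores = {task: 1 / (1 + np.exp(-(h @ W))) for task, W in heads.items()}

print({task: s.shape for task, s in scores.items()})  # one score per transaction per task
```

During training, gradients from all three heads flow into `W_shared`, which is how commonalities between fraud types are exploited.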
In this talk, I will detail how we have tested this hypothesis at Monzo by applying multi-task learning to the problem of unauthorized card fraud. I will discuss the models we developed and the results we have observed in controlled offline settings.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/EJWBPU/</url>
            <location>Grand Hall</location>
            
            <attendee>Callum Court</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>Q37AUM@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-Q37AUM</pentabarf:event-slug>
            <pentabarf:title>AI agents testing: How to evaluate the unpredictable</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250607T110500</dtstart>
            <dtend>20250607T115000</dtend>
            <duration>PT45M</duration>
            <summary>AI agents testing: How to evaluate the unpredictable</summary>
            <description>AI agents and multi-step AI workflows are incredibly powerful — but they can also be risky to deploy and even scarier to change. You don’t want your users to be the ones finding the bugs, but it&#x27;s often not clear how to test such complex systems in advance. Traditional unit tests and ML evaluation methods don’t really work when interactions unfold unpredictably across an entire session.

In this talk, we’ll break down practical ways to test compound AI systems, including chatbots and AI agents. We&#x27;ll cover:
- Strategies for testing complex systems.
- Specific approaches, from testing the correctness of tool calls to running multi-step simulations.
- How to automate evaluation using both LLM-as-a-judge and deterministic checks.
- How to prioritize testing, balancing edge cases, adversarial scenarios, and core user experiences.
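Deterministic checks are the cheapest of those layers to automate; here is a sketch of validating tool calls in a recorded agent session (the session schema and rules are made up for illustration):

```python
# Session-level deterministic checks: no unexpected tools, bounded length,
# and the session must end with an answer to the user.
def check_session(session, allowed_tools, max_turns=20):
    """Return a list of failed expectations for one recorded agent session."""
    failures = []
    if len(session) > max_turns:
        failures.append(f"session too long: {len(session)} turns")
    for i, turn in enumerate(session):
        if turn.get("type") == "tool_call" and turn.get("name") not in allowed_tools:
            failures.append(f"turn {i}: unexpected tool {turn.get('name')!r}")
    if session and session[-1].get("type") != "assistant_message":
        failures.append("session did not end with an answer to the user")
    return failures

session = [
    {"type": "user_message", "text": "refund order 42"},
    {"type": "tool_call", "name": "lookup_order"},
    {"type": "tool_call", "name": "delete_account"},   # should be flagged
    {"type": "assistant_message", "text": "done"},
]
print(check_session(session, allowed_tools={"lookup_order", "issue_refund"}))
```

Checks like these run on every commit; LLM-as-a-judge evaluation then handles the questions a rule cannot decide, such as tone or factuality.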

We&#x27;ll also share how you can configure and run session-level evaluation using open-source tools.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/Q37AUM/</url>
            <location>Grand Hall</location>
            
            <attendee>Emeli Dral</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>ZSLNXD@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-ZSLNXD</pentabarf:event-slug>
            <pentabarf:title>Bringing stories to life with AI, data streaming and generative agents</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250607T115000</dtstart>
            <dtend>20250607T123500</dtend>
            <duration>PT45M</duration>
            <summary>Bringing stories to life with AI, data streaming and generative agents</summary>
            <description>Storytelling has always been a way to connect and imagine new worlds. Now, with Generative Agents (AI-powered characters that can think, act, and adapt) we can take storytelling to a whole new level. But what if these agents could change and grow in real time, driven by live data streams?

Inspired by the Stanford paper &quot;Generative Agents: Interactive Simulacra of Human Behavior&quot;, this session explores how to build dynamic, AI-driven worlds using Apache Kafka, Apache Flink, and Apache Iceberg. We&#x27;ll use a Large Language Model to power conversation and agent decision-making, integrate Retrieval-Augmented Generation (RAG) for memory storage and retrieval, and use Python to tie it all together. Along the way, we’ll examine different approaches for data processing, storage, and analysis.
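The memory half of that loop can be previewed in a few lines; a toy retrieval step where bag-of-words counts stand in for a real embedding model (names and memories are made up):

```python
# Toy RAG memory: embed snippets, retrieve the most similar by cosine similarity.
import numpy as np

memories = ["saw Alice at the market", "planted roses in the garden",
            "discussed prices at the market stall"]

def embed(text, vocab):
    """Bag-of-words counts as a stand-in for a real embedding model."""
    return np.array([text.split().count(w) for w in vocab], dtype=float)

vocab = sorted({w for m in memories for w in m.split()})
M = np.array([embed(m, vocab) for m in memories])       # one row per stored memory

def recall(query, k=1):
    q = embed(query, vocab)
    sims = (M @ q) / (np.linalg.norm(M, axis=1) * (np.linalg.norm(q) or 1.0))
    return [memories[i] for i in np.argsort(-sims)[:k]]

print(recall("who was at the market"))  # the most market-related memory first
```

In the session the counts become vector embeddings and the list becomes a streaming store, but retrieve-then-prompt is the same pattern.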

By the end, you’ll see how data streaming and AI can work together to create lively, evolving virtual communities. Whether you’re into gaming, simulations, research or just exploring what’s possible, this session will give you ideas for building something amazing.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/ZSLNXD/</url>
            <location>Grand Hall</location>
            
            <attendee>Olena Kutsenko</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>DSAQW9@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-DSAQW9</pentabarf:event-slug>
            <pentabarf:title>Keynote: From Next Token Prediction to Reasoning and Beyond</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250607T134000</dtstart>
            <dtend>20250607T142500</dtend>
            <duration>0.04500</duration>
            <summary>Keynote: From Next Token Prediction to Reasoning and Beyond</summary>
            <description>Saturday at 13:40 in the Grand Hall!</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/DSAQW9/</url>
            <location>Grand Hall</location>
            
            <attendee>Jay Alammar</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>RMQUDE@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-RMQUDE</pentabarf:event-slug>
            <pentabarf:title>Conquering PDFs: document understanding beyond plain text</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250607T144500</dtstart>
            <dtend>20250607T153000</dtend>
            <duration>0.04500</duration>
            <summary>Conquering PDFs: document understanding beyond plain text</summary>
            <description>For the practical examples, I&#x27;ll be using spaCy, and the new Docling library and layout analysis models. I&#x27;ll also cover Optical Character Recognition (OCR) for image-based text, how to convert tabular data to pandas DataFrames, and strategies for creating training and evaluation data for information extraction tasks like text classification and entity recognition using PDFs and other documents as inputs.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/RMQUDE/</url>
            <location>Grand Hall</location>
            
            <attendee>Ines Montani</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>3QAKDE@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-3QAKDE</pentabarf:event-slug>
            <pentabarf:title>Not Another LLM Talk… Practical Lessons from Building a Real-World Adverse Media Pipeline</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250607T153000</dtstart>
            <dtend>20250607T161500</dtend>
            <duration>0.04500</duration>
            <summary>Not Another LLM Talk… Practical Lessons from Building a Real-World Adverse Media Pipeline</summary>
            <description>We’ve all seen the hype—LLMs are transforming workflows, revolutionising automation, and changing how we extract insights from text. But when it comes to real-world production systems, things get messy fast.

Our adverse media pipeline processes over 1 million news articles a day, scanning for reports of crimes linked to financial bad actors, money laundering, and other regulatory risks. With GenAI and LLMs, we have powerful new tools to automate entity extraction and risk detection. However, deploying these models at scale brings a whole new set of challenges:

🛠️ Breaking Down the Problem: Why structuring tasks into modular prompts and chaining responses is key to accuracy.
💰 Cost vs. Performance Trade-offs: How different prompting strategies and model choices (API-based vs. fine-tuned local models) impact cost and scalability.
🧐 Validation &amp; Governance: From handling hallucinations to dealing with sensitive data while staying within regulatory frameworks.
🧰 Open Source &amp; Practical Tooling: How to build reliable, cost-efficient LLM pipelines using tools in the Python ecosystem.

To illustrate the real-world challenges of getting an LLM pipeline into production, we’ll introduce a cast of personas that will feel all too familiar:

- Cautious Claire – the compliance officer who doesn’t trust AI black boxes.
- Magic Mike – the sales lead who thinks LLMs can do anything.
- Just-Fine-Tune Jenny – the product manager convinced fine-tuning will fix everything.
- Reinventing Ryan – the engineer determined to build everything from scratch.
- Paranoid Pete – the security lead who fears LLMs will leak all the secrets.

Through their perspectives, we’ll explore the tensions, trade-offs, and hard-won lessons of taking an LLM-powered pipeline from a Jupyter notebook to a production-grade system. Expect practical insights through a real-world case study, and cautionary tales to help you navigate your own deployment challenges.

Who Should Attend?
This talk is for ML engineers, data scientists, software engineers, and product managers working with LLMs in production or planning to do so. Whether you’re evaluating architectures, struggling with cost control, or trying to balance compliance concerns, you’ll walk away with battle-tested strategies for building scalable, reliable, and regulation-friendly LLM pipelines.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/3QAKDE/</url>
            <location>Grand Hall</location>
            
            <attendee>Adam Hill</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>XDLFR3@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-XDLFR3</pentabarf:event-slug>
            <pentabarf:title>Successful Projects through a bit of Rebellion</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250607T161500</dtstart>
            <dtend>20250607T170000</dtend>
            <duration>0.04500</duration>
            <summary>Successful Projects through a bit of Rebellion</summary>
            <description>This talk is for leaders who want new techniques to improve their success rates. In the last 15 months I&#x27;ve built a private data science peer mentorship group where we discuss rebellious ideas that improve our ability to make meaningful change in organisations of all sizes.

As a leader you&#x27;ve no doubt had trouble defining new projects (perhaps you&#x27;ve been asked - &quot;add ChatGPT!&quot;), getting buy-in, building support, defining defensible metrics and milestones, hiring, developing your team, dealing with conflict, avoiding overload and ultimately delivering valuable projects that are adopted by the business. I&#x27;ll share advice across all of these areas based on 25 years of personal experience and the topics we&#x27;ve discussed in my leadership community.

You&#x27;ll walk away with new ideas, perspectives and references that ought to change how you work with your team and organisation.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/XDLFR3/</url>
            <location>Grand Hall</location>
            
            <attendee>Ian Ozsvald</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>SZ89HM@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-SZ89HM</pentabarf:event-slug>
            <pentabarf:title>PyData London 2025 Happy Hour</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250607T170000</dtstart>
            <dtend>20250607T180000</dtend>
            <duration>1.00000</duration>
            <summary>PyData London 2025 Happy Hour</summary>
            <description>A big thank you to our social sponsors, NVIDIA and Anaconda!</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/SZ89HM/</url>
            <location>Grand Hall</location>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>HAARJB@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-HAARJB</pentabarf:event-slug>
            <pentabarf:title>Why you should stop pretending your sparse data is dense</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250607T102000</dtstart>
            <dtend>20250607T110500</dtend>
            <duration>0.04500</duration>
            <summary>Why you should stop pretending your sparse data is dense</summary>
            <description>Data in the real world is complex, and one form that complexity often takes is missing values. In the Dataframe world, this can mean that your data is no longer representable as a nice rectangle of dense values. So what are the options?

Pandas has historically dominated the data science ecosystem, and offers a couple of alternatives. Certain datatypes, such as floats, timestamps, and strings, have a &quot;natural&quot; representation for missing values (NaN, NaT, and None respectively). Integer types present more of a challenge, as for a given bit-width, all binary values represent legitimate values. Pandas offers SparseArray with a user-defined fill-value. This is memory efficient, but it is still not possible to differentiate between a missing value, and a value that is present and equal to the fill value.

Arrow is the modern alternative in-memory Dataframe representation format, and it comes equipped with built-in handling for missing values that does not depend on the column type in any way. However, Arrow&#x27;s sparse data representation has its own drawbacks in terms of both memory usage and processing speed.

This talk will compare and contrast, with examples, the above two approaches, along with the more sophisticated approach taken in ArcticDB. As a database, ArcticDB faces all of the same challenges as Pandas and Arrow for its in-memory processing, plus the extra consideration of efficiently serialising these data structures to disk.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/HAARJB/</url>
            <location>Doddington Forum</location>
            
            <attendee>Alex Owens</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>9YUDVW@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-9YUDVW</pentabarf:event-slug>
            <pentabarf:title>How we unified feature engineering across data and backend at Monzo</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250607T110500</dtstart>
            <dtend>20250607T115000</dtend>
            <duration>0.04500</duration>
            <summary>How we unified feature engineering across data and backend at Monzo</summary>
            <description>Join us for an in-depth exploration of Monzo&#x27;s approach to feature engineering. This session will dive into the methodologies we use to streamline the creation of point-in-time-correct features for model development. We will show how these features are transitioned into production environments, using real-time streaming powered by our event-driven architecture. Discover how we overcame challenges, reduced development time, and ensured data accuracy and consistency.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/9YUDVW/</url>
            <location>Doddington Forum</location>
            
            <attendee>Alex Jones</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>G9U7H8@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-G9U7H8</pentabarf:event-slug>
            <pentabarf:title>Enhancing Fraud Detection with LLM-Generated Profiles: From Analyst Efficiency to Model Performance</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250607T115000</dtstart>
            <dtend>20250607T123500</dtend>
            <duration>0.04500</duration>
            <summary>Enhancing Fraud Detection with LLM-Generated Profiles: From Analyst Efficiency to Model Performance</summary>
            <description>Objective:

Fraud detection systems often rely on manually crafted features or raw text embeddings of unstructured text, which may miss nuanced patterns. This talk presents a case study where LLM-generated customer profiles—summarising transaction history, documents, interaction history and related profiles—were used to (1) accelerate compliance reviews and (2) extract embeddings that boosted fraud model performance and sped up its development.

Outline:
* 0-10 mins: Introduction to challenges in fraud detection: manual inefficiencies and limitations of traditional feature engineering.
* 10-20 mins: Methodology: Designing LLM-generated profiles to unify structured/unstructured data, and embedding extraction.
* 20-30 mins: Results: How embeddings of the LLM-generated summaries captured contextual relationships (e.g., subtle transaction-document inconsistencies) better than raw text embeddings or manual features; lessons learned; scalability considerations.

Key Takeaways:
* LLMs can transform unstructured data into actionable insights for both human analysts and ML models.
* Embeddings from LLM-generated summaries may outperform naive text embeddings by capturing synthesized context and reducing noise.
* Practical strategies to integrate LLMs into existing fraud detection pipelines without disrupting workflows.

Why It Matters:
This approach bridges the gap between unstructured data utilization and interpretable model improvements, offering a scalable path for institutions implementing LLM-based solutions.

Background Knowledge:
Basic understanding of NLP (e.g., embeddings) and supervised learning. No advanced LLM expertise is required.

Audience:
Data scientists, ML engineers, and fraud analysts familiar with basic NLP/ML concepts. Ideal for those exploring NLP applications in finance or seeking alternatives to manual feature engineering.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/G9U7H8/</url>
            <location>Doddington Forum</location>
            
            <attendee>Radion Bikmukhamedov</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>PWHCFA@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-PWHCFA</pentabarf:event-slug>
            <pentabarf:title>Tackling Data Challenges for Scaling Multi-Agent GenAI Apps with Python</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250607T144500</dtstart>
            <dtend>20250607T153000</dtend>
            <duration>0.04500</duration>
            <summary>Tackling Data Challenges for Scaling Multi-Agent GenAI Apps with Python</summary>
            <description>The use of multiple Large Language Models (LLMs) working together to perform complex tasks, known as multi-agent systems, has gained significant traction. While frameworks like LangGraph and Semantic Kernel can streamline orchestration and coordination among agents, developing large-scale, production-grade systems brings a host of data challenges. Issues such as supporting multi-tenancy, preserving transactional integrity and state, and managing reliable asynchronous function calls while scaling efficiently can be difficult to navigate.

Leveraging insights from practical experiences in the Azure Cosmos DB engineering team, this talk will guide you through key considerations and best practices for storing, managing, and leveraging data in multi-agent applications at any scale. You’ll learn core multi-agent concepts and architectures, and how to manage statefulness and conversation histories, personalize agents through retrieval-augmented generation (RAG), and effectively integrate APIs and function calls.

Aimed at developers, architects, and data scientists at all skill levels, this session will show you how to take your multi-agent systems from the lab to full-scale production deployments, ready to solve real-world problems. We’ll also walk through code implementations that can be quickly and easily put into practice, all in Python.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/PWHCFA/</url>
            <location>Doddington Forum</location>
            
            <attendee>Theo van Kraay</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>Q3QERT@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-Q3QERT</pentabarf:event-slug>
            <pentabarf:title>Platforms for valuable AI Products: Iteration, iteration, iteration</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250607T153000</dtstart>
            <dtend>20250607T161500</dtend>
            <duration>0.04500</duration>
            <summary>Platforms for valuable AI Products: Iteration, iteration, iteration</summary>
            <description></description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/Q3QERT/</url>
            <location>Doddington Forum</location>
            
            <attendee>John Carney</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>XTU8RH@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-XTU8RH</pentabarf:event-slug>
            <pentabarf:title>NetworkX is Fast Now: Zero Code Change Acceleration</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250607T161500</dtstart>
            <dtend>20250607T170000</dtend>
            <duration>0.04500</duration>
            <summary>NetworkX is Fast Now: Zero Code Change Acceleration</summary>
            <description>### Part I
NetworkX is the most popular library in Python for graph theory and applied network science thanks to its extensive API and beginner-friendly documentation. NetworkX is used &quot;everywhere&quot;, because graphs are everywhere. Don&#x27;t believe me? We surveyed more than 300 Python packages to understand how they use NetworkX, in domains including geoscience, neuroscience, genomics, biology, chemistry, quantum computing, text and language, machine learning, causal inference, and optimization. We will summarize what we learned to help you apply graph analytics to your data.

Once you start using NetworkX you will soon realize that the pure-Python implementation starts becoming a roadblock to scalable graph analytics.

### Part II
What should you do when your graph data becomes too large or NetworkX becomes too slow? Simple: use an accelerated NetworkX backend!

NetworkX 3.0 added the ability to dispatch to other implementations. This means you can use other highly tuned libraries through NetworkX to achieve speedups from 100× to over 10,000×! As &quot;the API for graphs&quot;, NetworkX now makes it easy to accelerate your graph workflows on CPUs with [GraphBLAS](https://github.com/python-graphblas/graphblas-algorithms) and NVIDIA GPUs with nx-cugraph. Other backends are welcome, and we plan to support distributed graphs soon for extreme scalability 🚀🚀🚀

### Outline:

10 mins - Introduction to the world of network data, modeling with NetworkX, and needs of graph data in the world.

10 mins - How do backends work? Trade-offs of using backends

10 mins - Live demos</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/XTU8RH/</url>
            <location>Doddington Forum</location>
            
            <attendee>Mridul Seth</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>W8VCU7@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-W8VCU7</pentabarf:event-slug>
            <pentabarf:title>Parallel PyTorch Inference with Python Free-Threading</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250607T102000</dtstart>
            <dtend>20250607T110500</dtend>
            <duration>0.04500</duration>
            <summary>Parallel PyTorch Inference with Python Free-Threading</summary>
            <description>Python 3.13, released in October 2024, is the first version of Python to introduce support for a “no-GIL” free-threaded mode, per PEP 703 (Making the Global Interpreter Lock Optional in CPython), unlocking the ability for multiple Python threads to run simultaneously.

This allows, for the first time since the language’s inception in December 1989, a single Python process to saturate all CPU cores in parallel with pure Python code (i.e. not farming out to extension modules written in C, C++, or, more recently, Rust).

This talk explores what can now be done with PyTorch using the new free-threaded version of Python, specifically focusing on run-time inference on transformer-based generative models.

We will introduce a free-threaded implementation of an asyncio-based HTTP server that allows for parallel model inference of a GPT2 PyTorch model, scaling up to multiple GPUs with ease, all within a single Python process---this is novel, uncharted territory that is now unlocked thanks to free-threaded Python.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/W8VCU7/</url>
            <location>Hardwick Hub</location>
            
            <attendee>Michał Szołucha</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>HPXBZN@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-HPXBZN</pentabarf:event-slug>
            <pentabarf:title>Sovereign Data for AI with Python</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250607T110500</dtstart>
            <dtend>20250607T115000</dtend>
            <duration>0.04500</duration>
            <summary>Sovereign Data for AI with Python</summary>
            <description>The only certainty in life is that the pendulum will always swing. Recently, the pendulum has been swinging towards repatriation. However, the infrastructure needed to build and operate AI systems using Python in a sovereign (even air-gapped) environment has changed since the shift towards the cloud. This talk will introduce the infrastructure you need to build and deploy Python applications for AI - from data processing to model training and LLM fine-tuning at scale, to inference at scale. We will focus on open-source infrastructure including:
- a Python library server (PyPI, Conda, etc.) and avoiding supply-chain attacks
- a container registry that works at scale
- an S3 storage layer
- a database server with a vector index</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/HPXBZN/</url>
            <location>Hardwick Hub</location>
            
            <attendee>Lex Avstreikh</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>RDVWPC@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-RDVWPC</pentabarf:event-slug>
            <pentabarf:title>Cutting Edge Football Analytics using Polars, Keras and Spektral</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250607T115000</dtstart>
            <dtend>20250607T123500</dtend>
            <duration>0.04500</duration>
            <summary>Cutting Edge Football Analytics using Polars, Keras and Spektral</summary>
            <description>Football analytics has become an essential part of the modern game, influencing everything from tactical decisions to player recruitment. However, much of the cutting-edge research remains locked behind club training grounds, making it difficult for those outside the professional sphere to explore advanced analytical techniques. Fortunately, open-source tools have lowered the barrier to entry, enabling analysts, researchers, and enthusiasts to develop sophisticated models using publicly available data.

This talk will provide a hands-on introduction to building football analytics models with Polars, Keras, and Spektral. We will start by exploring specific open-source football analytics Python libraries (kloppy and mplsoccer), followed by a brief introduction to basic Polars functionality for efficiently processing millions of player and ball coordinates from high-frequency positional tracking data. Next, we will introduce Keras and Spektral for Deep Learning and Graph Neural Networks (GNNs), demonstrating how these tools can be used to develop in-game prediction models and extract advanced football metrics.

Attendees will gain insights into how open-source machine learning techniques can be applied to football analytics, from raw data processing to model deployment. The session is suitable for those with a basic understanding of Python and machine learning concepts, but no prior experience with Polars or GNNs is required. Whether you&#x27;re a data scientist, football analyst, or simply curious about the intersection of AI and sports, this talk will provide an overview of some of the most prominent open-source resources for cutting-edge football research.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/RDVWPC/</url>
            <location>Hardwick Hub</location>
            
            <attendee>Joris Bekkers</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>9K8PHR@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-9K8PHR</pentabarf:event-slug>
            <pentabarf:title>PyScript - Python in the Browser</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250607T144500</dtstart>
            <dtend>20250607T153000</dtend>
            <duration>0.04500</duration>
            <summary>PyScript - Python in the Browser</summary>
            <description>[PyScript](https://pyscript.net/) is a fast-growing and vibrant open-source platform for Python in the browser. Thanks to PyScript, [CPython](https://python.org/) and [MicroPython](https://micropython.org/) run anywhere a browser runs, which is everywhere!

This talk, by a PyScript contributor, shows the initial steps needed to get PyScript working. It will describe various aspects of Python browser apps, including UI creation, event handling, CSS styling, and calls to an AI to create content. 

We assume you have basic Python skills but know little about Web technologies, such as JavaScript, CSS, or React. This talk will amaze you with how easy it is to write your own Python web app in the browser using PyScript.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/9K8PHR/</url>
            <location>Hardwick Hub</location>
            
            <attendee>Chris Laffra</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>WPBA9U@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-WPBA9U</pentabarf:event-slug>
            <pentabarf:title>Media Mix Modelling - how can we save the company budget?</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250607T153000</dtstart>
            <dtend>20250607T161500</dtend>
            <duration>0.04500</duration>
            <summary>Media Mix Modelling - how can we save the company budget?</summary>
            <description>### Bayesian Media Mix Modeling: Empowering Engineers to Transform Marketing Analytics

The EU Cookie Law and similar regulations have reshaped the digital advertising landscape, creating challenges for marketing specialists accustomed to cookie-based tracking and last-click attribution. However, this challenge is also an opportunity for engineers and data scientists to step in and provide innovative solutions.

**Bayesian Media Mix Modeling (MMM)** offers a powerful way to analyze the effectiveness of marketing campaigns across channels like advertising platforms, social media, and video streaming services—without relying on personal user data. This talk is tailored for engineers, data scientists, and analysts who want to help their marketing colleagues navigate these uncertain waters by implementing MMM effectively.

You don’t need a marketing background for this session—just a solid grasp of classic data science principles and some experience in data engineering.

#### Here’s what we’ll cover:

1. **What is MMM?**  
   A clear introduction to Media Mix Modeling, its purpose, and why it’s essential in the post-cookie era.

2. **Library Showdown: Which MMM Tools to Use**  
   A comparison of popular Python libraries for MMM, highlighting their strengths, weaknesses, and best use cases.

3. **From Inputs to Outputs: What You Need to Know**  
   We’ll discuss the required data inputs, expected outputs, and how to prepare for challenges when transitioning from theory to practice.

4. **The Real-World Data Problem**  
   Real-world data rarely resembles the clean examples you see in tutorials. Learn practical strategies to preprocess messy datasets and make your model work in realistic scenarios.

5. **Collaboration with Marketing Teams**  
   Discover why MMM is not a magic solution that replaces marketing professionals but rather a tool to enhance their decision-making. Learn how to foster effective collaboration between engineers and marketers.

6. **Evaluating and Using MMM Daily**  
   Practical advice on how to evaluate your MMM’s performance, integrate it into daily workflows, and ensure it delivers actionable insights.

By the end of this session, you’ll have the knowledge and inspiration to empower your organization with a cutting-edge marketing analytics solution—putting engineers at the heart of the decision-making process.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/WPBA9U/</url>
            <location>Hardwick Hub</location>
            
            <attendee>Natalia Ziemba‑Jankowska</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>DDJWLB@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-DDJWLB</pentabarf:event-slug>
            <pentabarf:title>LLM Inference Arithmetics: the Theory behind Model Serving</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250607T161500</dtstart>
            <dtend>20250607T170000</dtend>
            <duration>0.04500</duration>
            <summary>LLM Inference Arithmetics: the Theory behind Model Serving</summary>
            <description>The talk will cover the theory necessary to understand how to serve LLMs, presenting the math behind transformer inference in an accessible and light way. By the end of the talk, attendees will learn:

1. How to count the parameters in an LLM, especially the ones in the attention layers.
2. The difference between compute and memory in the context of LLM inference.
3. That LLM inference is made up of two parts: prefill and decoding.
4. What an LLM server is, and what features servers implement to optimise GPU memory usage and reduce latency.
5. How batching affects your inference metrics, like time-to-first-token.

The talk will cover:

**Did you pay attention?** (4 min). A short review of the attention mechanism and how to count parameters in a transformer-based model.

**Get to know your params** (8 min). The math-y section of the talk, explaining how to translate parameter counts into memory and compute requirements.

**Prefill and Decoding** (8 min) Explains that inference happens in two steps (prefill and decoding) and how the KV cache exploits this to make decoding faster. Covers common metrics for measuring inference performance, like time-to-first-token and tokens-per-second.

**Context and batch size** (5 min) Adds the sequence length to the picture, as well as the number of requests to process in parallel. Explains how LLM servers, like vLLM, use techniques like PagedAttention to optimise GPU usage.

**Conclusion** (5 min) Wrap up, Q&amp;A.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/DDJWLB/</url>
            <location>Hardwick Hub</location>
            
            <attendee>Luca Baggi</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>EGWMMC@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-EGWMMC</pentabarf:event-slug>
            <pentabarf:title>PyMC Code Sprint</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250607T102000</dtstart>
            <dtend>20250607T123500</dtend>
            <duration>2.01500</duration>
            <summary>PyMC Code Sprint</summary>
            <description>Whether you&#x27;re a seasoned Bayesian or completely new to probabilistic programming, this is your chance to contribute: write code, squash bugs, improve documentation, and develop practical examples. You&#x27;ll get hands-on guidance from PyMC core contributors while making real contributions to one of the leading Bayesian inference libraries in Python. No prior experience required—just bring your laptop and enthusiasm to learn and collaborate!</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://cfp.pydata.org/london2025/talk/EGWMMC/</url>
            <location>Library</location>
            
            <attendee>Chris Fonnesbeck</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>MRC78H@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-MRC78H</pentabarf:event-slug>
            <pentabarf:title>Python Engineering Excellence Birds of a Feather</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250607T153000</dtstart>
            <dtend>20250607T161500</dtend>
            <duration>0.04500</duration>
            <summary>Python Engineering Excellence Birds of a Feather</summary>
            <description>The session will consist of a short intro on what it means to achieve Python engineering excellence, followed by going around everyone at the session, asking where they feel they are in terms of their Python engineering skills, where they want to improve, and what kind of activities would best support that improvement.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/MRC78H/</url>
            <location>Library</location>
            
            <attendee>Sam Joseph</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>VKBKTY@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-VKBKTY</pentabarf:event-slug>
            <pentabarf:title>Feminist AI Lounge</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250607T153000</dtstart>
            <dtend>20250607T161500</dtend>
            <duration>0.04500</duration>
            <summary>Feminist AI Lounge</summary>
            <description></description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/VKBKTY/</url>
            <location>Elizabeth Board Room</location>
            
            <attendee>Ines Montani</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>ZDTG3L@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-ZDTG3L</pentabarf:event-slug>
            <pentabarf:title>AI for Everyone - Building Inclusive Machine Learning Models</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250608T101500</dtstart>
            <dtend>20250608T110000</dtend>
            <duration>0.04500</duration>
            <summary>AI for Everyone - Building Inclusive Machine Learning Models</summary>
            <description>Artificial Intelligence (AI) and Machine Learning (ML) have become central to decision-making processes across industries, from automating hiring decisions to medical diagnostics and financial services. While AI has the potential to drive efficiency and innovation, its benefits are not always equitably distributed. Biases embedded in training datasets, model design, and algorithmic decision-making can lead to discriminatory outcomes that disproportionately affect marginalized communities.

This talk, &quot;AI for Everyone: Building Inclusive Machine Learning Models,&quot; will explore the impact of AI bias and discuss strategies for creating more inclusive AI systems. We will analyze real-world examples where AI has failed underrepresented groups, from facial recognition technologies that misidentify people of color to automated systems that reinforce gender and socioeconomic disparities.

Key topics covered in this session include:

- **Bias in AI** – Understanding how biases arise in datasets and machine learning models.
- **Dataset Design and Fair Representation** – Best practices for creating diverse and representative training data.
- **Algorithmic Fairness** – Techniques for detecting and mitigating bias in machine learning models.
- **Ethical AI Development** – Principles and frameworks to ensure accountability, transparency, and inclusivity in AI.
- **The Societal Impact of Inclusive AI** – How equitable AI can drive positive social change and empower underrepresented communities.

This session is designed for developers, data scientists, AI practitioners, and decision-makers who want to ensure fairness and inclusivity in their AI projects. Attendees will leave with a clear understanding of AI bias challenges and practical steps to design ethical, inclusive AI systems that benefit everyone.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/ZDTG3L/</url>
            <location>Grand Hall</location>
            
            <attendee>Elizabeth Osanyinro</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>WJXMZP@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-WJXMZP</pentabarf:event-slug>
            <pentabarf:title>Reproducibility in Embedding Benchmarks</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250608T110000</dtstart>
            <dtend>20250608T114500</dtend>
            <duration>0.04500</duration>
            <summary>Reproducibility in Embedding Benchmarks</summary>
            <description>Reproducibility in embedding benchmarks is no small feat. Prompt variability, growing computational demands, and evolving tasks make fair comparisons a challenge. The need for robust benchmarking has never been greater. 

The Massive Text Embedding Benchmark (MTEB) addresses these challenges with a standardized, open-source framework for evaluating text embedding models. Covering diverse tasks like clustering, retrieval, and classification, MTEB ensures consistent and reproducible results. Extensions like MMTEB (multilingual) and MIEB (image) further expand its capabilities.

In this talk, we’ll explore the quirks and complexities of benchmarking embedding models, such as prompt sensitivity, scaling issues, and emergent behaviors. We’ll show how MTEB simplifies reproducibility, making it easier for researchers and industry practitioners to measure progress, choose the right models, and push the boundaries of embedding performance.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/WJXMZP/</url>
            <location>Grand Hall</location>
            
            <attendee>Isaac Chung</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>9GTM3Q@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-9GTM3Q</pentabarf:event-slug>
            <pentabarf:title>CUDA in Python: A New Era for GPU Acceleration</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250608T114500</dtstart>
            <dtend>20250608T123000</dtend>
            <duration>0.04500</duration>
            <summary>CUDA in Python: A New Era for GPU Acceleration</summary>
            <description>CUDA has been accessible to Python developers for over a decade, but often through third-party abstractions that lag behind the latest CUDA releases. However, that’s changing—over the next year, NVIDIA is making Python a first-class CUDA language.

In this talk, we’ll explore how Python programmers can leverage the CUDA platform today and how native Python support is evolving across the entire CUDA stack.

We begin with an overview of the CUDA programming model and how to manage accelerator devices as a core part of a Python application. Then, we dive into three practical examples:

- **Image Processing for Machine Learning Pipelines** – Launching, executing, and streaming transformations directly from Python.
- **Neural Network Primitives** – Implementing operations like softmax with blockwise parallelism.
- **High-Performance Deep Learning** – Integrating with optimized libraries that leverage low-level, highly tuned CUDA kernels.

To showcase the power of these Python interfaces, we conclude with a hands-on demonstration: implementing GPT-2 (inspired by llm.c) entirely in Python—achieving performance nearly identical to its C counterpart.

Join us to discover the joy of CUDA from Python, and unlock new possibilities in GPU acceleration with a familiar, high-level language!</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/9GTM3Q/</url>
            <location>Grand Hall</location>
            
            <attendee>Andy Terrel</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>8NMPDW@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-8NMPDW</pentabarf:event-slug>
            <pentabarf:title>Keynote- Innovation is Dead</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250608T133000</dtstart>
            <dtend>20250608T141500</dtend>
            <duration>0.04500</duration>
            <summary>Keynote- Innovation is Dead</summary>
            <description>Sunday 13:30 in the Grand Hall.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/8NMPDW/</url>
            <location>Grand Hall</location>
            
            <attendee>Tony Mears</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>CPNZ9G@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-CPNZ9G</pentabarf:event-slug>
            <pentabarf:title>Diving into Transformer Model Internals</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250608T144500</dtstart>
            <dtend>20250608T153000</dtend>
            <duration>0.04500</duration>
            <summary>Diving into Transformer Model Internals</summary>
            <description>The inner workings of transformers are a huge topic, and one that constantly evolves, so it&#x27;s impossible to cover absolutely everything in 30 minutes. I&#x27;d like the audience to take away from this talk the &quot;minimal viable knowledge&quot; that helps them understand the most salient details and build an intuition around what goes on under the hood.

We&#x27;ll cover:

1. An overview of how transformers process text using an example
2. Transformers as a concept vs specific implementations, particularly HuggingFace&#x27;s transformers library
3. A code tour of the HuggingFace transformers library

This talk is primarily aimed at programmers and software engineers who want to build a coder&#x27;s intuition for how this stuff really works, as well as at data scientists who want to better understand how transformers are implemented internally.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/CPNZ9G/</url>
            <location>Grand Hall</location>
            
            <attendee>Matt Squire</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>HN7ZRP@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-HN7ZRP</pentabarf:event-slug>
            <pentabarf:title>You Came to a Python Conference. Now, Go Do a PR Review!</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250608T153000</dtstart>
            <dtend>20250608T161500</dtend>
            <duration>0.04500</duration>
            <summary>You Came to a Python Conference. Now, Go Do a PR Review!</summary>
            <description># 1. Introduction (5 minutes)
- a. How pull request reviews are a great way to use your Python skills to make an impact  
- b. Overview of what makes a good PR review: technical Python knowledge and clear, helpful communication  

# 2. Archetypes of Bad Reviewers (5 minutes)
- a. The “Looks Good to Me” reviewer – No meaningful feedback  
- b. The “Technical Nitpicker” – Overly technical but unconstructive communication  
- c. The “Nit” Commenter – Poor communication despite valid points  

# 3. Technical Python Knowledge for PR Reviews (20 minutes)
- a. Pass by reference vs. pass by value  
- b. Immutable vs. mutable types  
- c. Common Python-specific pitfalls  
  - i. Ex: Avoiding default mutable arguments  
- d. Identifying inefficiencies  
  - i. Loops vs. list comprehensions  
  - ii. When to use generators  
- e. Using underutilized tools  
  - i. `pathlib`  
  - ii. `defaultdict`  

# 4. Communication Related to PR Reviews (7 minutes)
- a. Principles of constructive feedback  
  - i. Clarity  
  - ii. Respect  
  - iii. Specificity  
  - iv. Why  
- b. Techniques for making technical feedback actionable  
- c. Encouraging dialogue in PRs  

# 5. Conclusion (3 minutes)
- a. Recap key takeaways  
- b. Balance technical rigor with clear, helpful communication</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/HN7ZRP/</url>
            <location>Grand Hall</location>
            
            <attendee>Samiul Huque</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>LSXNTQ@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-LSXNTQ</pentabarf:event-slug>
            <pentabarf:title>Scaling AI workloads with Ray &amp; Airflow</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250608T161500</dtstart>
            <dtend>20250608T170000</dtend>
            <duration>0.04500</duration>
            <summary>Scaling AI workloads with Ray &amp; Airflow</summary>
            <description>This talk will discuss the benefits of using the [Airflow Ray provider package](https://github.com/astronomer/astro-provider-ray) to orchestrate Ray pipelines using Apache Airflow. They include:
- Integration: Incorporate Ray jobs into Airflow DAGs for unified workflow management.
- Distributed computing: Use Ray&#x27;s distributed capabilities within Airflow pipelines for scalable ETL and LLM fine-tuning.
- Monitoring: Track Ray job progress through Airflow&#x27;s user interface.
- Dependency management: Define and manage dependencies between Ray jobs and other tasks in DAGs.
- Resource allocation: Run Ray jobs alongside other task types within a single pipeline.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/LSXNTQ/</url>
            <location>Grand Hall</location>
            
            <attendee>Tatiana Al-Chueyr</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>A87LEE@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-A87LEE</pentabarf:event-slug>
            <pentabarf:title>From Trees to Transformers: Our Journey Towards Deep Learning for Ranking</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250608T101500</dtstart>
            <dtend>20250608T110000</dtend>
            <duration>0.04500</duration>
            <summary>From Trees to Transformers: Our Journey Towards Deep Learning for Ranking</summary>
            <description>GetYourGuide is a global online marketplace that helps travelers discover and book the best experiences. One of our core challenges is ensuring users always see the most relevant activities first—a task historically powered by an XGBoost-based ranking system. However, as we continued refining our tree-based models, returns on incremental improvements began to plateau. To spark our next step change in performance, we decided to adopt Deep Learning.

In this talk, we will share how, in just nine months, we migrated our ranking pipeline to a Deep Learning architecture while maintaining tight latency and high-throughput requirements. We will walk through our phased approach, starting with a minimal viable model to confirm our production setup and gradually increasing its complexity. Along the way, we tested over 50 iterations offline and ran more than 10 live A/B tests to validate the impact on our customers. Ultimately, we rolled out a PyTorch transformer-based model with significant business impact. We will also discuss the main challenges we faced on the operational and modeling sides, how we overcame them, and the lessons we learned.

You will leave with practical strategies for transitioning from traditional tree-based models to neural networks in production. Join us to learn how to advance your machine-learning capabilities and unlock new dimensions of relevance and personalization for real-time ranking.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/A87LEE/</url>
            <location>Doddington Forum</location>
            
            <attendee>Theodore Meynard</attendee>
            
            <attendee>Mihail Douhaniaris</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>A8PQEU@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-A8PQEU</pentabarf:event-slug>
            <pentabarf:title>Making LLMs reliable: A practical framework for production</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250608T110000</dtstart>
            <dtend>20250608T114500</dtend>
            <duration>0.04500</duration>
            <summary>Making LLMs reliable: A practical framework for production</summary>
            <description>LLMs are transforming how we build applications, but their non-deterministic outputs and potential for hallucination create barriers to adoption in high-risk industries. This talk will discuss a systematic approach to LLM application development that covers the pre-production and experimentation phase, real-time guardrails for output validation, and post-hoc analysis for identifying areas for improvement.

We’ll talk about:
- Creating comprehensive test sets with edge case coverage
- Unit tests for LLMs and establishing baseline metrics for reliability assessment
- Structured experimentation approaches for prompt optimization
- Real-time guardrails for output validation
- Live monitoring and alert systems
- Log analysis for pattern identification

We&#x27;ll demonstrate practical implementations using Python libraries and monitoring tools, with real-world examples from production systems. The session will provide actionable insights for software developers, AI engineers and product managers looking to deploy LLM applications responsibly and gain stakeholder trust.

Attendees will leave with:
- A structured framework for LLM application development
- Practical code examples for implementing guardrails
- Strategies for continuous monitoring and improvement

This talk is suitable for intermediate practitioners who work with LLMs and need to ensure their reliable deployment in production environments.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/A8PQEU/</url>
            <location>Doddington Forum</location>
            
            <attendee>Lena Shakurova</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>C7KGVS@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-C7KGVS</pentabarf:event-slug>
            <pentabarf:title>Analysing smart meter data to uncover energy consumption patterns</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250608T114500</dtstart>
            <dtend>20250608T123000</dtend>
            <duration>0.04500</duration>
            <summary>Analysing smart meter data to uncover energy consumption patterns</summary>
            <description>This talk is for those interested in learning about:
- applied data science in a non-profit organisation &amp; in the field of sustainability/home decarbonisation;
- the data science techniques we used to uncover patterns of energy usage, such as clustering;
- conducting data science work in a secure lab environment/how to analyse sensitive and confidential data;
- translating insights to a non-data science audience;
- working with multidisciplinary teams, including designers and domain experts.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/C7KGVS/</url>
            <location>Doddington Forum</location>
            
            <attendee>Sofia Pinto</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>H3H3BL@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-H3H3BL</pentabarf:event-slug>
            <pentabarf:title>Agentic Cyber Defense with External Threat Intelligence</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250608T144500</dtstart>
            <dtend>20250608T153000</dtend>
            <duration>0.04500</duration>
            <summary>Agentic Cyber Defense with External Threat Intelligence</summary>
            <description>In an era where cyber threats are growing both in complexity and frequency, harnessing external threat intelligence can provide a decisive edge in cybersecurity. This session offers a deep dive into developing autonomous agentic AI systems that leverage publicly available threat data to drive proactive defense mechanisms.

Key Focus Areas:

**Integrating External Data:** Learn strategies to ingest, clean, and harmonize diverse external datasets—such as open-source threat feeds, OSINT, and incident logs—with your internal security data, creating comprehensive situational awareness.

**Agentic AI in Cyber Defense:** Understand the core principles behind agentic AI and its application in autonomous cybersecurity systems. Discover how AI agents can continuously monitor network behavior, learn from evolving threats, and execute proactive countermeasures.

**Addressing Security Challenges:** Delve into the challenges of deploying autonomous systems in adversarial environments. The talk will cover best practices for mitigating vulnerabilities, including strategies to combat adversarial attacks and data poisoning.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/H3H3BL/</url>
            <location>Doddington Forum</location>
            
            <attendee>Jyoti Yadav</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>FRRUL8@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-FRRUL8</pentabarf:event-slug>
            <pentabarf:title>Is coding assistant as good as we thought in coding?</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250608T153000</dtstart>
            <dtend>20250608T161500</dtend>
            <duration>0.04500</duration>
            <summary>Is coding assistant as good as we thought in coding?</summary>
            <description>In this talk, the speaker will explain the current state of AI coding assistants: what is on the market and what they promise. Drawing on real experience from developers who have used coding assistants, the speaker will also explore their potential and limitations. From there, we will look into the future, predicting the landscape of the software engineering industry and how, as developers, we can take advantage of coding assistants instead of having our jobs taken by them.

## Topics covered

- Introduction to various coding assistants
- The pros and cons of using coding assistants
- How coding assistants will affect the industry
- As developers, how should we position ourselves in the AI landscape
- Summary and takeaways

## Goal

To explain, as objectively as possible, the effect of AI coding assistants on a developer&#x27;s career, and to help developers prepare proactively for what&#x27;s to come.

## Target Audience

Everyone who codes for a living or anyone who is enthusiastic about coding. The speaker expects all levels of familiarity with AI coding assistants.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/FRRUL8/</url>
            <location>Doddington Forum</location>
            
            <attendee>Cheuk Ting Ho</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>AYL3PL@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-AYL3PL</pentabarf:event-slug>
            <pentabarf:title>Polars, DuckDB, PySpark, PyArrow, pandas, cuDF: how Narwhals has brought them all together!</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250608T161500</dtstart>
            <dtend>20250608T170000</dtend>
            <duration>0.04500</duration>
            <summary>Polars, DuckDB, PySpark, PyArrow, pandas, cuDF: how Narwhals has brought them all together!</summary>
            <description>Narwhals is a lightweight and extensible compatibility layer between dataframe libraries. It is already used by several open source libraries including Altair, Marimo, Plotly, Scikit-lego, Vegafusion, and more. You will learn how to use Narwhals to build dataframe-agnostic tools.

This is a technical talk aimed at tool-builders. You&#x27;ll be expected to be familiar with Python and dataframes. We will cover:

- 2-3 minutes: Motivation. Why are there so many dataframe libraries?
- 2-3 minutes: Life before vs after Narwhals - real-world examples of how the data landscape is changing
- 7-8 minutes: Basics of Narwhals, wrapping native objects, expressions vs Series, lazy vs eager
- 7-8 minutes: Advanced Narwhals concepts: row order, non-elementary group-by aggregations, multi-indices, null values, backwards-compatibility promises
- 10 minutes: What is the Narwhals community like, how can you contribute and get involved, what comes next?
- 5-10 minutes: Engaging Q&amp;A / awkward silence

Tool builders will benefit from the talk by learning how to build tools for modern dataframe libraries without sacrificing support for foundational classic libraries such as pandas.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/AYL3PL/</url>
            <location>Doddington Forum</location>
            
            <attendee>Marco Gorelli</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>TLFMW3@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-TLFMW3</pentabarf:event-slug>
            <pentabarf:title>Automating Porosity Detection in Additive Manufacturing with Deep Learning</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250608T101500</dtstart>
            <dtend>20250608T110000</dtend>
            <duration>0.04500</duration>
            <summary>Automating Porosity Detection in Additive Manufacturing with Deep Learning</summary>
            <description>This talk delves into the application of deep learning to automate porosity detection in additive manufacturing (AM) components. Using convolutional neural networks (CNNs) and advanced image segmentation models, the session walks through the entire pipeline, from pre-processing 3D CT scan data to training and evaluating AI models, while addressing practical challenges like imbalanced datasets and computational costs.

As an informative and technical session, this talk demonstrates how AI can significantly enhance defect analysis, making quality control in AM faster, more accurate, and scalable. Attendees will leave with a clear understanding of the technical process, real-world applications, and the potential for AI to transform AM quality assurance.

Time Outline:
1.	Introduction (0-5 min) – AM overview, porosity challenges, limitations of manual analysis.
2.	Deep Learning for Porosity Detection (5-20 min) – CNNs, segmentation models, pre-processing.
3.	Case Study (20-25 min) – Real-world application, performance metrics, challenges.
4.	Future Directions (25-30 min) – AI-driven quality control.

This talk is ideal for AI practitioners, engineers, and researchers, bridging deep learning with industrial defect detection. While no hands-on activities are included, references to open-source tools and datasets will be provided for attendees who want to explore further.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/TLFMW3/</url>
            <location>Hardwick Hub</location>
            
            <attendee>Onyekachukwu Ojumah</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>PDXDNQ@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-PDXDNQ</pentabarf:event-slug>
            <pentabarf:title>One repo to rule them all, one repo to bind them...Control all of your projects with copier!</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250608T110000</dtstart>
            <dtend>20250608T114500</dtend>
            <duration>0.04500</duration>
            <summary>One repo to rule them all, one repo to bind them...Control all of your projects with copier!</summary>
            <description>Developers love to work on new projects. Researchers love to experiment on new ideas. If you&#x27;re anything like me, you have lots of little libraries for every new problem or idea that comes your way. And if you&#x27;re like me, you also love keeping abreast of the latest-and-greatest tooling in the ever-changing Python ecosystem.

My approach for a new project has always been to copy/paste/find/replace my most recently used project as a template. This led to a predictable problem - every project evolved in small ways from the one before it. Travis became GitHub Actions, flake8/black became ruff, my setup.py files were replaced by pyproject.toml files...I created an unmaintainable mess of almost-immediately deprecated patterns! I tried to leverage cookiecutter and template repos with mixed success.

Instead of making progress on any project, I was constantly bogged down amidst a perpetually updating ecosystem. 

In this talk, I&#x27;ll discuss the solution: copier. Copier lets you render projects from templates...and keep them in sync with upstream changes. With a few tweaks and a helpful GitHub action, you can control all of your projects from one central location.

Add a Rust extension? No problem. New linter flags? Trivial! Accidentally misspelled your own name in 50+ public projects? In this talk, I&#x27;ll show you how to pretend it never happened!</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/PDXDNQ/</url>
            <location>Hardwick Hub</location>
            
            <attendee>Tim Paine</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>DP77TK@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-DP77TK</pentabarf:event-slug>
            <pentabarf:title>Git Commit, MedTech Transformed: Python’s Medical Robotics Breakthrough</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250608T114500</dtstart>
            <dtend>20250608T123000</dtend>
            <duration>0.04500</duration>
            <summary>Git Commit, MedTech Transformed: Python’s Medical Robotics Breakthrough</summary>
            <description>This talk will provide a demo of a deep learning model developed using Python for lung nodule detection and classification in medical images. The model, built with 3D Convolutional Neural Networks (CNNs), is trained on public datasets (TCIA, LUNA16) and will be evaluated using metrics such as accuracy, sensitivity, specificity, and AUC-ROC. The talk will include:
* Preprocessing and augmentation techniques used to handle medical image data.
* An overview of the 3D CNN architecture and training process.
* Visualizations of the model&#x27;s output, showing detected and classified lung nodules.
* A discussion of how this model could be integrated into a robotic-assisted bronchoscopy system, potentially using ROS, to guide instrument placement during biopsies.

The session will highlight the practical application of Python&#x27;s libraries (TensorFlow/PyTorch, OpenCV, Scikit-learn) in medical image analysis and demonstrate how these techniques can contribute to advancements in lung cancer diagnosis and treatment within the MedTech industry.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/DP77TK/</url>
            <location>Hardwick Hub</location>
            
            <attendee>Lilinoe Harbottle</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>QUNRWL@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-QUNRWL</pentabarf:event-slug>
            <pentabarf:title>Debugging Leadership: Six Errors when Moving From Code to Management</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250608T144500</dtstart>
            <dtend>20250608T153000</dtend>
            <duration>0.04500</duration>
            <summary>Debugging Leadership: Six Errors when Moving From Code to Management</summary>
            <description>Transitioning from a technical role to leadership is a unique challenge; it’s no longer about writing clean code or optimizing performance but about empowering teams, making decisions, and balancing competing priorities. In this talk, I’ll share eight key lessons from my own experience. I frame these lessons as Python errors that technical professionals will find relatable and actionable.

Key Lessons:

ValueError: self-worth not defined: Imposter syndrome can hit hard when stepping into leadership. I’ll share how I redefined my sense of value and impact beyond just writing code.

DeadlockError: unable to release control: Delegation doesn’t come naturally to many of us. I’ll discuss how learning to let go and trust your team is critical for scaling your impact.

KeyError: culture not found: Leadership isn’t just about building great products; it’s about building great teams. We’ll explore how to create a culture where people can thrive.

AttributeError: clear_message not found: Communication is the cornerstone of effective leadership. I’ll share how I developed this skill to articulate vision, handle negotiations, and navigate tough conversations.

TypeError: instant_gratification is not callable: Unlike coding, leadership rarely provides quick wins. I’ll explain how to find satisfaction in long-term progress and team success.

UnhandledImpactError: cascading effects detected: How you show up as a leader has a ripple effect on your team and clients. I’ll discuss how to be intentional about your presence and its impact.

DependencyError: support module not imported: You can’t do it alone. I’ll share the value of building a support network of mentors, peers, and advisors.

RuntimeError: system overload: Burnout is real, and leadership can amplify it if you’re not careful. We’ll explore strategies to prioritize your own well-being as a leader.

What You’ll Gain:

This session is designed to help technical professionals better understand the realities of leadership transitions. By framing common challenges as Python errors, it provides a relatable and engaging way to explore the pitfalls and opportunities of stepping into leadership. Attendees will leave with:
- Knowledge of what they’re getting themselves into and the challenges they might face.
- Lessons learned from my personal experiences of transitioning to leadership.
- Tools and advice to approach leadership transitions with practical strategies and a grounded perspective.

Whether you’re contemplating a leadership role or already on the journey, this talk will provide valuable lessons to help you lead with purpose and avoid common mistakes along the way.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/QUNRWL/</url>
            <location>Hardwick Hub</location>
            
            <attendee>Matt Upson</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>FM3UCY@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-FM3UCY</pentabarf:event-slug>
            <pentabarf:title>Building a knowledge graph for climate policy</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250608T153000</dtstart>
            <dtend>20250608T161500</dtend>
            <duration>0.04500</duration>
            <summary>Building a knowledge graph for climate policy</summary>
            <description>We&#x27;ll take you on a technical deep-dive into how we&#x27;ve built and scaled a knowledge graph which maps the relationships between thousands of climate policy concepts, and identifies where those concepts appear in our corpus of climate policy and other climate-relevant documents.

We&#x27;ll share the high-level methodology, infrastructure decisions, and evaluation framework which have allowed our small team to process millions of passages of text while maintaining high standards for fairness and accuracy.

After covering the basics of what a knowledge graph is, and why you might want to build one, we&#x27;ll cover:

1. **Knowledge Graph Architecture &amp; Methodology**
   - An ontology which can handle the complexity of the climate policy domain
   - Interoperability considerations with existing sub-domain taxonomies
   - Why we&#x27;re building in the open with Wikibase
   - The value of real human expertise

2. **Classifier Development &amp; Evaluation**
   - A common model for classifiers, which can encompass a range of architectures from straightforward regexes, to fine-tuned BERT-based models, to optimised calls to third-party LLMs
   - Sampling strategies for building representative evaluation datasets
   - Quantitative metrics vs qualitative vibe-checks for classifier selection

3. **Production Infrastructure &amp; Scaling**
   - A modular pipeline design separating model management, inference, and indexing
   - Prefect-based orchestration for distributed inference
   - Infrastructure as code with Pulumi
   - Planned integration with our existing search and RAG systems

The audience should leave the talk with a clear understanding of:

- Practical considerations when building domain-specific, high-impact knowledge graphs
- Methods for evaluating NLP classifier performance in technical domains
- Approaches to scaling inference pipelines, from local experimentation to routine cloud-based deployments
- How we plan to use our knowledge graph to power a climate policy research platform, including integrations with RAG and other LLM-driven systems

This talk should be particularly stimulating for data scientists and engineers working on information retrieval systems, knowledge graphs, or other high-impact natural language processing systems.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/FM3UCY/</url>
            <location>Hardwick Hub</location>
            
            <attendee>Harrison Pim</attendee>
            
            <attendee>Fred O&#x27;Loughlin</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>PGTEWH@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-PGTEWH</pentabarf:event-slug>
            <pentabarf:title>Transfer Learning: Leveraging Pretrained Models with Limited Data</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250608T161500</dtstart>
            <dtend>20250608T170000</dtend>
            <duration>0.04500</duration>
            <summary>Transfer Learning: Leveraging Pretrained Models with Limited Data</summary>
            <description>This talk provides a comprehensive exploration of transfer learning, focusing on how pretrained models can be leveraged for tasks with limited labelled data. It begins with an introduction to the core principles of transfer learning, covering different strategies such as feature extraction, fine-tuning, and domain adaptation. The session then delves into the benefits and challenges of using pretrained models, helping attendees understand when and how to apply these techniques effectively.

We will discuss how to choose and adapt pretrained models, with a specific focus on YAMNet, Whisper, and wav2vec2 for audio processing. The talk will cover strategies for handling limited data and severe class imbalance, including data augmentation, synthetic data generation, and advanced loss functions. Attendees will gain insights into fine-tuning techniques, such as layer-wise training and regularisation, to optimise model performance while preventing overfitting. A case study on laughter detection will illustrate these concepts in practice, demonstrating how multiple models can be combined for improved accuracy. Finally, we will explore applications beyond audio, including transfer learning in NLP and computer vision, highlighting cross-domain adaptation techniques and emerging trends in multimodal AI.</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/PGTEWH/</url>
            <location>Hardwick Hub</location>
            
            <attendee>Salman Khan</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>TCFWVY@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-TCFWVY</pentabarf:event-slug>
            <pentabarf:title>Leaders at PyData</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250608T114500</dtstart>
            <dtend>20250608T123000</dtend>
            <duration>0.04500</duration>
            <summary>Leaders at PyData</summary>
            <description>A self-organised workshop for data leaders to discuss the opportunities and challenges they face with their peers. This is the 9th iteration at a PyData conference. Questions are raised and answered by attendees; the session is facilitated by Ian Ozsvald (PyDataLondon co-founder). You are encouraged to carry on talking to fellow leaders after the session; Ian will give out badges to help with this.

The format is based on the Breakout discussions that Ian uses in his private RebelAI leadership group; you&#x27;re welcome and encouraged to copy it and use it in your own organisations. Typical attendance is 60+ leaders.

The 2022 session used a different format (&quot;Executives at PyData&quot;, as it was then known) and was written up; you can read it here: https://numfocus.medium.com/executives-at-pydata-global-2022-193cbc2d3f3b</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Talk</category>
            <url>https://cfp.pydata.org/london2025/talk/TCFWVY/</url>
            <location>Library</location>
            
            <attendee>Ian Ozsvald</attendee>
            
        </vevent>
        
        <vevent>
            <method>PUBLISH</method>
            <uid>HQH7DY@@cfp.pydata.org</uid>
            <pentabarf:event-id></pentabarf:event-id>
            <pentabarf:event-slug>-HQH7DY</pentabarf:event-slug>
            <pentabarf:title>Humble Data Workshop</pentabarf:title>
            <pentabarf:subtitle></pentabarf:subtitle>
            <pentabarf:language>en</pentabarf:language>
            <pentabarf:language-code>en</pentabarf:language-code>
            <dtstart>20250608T144500</dtstart>
            <dtend>20250608T161500</dtend>
            <duration>1.03000</duration>
            <summary>Humble Data Workshop</summary>
            <description>We invite those from under-represented groups in data to apply to join us at PyData London (8th June 2025) for the Humble Data Workshop. In this workshop, you will learn the basics of programming in Python, as well as how to use tools such as Jupyter Notebook to analyse data.

We wish to run the workshop with as many participants as we can accommodate; however, we also need a lot of mentors to help out. Being a mentor not only helps you consolidate your own data science knowledge, it also gives you a good vibe afterwards. If you have skills to share, we are happy to welcome you to our Humble Data family.

Apply to be a mentor: https://forms.gle/2cvNyRK8c8pNnpnz5</description>
            <class>PUBLIC</class>
            <status>CONFIRMED</status>
            <category>Tutorial</category>
            <url>https://cfp.pydata.org/london2025/talk/HQH7DY/</url>
            <location>Library</location>
            
            <attendee>Hugh Evans</attendee>
            
        </vevent>
        
    </vcalendar>
</iCalendar>
