PyData Global 2025

Aarti Jha is a Principal Data Scientist at Red Hat, where she develops AI-driven solutions to streamline internal processes and reduce operational costs. She brings over 6.5 years of experience in building and deploying data science and machine learning solutions across industry domains. She is an active public speaker and frequently presents at developer and data-science conferences, focusing on practical approaches to applied AI and LLMs.

When AI Makes Things Up: Understanding and Tackling Hallucinations

Adam Cowley

Adam Cowley is Manager of Developer Education at Neo4j. He leads the team behind GraphAcademy, Neo4j’s developer learning platform. His 20+ years of experience spans software engineering, data analysis, and product ownership. He is currently focused on applying Generative AI to create more personalised developer education.

Using MCP to turn Claude into a Football Opposition Analyst

Agustin Figueroa Nazar

Agus is a Senior Data Analyst at GetYourGuide, where he specializes in using data to identify customer and marketplace needs that could be solved at scale with data products. His work encompasses identifying customer problems, designing experimentation frameworks to measure progress, developing analytical solutions, and translating business requirements into data science projects. Beyond his core responsibilities, Agus is passionate about storytelling, teaching, singing, and almost anything on stage.

Enhancing Marketplace Competitiveness: A Bayesian Approach to modelling the cold start problem

Allen Downey

The SAT math gap: gender difference or selection bias?

Allen Downey

Allen Downey is a principal data scientist at PyMC Labs and professor emeritus at Olin College. He is the author of several books, including Think Python, Think Bayes, and Probably Overthinking It, and of a blog about programming and data science. He received a Ph.D. in computer science from the University of California, Berkeley, and bachelor's and master's degrees from MIT.

Bayesian Decision Analysis with PyMC: Beyond A/B Testing

Allison Ding

Allison Ding is a developer advocate for GPU-accelerated AI APIs, libraries, and tools at NVIDIA, with a specialization in large language models (LLMs) and advanced data science techniques. She brings over eight years of hands-on experience as a data scientist, focusing on managing and delivering end-to-end data science solutions. Her academic background includes a strong emphasis on natural language processing (NLP) and generative AI. Allison holds a master’s degree in Applied Statistics from Cornell University and a master’s degree in Computer Science from San Francisco Bay University.

Scaling Data Processing for LLMs with NeMo Curator

Aman Bhandari

Using Traditional AI and LLMs to Automate Complex and Critical Documents in Healthcare

Andrew Yule

Andrew Yule is a co-founder and managing partner of Pontem Analytics, a global consulting company in the energy industry specializing in combining domain expertise with data-driven solutions. Andrew has 14 years of experience in the energy industry, where he has contributed to a diverse range of projects spanning both offshore and onshore. He has been a member of SPE since he began his career in 2011 and is currently a contributor for SPE’s The Way Ahead magazine as well as a chairman on the Fort Worth SPE board. He is also a member of the Young Entrepreneurial Council. His technical background includes a bachelor’s degree in Chemical Engineering from the Colorado School of Mines and a master’s degree in data science from Southern Methodist University.

Revolutionizing Safety Log Analysis in Oil and Gas: A Multi-Stage LLM Approach for Enhanced Hazard Identification

Aniket Abhay Kulkarni

Aniket is an engineer at heart. He has founded Curlscape, where he helps businesses bring practical AI applications to life fast. He has led the design and deployment of large-scale systems across industries, from finance and healthcare to education and logistics. His work spans LLM-based information extraction, agentic workflows, voice assistants, and continuous evaluation frameworks.

Scaling Fuzzy Product Matching with BM25: A Comparative Study of Python and Database Solutions

Anita Ihuman

How Do We Create Access for Those Who Don’t Show Up in Our Spaces?

Avik Basu

Beyond Just Prediction: Causal Thinking in Machine Learning

Boris Lau

Boris Lau currently serves as a Staff Software Engineer specializing in MLOps and Site Reliability Engineering (SRE) at Khan Academy. His expertise in machine learning infrastructure and observability is critical for ensuring the performance and reliability of AI-driven products, such as Khanmigo.

He lives in Vancouver, Canada, and serves as an organizer for the local Vancouver PyData chapter.

Let Me Structure Freely? How to Improve LLM Structured Output Quality

Breno Brito

ML engineer, data scientist, and author with over a decade of total experience, especially in the finance and Bitcoin industries. I have translated several books from English to Portuguese, won prizes in several hackathons with LLM solutions, and been interviewed on dozens of podcasts and in newspapers.

Future proof your AI product

Busirah Olaitan Hammed

Busirah Hammed is a data engineer at YellowCard financial with over 6 years of experience building data solutions. She's a data enthusiast whose experience spans data science and engineering.

Fast, Cost-Efficient Analytics on Blockchain data using DuckDB - Solana as a case study

Clinton Oyogo David

Clinton Oyogo David is a Data Scientist at Oxford Policy Management, specializing in geospatial analytics, data engineering, dashboard development, and automation. He has led data-intensive projects across Africa and Asia, developing data pipelines, dashboards, and data analysis for various organisations. Clinton combines a background in statistics with a deep interest in scalable data solutions that inform policy and drive impact. His recent work focuses on harmonizing large raster datasets using tools like xarray and Dask to support small area estimation of poverty and sustainable development research.

Engineering Large-scale geospatial raster processing with xarray and dask

Cainã Max Couto da Silva

I’m a data scientist and AI engineer with 10+ years of experience across academic research and industry, building GenAI and machine learning solutions for research labs, startups, and Fortune 500 companies. I’m also a passionate educator, contributing to data training programs as a professor and consultant, and an active open-source contributor and speaker at conferences like SciPy and PyData.

Building Production-Ready Research AI Assistants with One-Command Setup

Caitlin Lewis

fastplotlib: driving scientific discovery through data visualization

Charaf ZGUIOUAR

Quantitative Finance and Econometrics graduate from Sorbonne University. Currently working as a Data Scientist at BNP Paribas and as a lecturer at Sorbonne University.

Optimal Variable Binning in Logistic Regression

Chris Rackauckas

Dr. Chris Rackauckas is the VP of Modeling and Simulation at JuliaHub, the Director of Scientific Research at Pumas-AI, Co-PI of the Julia Lab at MIT, and the lead developer of the SciML Open Source Software Organization. His work in mechanistic machine learning is credited with the 15,000x acceleration of NASA Launch Services simulations and recently demonstrated a 60x-570x acceleration over Modelica tools in HVAC simulation, earning Chris the US Air Force Artificial Intelligence Accelerator Scientific Excellence Award. See more at https://chrisrackauckas.com/. He is the lead developer of the Pumas project and received a top presentation award at every ACoP from 2019-2021 for improving methods for uncertainty quantification, automated GPU acceleration of nonlinear mixed effects modeling (NLME), and machine learning assisted construction of NLME models with DeepNLME. For these achievements, Chris received the Emerging Scientist award from ISoP.

Why Julia's GPU-Accelerated ODE Solvers are 20x-100x Faster than Jax and PyTorch

Claudio Giorgio Giancaterino

Statistics & Actuarial background
Actuary during the day
Data Scientist in the free time

Harnessing Generative Models for Synthetic Non-Life Insurance Data

Claudio Salvatore Arcidiacono

I am Claudio, a Senior Data Scientist at Mollie. I have been working in the fintech sector for the past 7 years and have extensive experience in classical machine learning problems, mainly binary classification. I love to contribute to data science open source packages like feature engine, scikit-learn and narwhals. I maintain a couple of packages myself (felimination and sklearo). In my free time I am a coffee scientist: I use a data-driven approach to dial in the perfect cup of espresso.

How to Effectively use text embeddings in tree based models

Cédric Couralet

Cédric Couralet, Data Scientist at Insee, is an open-source enthusiast, with expertise in software architecture and secure system design.

torchTextClassifiers : Modernizing Text classification for French National Statistics

César Soto Valero

César is currently a Data Scientist at SEB Group, where he develops AI models to enhance the security of financial transactions on a global scale. He completed an M.Sc. in Machine Learning and moved to Sweden in 2018 to pursue a Ph.D. in Computer Science at KTH Royal Institute of Technology. During his five years at KTH, he pioneered open-source tools and techniques to mitigate software bloat, contributing to the efficiency and security of modern software systems. César is deeply passionate about AI, science, and technology, with a strong focus on bridging cutting-edge research with real-world applications. He is dedicated to advancing AI’s role in building smarter, more resilient systems that drive innovation.

From Ideas to APIs: Delivering Fast with Modern Python
Realtime Financial Fraud Detection with Modern Python

Dr. Nisha Arora

Dr. Nisha Arora is a data professional with experience across analytics, data science, reporting automation, storytelling, and applied statistical methods using Python, R, and Excel.
With a background spanning technical writing, reviewing, and corporate trainings, she focuses on making advanced tools accessible to analysts and non-technical users.
Her work bridges business-facing tools like Excel with scalable, reproducible workflows in Python. She creates accessible, practical learning content and actively contributes to the data community through her trainings, talks, and YouTube channel.
She is currently working on a book project aimed at helping professionals modernize spreadsheet-based processes through Python.

Python Meets Excel: Smarter Workflows for Analysts and Data Teams

Danica Fine

Danica began her career as a software engineer in financial services and pivoted to developer relations, where she focussed primarily on open source technologies under the Apache Software Foundation umbrella such as Apache Kafka and Apache Flink. She now leads the open source advocacy efforts at Snowflake, supporting Apache Iceberg and Apache Polaris (incubating). She can be found on X (Bluesky and Mastodon), talking about tech, plants, and baking @TheDanicaFine.

Quiet on Set: Building an On-Air Sign with Open Source Technologies

Daniel Chen

I am a lecturer at The University of British Columbia and data science educator at Posit, PBC. I love teaching tools to empower data scientists.

LLMs, Chatbots, and Dashboards: Visualize Your Data with Natural Language

David Aronchick

David Aronchick is CEO of Expanso (expanso.io), the global, intelligent pipeline company.

Previously, he led Compute over Data at Protocol Labs and Open Source Machine Learning Strategy at Azure, was a product manager for Kubernetes on behalf of Google, launched Google Kubernetes Engine, and co-founded the Kubeflow project and the SAME project. He has also worked at Amazon and Chef, and co-founded three startups.

When not spending too much time in service of electrons, he can be found on a mountain (on skis), traveling the world (via restaurants), or participating in kid activities, of which there are a lot more than he remembers from when he was that age.

Keynote: David Aronchick - From Pandas to Policy-as-Code: The Future of ML Data Engineering

Dawn Wages

The Lifecycle of a Jupyter Environment: From Exploration to Production-Grade Pipelines

Dr. Rebecca Bilbro

Where Have All the Metrics Gone?

Dylan Bouchard

Dylan Bouchard is a Principal Applied Scientist focusing on AI Research & Open Source at CVS Health. He leads the company's Responsible AI Research program, where he developed two impactful open source libraries: UQLM, a toolkit for detecting hallucinations in large language models, and LangFair, a framework for evaluating bias and fairness in LLMs. His work bridges academic research with practical tools that help make AI systems more reliable and equitable.

UQLM: Detecting LLM Hallucinations with Uncertainty Quantification in Python

Evan Wimpey

Evan Wimpey is an analytics professional turned stand-up comedian, delivering smart, custom comedy. Whether you're hosting a tech offsite, academic event, or a product team that just needs a laugh, Evan tailors content that resonates with your audience.

Python Worst Practices: Learn from the Expert

Eyal Kazin

I'm an ex-cosmologist turned data scientist with 20 years of experience in solving challenging problems. I am motivated by intellectual challenges, highly detail-oriented, and love visualising data results to communicate insights for better decisions within organisations.

My main drive is applying scientific approaches that result in practical and clear solutions. To accomplish this, I use whatever works, be it statistical/causal inference, machine/deep learning or optimisation algorithms. Being results-driven, I have a passion for helping stakeholders make data-driven decisions by quantifying and communicating the impact of interventions to non-specialist audiences in an accessible manner.

In my free time I craft engaging articles on applied stats in data science and machine learning: https://medium.com/@eyal-kazin

My claim to fame is that between 2004 and 2014 I lived on four different continents, including in three tennis Grand Slam cities (NYC, Melbourne, London).

🚪🚪🐐 Lessons in Decision Making from the Monty Hall Problem

Francesc Alted

I am a curious person who studied Physics (BSc, MSc) and Applied Maths (MSc). I spent over a year at CERN for my MSc in High Energy Physics. However, I found maths and computer sciences equally fascinating, so I left academia to pursue these fields. Over the years, I developed a passion for handling large datasets and using compression to enable their analysis on commodity hardware accessible to everyone.

I am the CEO of ironArray SLU and lead the Blosc Development Team. I am currently interested in determining, ahead of time, which combinations of codecs and filters can provide a personalized compression experience. I am also very excited about providing an easy and effective way to share Blosc2 datasets over the network via Caterva2 and Cat2Cloud, a software-as-a-service offering for handling and computing with datasets directly in the cloud.

As an Open Source believer, I started the PyTables project more than 20 years ago. After 25 years in this business, I started several other useful open source projects like Blosc2, Caterva2 and Btune; those efforts won me two prizes that mean a lot to me:

2023: NumFOCUS Project Sustainability Award
2017: Google’s Open Source Peer Bonus

You can learn more about what I am working on by reading my latest blog posts.

Hands-on with Blosc2: Accelerating Your Python Data Workflows

Hajime Takeda

Hajime is a data professional with 8+ years of expertise in marketing, retail, and eCommerce, working in New York.

TinyTroupe: Enhancing Marketing Insights through LLM-Powered Multiagent Persona Simulation

Iain Docherty

Iain Docherty is a Chemical Engineer with over 10 years of experience across nuclear, energy, mining, and renewables sectors. He is currently a Lead Engineer at Pontem Analytics, specializing in combining first-principles modelling with data-driven approaches to optimise processes. He has proven experience in developing and deploying control and optimization solutions leveraging deep reinforcement learning and machine learning techniques.

Revolutionizing Safety Log Analysis in Oil and Gas: A Multi-Stage LLM Approach for Enhanced Hazard Identification

Idan Richman Goshen

Idan Richman Goshen is a data-driven technologist with an M.A. in Economics and more than a decade of experience turning raw data into business impact. Before leading the Data Science team at Lusha, he built production-grade machine-learning systems at Localize and Dell.

When the Meter Maxes Out: Chernobyl Disaster Lessons for ML Systems in Production

Indranil Ghosh

Indra is a postdoctoral fellow in applied mathematics at Massey University, New Zealand, working on all things "dynamical systems". He takes a computational approach to tackle complex problems, and his current research is focused on understanding collective behaviour exhibited by coupled neurons. He is an avid Python user and has been a speaker at multiple Python-related conferences. More information can be found on his website: https://indrag49.github.io/.

Time series analysis for coupled neurons.

Inessa Pawson

[BoF] From Data to Decisions: Leveraging Generative AI Across the Data Science Workflow

Irina Loghin

Irina Loghin is a Technical Curriculum Developer at Neo4j and an Identity and Access Management (IAM) expert. With a background in security architecture and developer education, she specializes in making complex IAM concepts accessible through graph-based thinking and practical solutions.

At Neo4j, Irina designs technical learning programs that help engineers and architects rethink identity through connected data models.

She is passionate about building clear mental models for modern IAM, and advocates for approaches that prioritize portability, visibility, and developer autonomy.

Connected Identities: Rethinking Identity and Access Management with Neo4j and Python

Itai Gilo

Itai is a seasoned software engineer, passionate about clean code and design, and about simplifying what is complex. He does what’s needed, whether it’s backend, full-stack, or mobile development, and enjoys creating well-crafted products.

Garbage In, Lawsuit Out: Building Compliant and Reproducible ML Pipelines

Iyanu Falaye

Iyanu Falaye is a software engineer and product strategist with a passion for open-source communities and developer enablement. With experience spanning engineering operations, product development, and cross-functional collaboration, he actively supports inclusive contribution models beyond just code. Iyanu has spoken at community tech events and facilitated team knowledge-sharing sessions, always focused on helping others grow their impact in the tech ecosystem.

Python Beyond the Code: Unlocking Hidden Contributions in Open Source

Jacob Quinn

A core contributor to both the Julia language and package ecosystem for over a decade.

Modernizing JSON for Julia

Jacob Tomlinson

Jacob Tomlinson is a senior software engineer at NVIDIA. His work involves maintaining open source projects including RAPIDS and Dask. He also tinkers with kr8s in his spare time. He lives in Exeter, UK.

GPU Python for the Real World: Practical Steps to GPU-Accelerated Python with RAPIDS
EffVer: Versioning code by the effort required to upgrade

Jayita Bhattacharyya

AI ML Nerd with a blend of technical speaking & hackathon wizardry! Applying tech to solve real-world problems. The work focus these days is on generative AI. Helping software teams incorporate AI into transforming software engineering.

How Big are SLMs

Jen Wei

Jen Wei is an independent AI research engineer with a PhD in applied mathematics and a love for building things from scratch — especially when she probably shouldn’t. She’s reverse-engineered transformer architectures, implemented modern techniques like mixture-of-experts and Multi-head latent attention, and still enjoys writing clean PyTorch code at 2am for fun (and maybe for revenge). Jen currently works in the GenAI space and shares her work openly on Hugging Face. Her favorite research topics include efficient LLM architecture, post-training techniques, and the existential crises of overparameterized models.

I Built a Transformer from Scratch So You Don’t Have To

Jeroen Janssens

Jeroen Janssens, PhD, is Head of Developer Relations at Posit, PBC. His expertise lies in visualizing data, implementing machine learning models, and building solutions using Python, R, JavaScript, and Bash. He’s passionate about open source and sharing knowledge. He’s the author of Python Polars: The Definitive Guide (O’Reilly, 2025) and Data Science at the Command Line (O’Reilly, 2021). Jeroen holds a PhD in machine learning from Tilburg University and an MSc in artificial intelligence from Maastricht University. He lives with his wife and two kids in Rotterdam, the Netherlands.

Python Polars: The Definitive Crash Course

Jim Dowling

Jim Dowling is CEO of Hopsworks and a former Associate Professor at KTH Royal Institute of Technology. He is the organizer of the annual feature store summit and co-organizer of PyData Stockholm. He is the author of an O'Reilly book on building ML systems: batch, real-time and LLMs.

From Feature Engineering to Context Engineering for Agents

Joe Pringle

Joe Pringle is VP of Customer Success at lakeFS, supporting open source data version control and infrastructure by providing expertise on data strategy, data science, AI, and machine learning. He helps accelerate innovation and plan and execute data science and machine learning initiatives. He has 20+ years of experience helping large enterprises use data to increase impact on important public policy issues including education, health, the environment, and economic development. He also has a passion for focusing technology initiatives on people - and working backwards from understanding end users to identify opportunities to help busy people work faster, smarter, and better.

Computer Vision Data Version Control and Reproducibility at Scale

Jonas Van Malder

RDepot - 100% open source enterprise management of Python and R repositories

Jonathan Shi

Jonathan is a software engineer on Snowflake's Snowpark Python team, and is a maintainer of the Modin project. He enjoys building systems that are usable, maintainable, and performant.

Bridging Interactive Data Science and Big Data with Hybrid Execution

José Quenum

José Quenum is a Researcher at the Namibia University of Science and Technology (NUST). His interests include Distributed Systems, Artificial Intelligence and Big Data.

HPC Implementation of a Hybrid Recommender System in Julia

Kamil Raczycki

Geospatial Data Scientist with a drive to contribute to the open-source space. Co-developer of SRAI library and maintainer of QuackOSM and OvertureMaestro libraries.
Interested in exploring how machine learning models with geospatial data can improve our lives.

Getting big OpenStreetMap data with QuackOSM

Kushal Kolar

PhD Candidate at NYU. 10+ years of experience using Python for data analysis and machine learning with neuroscience datasets. Core developer of fastplotlib and maintainer of several Python libraries in neuroscience with significant user bases, and a contributor to other libraries such as tslearn and CaImAn.

fastplotlib: driving scientific discovery through data visualization

Lily Xu

Using Traditional AI and LLMs to Automate Complex and Critical Documents in Healthcare

Luke Shaw

Degree in Physics, Princeton University, 2019
Masters in Applied Mathematics, University of Edinburgh, 2020
PhD in Applied Mathematics, Universitat Jaume I 2024
Working at ironArray as engineer and product owner since 2025.

Hands-on with Blosc2: Accelerating Your Python Data Workflows

Malte Tichy

Malte Tichy has a research background in theoretical quantum physics, with a PhD from the University of Freiburg. He learned the nuts and bolts of applied data science and forecasting within various hands-on and leadership roles at the supply chain software company Blue Yonder. As a Discipline Expert in Data Analytics & AI, he works on forecasts for wind-turbine component reliability and maintenance expenditures at Siemens Gamesa Renewable Energy.

Reviving Survival Analysis: Timeless, Yet Overlooked?

Manjunath Janardhan

I am a Principal AI Engineer with over two decades of experience transforming complex business challenges through innovative AI solutions. My career is defined by delivering measurable impact, including a patented Intelligent Service Platform that achieved an 80% reduction in operational costs.
Currently at MSG Global Solutions, I lead AI development initiatives for SAP Enterprise applications, with a primary focus on SAP Profitability and Performance Management (PaPM). My work involves architecting and implementing enterprise-scale Generative AI solutions for the PaPM Universal Model, where I integrate vector databases with SAP HANA to significantly enhance information retrieval capabilities.

My previous role at GE Healthcare demonstrated my ability to scale AI solutions globally, where I built on-premises Generative AI systems that boosted developer productivity by 40% across international teams. I specialize in combining open-source Large Language Models with Hybrid-RAG and Agentic techniques, leveraging cloud-native architectures across AWS, Azure, and GCP platforms. My portfolio includes high-impact tools such as MICT GPT, CODE GPT, and Service GPT, with Aspire CODE GPT notably reducing development time for the Aspire CT Product by 30%.

My technical foundation encompasses the complete software development lifecycle, from modernizing monolithic systems to microservices using Java and C++, to containerizing applications with Docker and Kubernetes. I maintain active contributions to open-source NLP projects, reflecting my commitment to advancing the broader AI community.

Professional development remains central to my practice. I regularly engage with the AI community through conferences, workshops, webinars, and hackathons, recently developing a working prototype for a Socratic DSA Tutor. As an industry speaker, Medium blogger, and content creator, I share practical insights on AI implementation strategies and emerging technologies, focusing on mentoring the next generation of AI engineers while driving innovation in enterprise AI applications.

Automating ML with PyCaret: Train & Compare Multiple Models to Find the Best Performer

Mark Kittisopikul, Ph.D.

I am a Software Engineer III at the Janelia Research Campus of the Howard Hughes Medical Institute. I specialize in working with data from light microscopy drawing upon my experience as a postdoctoral cell biologist.

Combining Zarr, HDF5, and TIFF into a single data format

Martin Durant

projspec: what's this project anyway?

Mateusz Sokół

I'm a Software Engineer at Quansight, working on a multitude of open source projects in the Scientific Python Ecosystem. You can find my GitHub profile here: https://github.com/mtsokol

PyData/Sparse & Finch: extending sparse computing in the Python ecosystem

Matthew Cox

I'm a hobbyist Python user and data analyst, with a passion for making meaningful visualizations that illustrate the story behind the data. I've been coding to solve my own problems and curiosities for almost five years now, and this is my first application to present a project at a conference.

Animating Equity: Python Dashboards for Small-Town Housing and Displacement Risk

Matthias Boeck

Dr. Matthias Böck holds a doctorate in bioinformatics and machine learning and has been working as a data scientist in the Data Product department at the Munich-based consultancy FELD M since 2013. He is the technical manager for projects in the fields of machine learning and data strategy. He is the author of specialist books on AI, holds design thinking workshops and works with universities on research projects. In addition to these fields, he is also involved in the topic of data for good and its use in practice.

Bundestag Chat: Discovering Political Landscape with RAG Systems

Meilame Tayebjee

As a Data Scientist at the Innovation Lab of the French National Institute of Statistics and Economic Studies (Insee), I focus on the deployment of machine learning models, the enhancement of MLOps best practices, and the development of torchTextClassifiers, a PyTorch package designed to streamline the training of deep learning models for text classification.

I am also pursuing a PhD in Computer Science jointly at CREST and Inria, where my research centers on foundational Transformer-based models for the analysis of healthcare pathways.

torchTextClassifiers : Modernizing Text classification for French National Statistics

Michael Alan Washington

Michael Washington is a Microsoft MVP and an ASP.NET C# Microsoft Blazor programmer. He has extensive knowledge of artificial intelligence and student information systems. He is the founder of BlazorData.net.

Build your own Personal Data Warehouse

Mohit Singh Chauhan

I am a Senior Data Scientist at CVS Health and work on Responsible AI and LLM/agentic systems. My expertise lies in the technical aspects of ethical AI, with a particular focus on bias and fairness testing. I am dedicated to identifying and mitigating biases in AI systems to ensure they are fair and equitable for all users. Additionally, I specialize in hallucination detection and mitigation for large language models (LLMs), multi-modal models, and AI agents, striving to enhance the reliability and trustworthiness of these advanced technologies. Recent cutting-edge tools include open-source libraries like LangFair and UQLM.

UQLM: Detecting LLM Hallucinations with Uncertainty Quantification in Python

Natan Katz

Natan Katz is the co-founder of LuminAI, a startup pioneering statistical red teaming — a method for testing and securing white-box AI models through statistical and geometric analysis of model activations. At LuminAI, he develops techniques to detect and defend against optimization-based adversarial attacks such as PGD, DeepFool, and Carlini–Wagner, helping organizations build safer and more trustworthy AI systems.

Before founding LuminAI, Natan worked across diverse applied domains — from quantitative modeling and speech analysis to customer journey optimization and biometrics — bridging theory and practice across industries. He has also published work on AI for Ethereum ecosystems and AI ethics. Natan holds an M.Sc. in Nonlinear Dynamics from the Weizmann Institute of Science, where he studied dynamic models for malignant tissues.

Open Source Models' Security- Adversarial attacks, Poisoning & Sponge

Naty Clementi

Naty Clementi is a senior software engineer at NVIDIA. She is a former academic with a Masters in Physics and PhD in Mechanical and Aerospace Engineering to her name. Her work involves contributing to RAPIDS, and in the past she has also contributed to and maintained other open source projects such as Ibis and Dask. She is an active member of PyLadies and an active volunteer and organizer of Women and Gender Expansive Coders DC meetups.

GPU Python for the Real World: Practical Steps to GPU-Accelerated Python with RAPIDS

Noor Aftab

Noor Aftab is the Global Program Lead at Amazon Web Services (AWS), where she drives strategic programs for Amazon S3, supporting some of the world’s most complex data, AI, and analytics workloads. With a foundation in software engineering and data science, she brings over a decade of experience building and scaling cloud-native solutions, AI/ML systems, and developer-focused programs.

She serves as Vice President of the Society of Women Engineers (SWE) Pacific Northwest section, championing technical leadership and mentoring initiatives across engineering communities. Noor is also Chair of the NumFOCUS Code of Conduct Working Group and User Group Leader for IBM Women in AI, where she fosters inclusive, resilient communities across 300+ open-source projects.

A frequent keynote speaker, Noor has presented at PyData Global, SciPy, ODSC, TEDx, IEEE, and 13+ global venues, delivering talks that connect technical depth with real-world adoption of AI and cloud. She has authored and led initiatives such as the IEEE Hour of Power AI training program, empowering engineers and professionals with practical AI skills.

Her contributions to technology and leadership have been recognized with awards, including the Australia Alumni Excellence Award and Asia Pacific HRM Congress Award, with media features in the BBC, Martha’s Vineyard Times, and Hindustan Times.

GitHub: aftabn81 | Website: www.nooraftab.com

Keynote- Noor Aftab- The Next Commit: Building Inclusive, Data-Driven Ecosystems for Responsible AI

Piotr Kalota

Piotr Kalota is a Machine Learning Engineer at FELD M with a Master’s in Human-Centered AI from DTU. Specializing in NLP and accessible tech, he develops retrieval-augmented generation (RAG) systems and other LLM-driven solutions. With four years of experience in software engineering and machine learning, he combines human-centered design and innovation to create accessible AI solutions.

Bundestag Chat: Discovering Political Landscape with RAG Systems

Quan Nguyen

Decisions Under Uncertainty: A Hands‑On Guide to Bayesian Decision Theory

Richard Iannone

Rich is a software engineer who enjoys working with Python. He likes to create Python libraries that help people accomplish things. While Rich very clearly digs programming, he enjoys other things as well! Examples include: playing and listening to music, reading books, watching films, meeting up with friends, and wandering through the many valleys and ravines of the Greater Toronto Area.

Communicating Data Quality: Making the Invisible Visible (and Fun!) with Pointblank

Robin Troesch

Data Engineer trying to reduce the impact of computing on the climate and helping the energy transition.
Working at Electricity Maps in Copenhagen (DK) since 2022, first in the data platform team responsible for acquiring grid data. Joined the grid forecast team in 2023.
Currently working on electricity grid forecasts, enabling people to consume electricity when it's the cleanest and predicting load peaks. Especially interested in how to run large-scale infrastructure with a minimal footprint.

Building a Lightweight Feature Store for Electricity Grid Forecasts with Polars

Rodrigo Silva Ferreira

Rodrigo Silva Ferreira is a QA Engineer at Posit, where he contributes to the quality and usability of open-source tools that empower data scientists working in R and Python. He focuses on both manual and automated testing strategies to ensure reliability, performance, and an excellent user experience.

Rodrigo holds a BSc in Chemistry with minors in Applied Math and Arabic from NYU Abu Dhabi and an MSc in Analytical Chemistry from the University of Pittsburgh. Multilingual and globally minded, he enjoys working at the intersection of data, science, and technology, especially when it means building tools that help people better understand and navigate the world through its increasingly complex data.

Text Mining Orkut’s Community Data with Python: Cultural Memory, Platform Neglect, and Digital Amnesia

Saurabh Garg

I'm currently focused on building a frictionless Machine Learning Platform at Outerbounds, where our mission is to let data scientists and ML engineers stay focused on AI/ML development—while we manage the infrastructure that powers it.

My background is in large-scale distributed systems, with experience spanning cloud infrastructure and identity/authorization systems. I've worked on infrastructure teams at Oracle Cloud and Outerbounds, and on IAM/authorization platforms at Atlassian and Databricks.

At Atlassian, I was part of the team that built a CQRS-based permissions system deployed across six AWS regions, handling 100K+ read requests with sub-3ms P99 latencies.

At Databricks, I founded and led a 6-engineer team focused on authorization. We transitioned the platform from a monolithic client-based model to a service-oriented architecture, integrating with ~35 internal services and achieving P99 latencies under 1 second for over 10K requests per second.

Outside of engineering, I enjoy spending time with my daughter, and I'm always up for a game of cricket or table tennis.

Optimizing AI/ML Workloads: Resource Management and Cost Attribution

Scott Routledge

Scott is a Software Engineer at Bodo.ai, where he has worked on the performance and reliability of the BodoSQL engine, contributed to the Bodo Just-In-Time Python Compiler, and is currently working on Bodo DataFrames. He earned his undergraduate degree in computer science from Carnegie Mellon University.

Bodo DataFrames: a fast and scalable HPC-based drop-in replacement for Pandas

Sebastian Wallkötter

The Boringly Simple Loop Powering GenAI Apps

Sergei Nasibian

Sergei Nasibian is a Quantitative Strategist at Rothesay in London, where he designs and implements systematic trading and risk management models. He previously worked as a Data Scientist at McKinsey & Company and as a Senior Analyst at Yandex Eats, developing data-driven strategies across diverse domains. Sergei holds a degree with honors in Mathematics from Lomonosov Moscow State University, specializing in probability theory and stochastic processes. His research experience includes entropy-based change-point detection methods developed during a collaboration with Ulm University. Sergei is passionate about translating advanced mathematical concepts into practical, production-ready tools using open-source Python libraries, and he enjoys exploring intersections between machine learning, statistical modeling, and financial markets.

Detecting Regime Shifts in Time Series with Python: Entropy-Based Change-Point Detection

Shekhar Prasad Rajak

Passionate Open Source Advocate and Software Engineer at Apple.
Shekhar is a seasoned open-source developer and advocate who has contributed to SymPy, NumPy, SciPy, and Bundler and is the author of daru and daru-view in the SciRuby ecosystem. A two-time GSoC alumnus (2016, 2017) and former SciRuby org admin, he has mentored across multiple open-source communities. He has spoken at leading conferences, including RubyConf, PyCon, ApacheCon, and Community Over Code. Currently, he is a Software Development Engineer at Apple, driving innovation in software engineering.

Streaming AI Workflows in Python: Kafka Queues and Flink-Powered LLM Inference

Sooraj Sivadasan

Designing a Fast, Offline-Capable Reverse Geocoder in Python: An Open Source Alternative to Big Geo APIs

Sourav Saha

Sourav has 12+ years of professional experience at NEC Corporation in the diverse fields of High-Performance Computing, Distributed Programming, Compiler Design, and Data Science. Currently, his team at NEC R&D Lab, Japan, is researching various data processing algorithms. Combining niche technologies from compiler frameworks, high-performance computing, and multi-threaded programming, they have developed a Python library named FireDucks with highly pandas-compatible APIs for DataFrame operations. Previously, he worked on the research and development of performance-critical AI and Big Data solutions and on optimizing legacy applications written in C++ and Fortran for weather prediction, earthquake simulation, and similar workloads. He has spoken at several meetups and technical conferences related to HPC and Data Science.

Lessons learnt in optimizing a large-scale pandas application using Polars, FireDucks and cuDF: Go Smart and Save More!

Timothy Spann

https://github.com/tspannhw/SpeakerProfile

Tim Spann is a Senior Solutions Engineer @ Snowflake. He works with Generative AI, LLM, Snowflake, SQL, HuggingFace, Python, Java, Apache NiFi, Apache Kafka, Apache Pulsar, Apache Flink, Flink SQL, Apache Spark, Big Data, IoT, Cloud, AI/DL, Machine Learning, and Deep Learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Principal Developer Advocate at Zilliz, Principal Developer Advocate at Cloudera, Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Senior Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in Computer Science.

Enhancing Apache NiFi 2.x with Python Processors

Tobia De Koninck

I work as a Software and Infrastructure engineer on open-source tools for data science.

Accelerate deployment of your Python data science apps using ShinyProxy

Tom Augspurger

I'm a software engineer at NVIDIA working on GPU-accelerated ETL tools as part of the RAPIDS team. I've helped maintain several libraries in the scientific Python and geospatial stacks.

GPU Accelerated Zarr

Willow Marie Ahrens

Assistant Professor in the School of Computer Science at Georgia Tech.

Office 3144, Klaus Advanced Computing Building, Georgia Tech

I am an assistant professor in the School of Computer Science at Georgia Tech. I am inspired to make programming high-performance computers more productive, efficient, and accessible. My research focuses on using compilers to accelerate productive programming languages with state-of-the-art data structures, algorithms, and architectures, bridging the gap between program flexibility and performance. I’m the author of the Finch sparse tensor programming language. Finch supports general programs on general tensor formats, such as sparse, run-length-encoded, banded, or otherwise structured tensors. Please reach out if you are interested in doing research at Georgia Tech!

PyData/Sparse & Finch: extending sparse computing in the Python ecosystem

Amar Naik

With over two decades of experience in the IT industry, I am a Strategic Engineering Leader with a deep focus on digital transformation, AI integration, and data-driven product innovation. My expertise lies in architecting intelligent systems that combine different tools/agents/systems, automation, and analytics to solve complex business challenges across sectors such as fintech, healthcare, and public services.

Over the past few years, I’ve led the development and deployment of multiple solutions that automate knowledge retrieval, orchestrate multi-step business workflows, and enhance human decision-making. I have used AI frameworks like LangChain, CrewAI, and ReAct to design scalable multi-agent systems that balance autonomy with control. These implementations have significantly improved operational efficiency, user experience, and stakeholder engagement.

A strong advocate for practical, ethical, and secure AI adoption, I help organizations bridge the gap between emerging AI capabilities and enterprise readiness. I’ve mentored global engineering teams and consultants in building AI-driven platforms, fostering a culture of innovation, experimentation, and continuous learning.

My passion lies in enabling businesses to move beyond automation and into intelligent collaboration, where human and agent teams co-create value. I am excited to contribute to the evolving AI ecosystem and share insights that empower teams to build resilient, human-aligned AI solutions.

The Human Side: Leading and Mentoring Global Data Teams in the Age of AI

bhrathjatoth

Bharath is a Senior AI Engineer with over eight years of experience architecting scalable machine learning, generative AI, and LLM solutions. Holding a B.Tech from IIT Guwahati, he specializes in RAG, LangChain, PyTorch, and AWS, delivering innovations like a fact-checking system for Cyara. He has led risk quantification and hallmarking software projects, boosting exports by 8–9% CAGR. Recognized at CGI’s Global Meet 2018, Bharath drives transformative AI solutions with Docker, Kubernetes, and cloud pipelines, blending technical expertise with impactful leadership.

Streaming AI Workflows in Python: Kafka Queues and Flink-Powered LLM Inference

Hugo Bowne-Anderson

Hugo Bowne-Anderson is an independent data and AI consultant with extensive experience in the tech industry. He is the host of the industry podcast Vanishing Gradients, where he explores cutting-edge developments in data science and artificial intelligence.
As a data scientist, educator, evangelist, content marketer, and strategist, Hugo has worked with leading companies in the field. His past roles include Head of Developer Relations at Outerbounds, a company committed to building infrastructure for machine learning applications, and positions at Coiled and DataCamp, where he focused on scaling data science and online education respectively.
Hugo's teaching experience spans from institutions like Yale University and Cold Spring Harbor Laboratory to conferences such as SciPy, PyCon, and ODSC. He has also worked with organizations like Data Carpentry to promote data literacy.
His impact on data science education is significant, having developed over 30 courses on the DataCamp platform that have reached more than 3 million learners worldwide. Hugo also created and hosted the popular weekly data industry podcast DataFramed for two years.
Committed to democratizing data skills and access to data science tools, Hugo advocates for open source software both for individuals and enterprises.

Building LLM-Powered Applications for Data Scientists and Software Engineers

Marthin Thomas

HPC Implementation of a Hybrid Recommender System in Julia

Piotr Stepinski

Data Science Leader with extensive experience in AI and MLOps, currently serving as the CTO at Infinitii AI. He has a strong background in team leadership, product innovation, and building scalable data-driven solutions. Piotr is passionate about using AI to solve real-world problems, particularly in time-series analysis. He is an advocate for Agile methodologies and MLOps practices, and has spoken at conferences about these topics.

From Handwritten Notes to Smart Knowledge: Build Local AI Agents with Python