
Aarti Jha is a Senior Data Scientist at Red Hat, where she develops AI-driven solutions to streamline internal processes and reduce operational costs. She brings over 6.5 years of experience in building and deploying data science and machine learning solutions across industry domains. She has been an active part of the PyData community and presented at PyData NYC 2024 and PyData Amsterdam 2025.
- Brains Behind AI Agents: Hands-On with LLMs and Tools

GitHub: https://github.com/Schefflera-Arboricola
open-source work blogs - https://github.com/Schefflera-Arboricola/blogs
LinkedIn: https://www.linkedin.com/in/aditi-juneja-940838204
- Understanding API Dispatching in Scientific Python Ecosystem

Allison Ding is a developer advocate for GPU-accelerated AI APIs, libraries, and tools at NVIDIA, with a specialization in large language models (LLMs) and advanced data science techniques. She brings over nine years of hands-on experience as a data scientist, focusing on managing and delivering end-to-end data science solutions. Her academic background includes a strong emphasis on natural language processing (NLP) and generative AI. Allison holds a master’s degree in Applied Statistics from Cornell University and a master’s degree in Computer Science from San Francisco Bay University.
- Scaling Large-Scale Interactive Data Visualization with Accelerated Computing

Software Engineer at Databricks. Apache Spark Committer.
- Polars on Spark: Unlocking Performance with Arrow Python UDFs

I lead CUDA Python Product Management, working to make CUDA a Python native.
I received my Ph.D. from the University of Chicago in 2010, where I built domain-specific languages to generate high-performance code for physics simulations with the PETSc and FEniCS projects. After spending a brief time as a research professor at the University of Texas and Texas Advanced Computing Center, I have been a serial startup executive, including a founding team member of Anaconda.
I am a leader in the Python open data science community (PyData). A contributor to Python's scientific computing stack since 2006, I am most notably a co-creator of the popular Dask distributed computing framework, the Conda package manager, and the SymPy symbolic computing library. I was a founder of the NumFOCUS foundation. At NumFOCUS, I served as the president and director, leading the development of programs supporting open-source codes such as Pandas, NumPy, and Jupyter.
- Building Inference Workflows with Tile Languages

Anindya is a Machine Learning Platform Engineer at Zoox, building scalable infrastructure for distributed training of LLMs and VLMs. Previously at Lyft, he led the development of Spark Notebooks on Kubernetes to accelerate ML prototyping. He has worked across LLMOps, MLOps, and data infrastructure, and has built systems for training, serving, and monitoring ML models at scale using Kubernetes, Spark, and modern ML tooling.
- From Notebook to Cloud at Lightspeed: Accelerating ML Development with Ray
- Scaling Image Captioning Workflows with Ray Data, Ray Data LLM and vLLM

Avik is a seasoned machine learning engineer and data scientist who is passionate about developing tech that enhances people's lives. With deep expertise in scientific Python and a proven track record of building impactful ML solutions, he focuses on creating systems that address real-world challenges and improve people's lives.
- Beyond Just Prediction: Causal Thinking in Machine Learning

Aziza is an Applied Scientist at Oracle (OCI AI) with more than 4 years of work experience with AI/ML/NLP technologies. Previously she worked in LLM evaluation and content moderation in AI safety at Microsoft’s Responsible & OpenAI research team. She holds a Master of Science in Artificial Intelligence from Northwestern University. Throughout her time at Northwestern, she worked as an ML Research Associate at Technological for Inclusive Learning and Teaching Lab (tiilt), building a multimodal conversation analysis application called Blinc. She was a Data Science for Social Good Fellow at University of Washington’s eScience Institute during the summer of 2022. Aziza is interested in developing tools and methods that embed human-like reasoning capabilities into AI systems (particularly generative AI) and applying these technologies to socially-driven tasks that enhance human well-being. Once she is done coding, she is either training for her next marathon race or hiking somewhere around the PNW.
- Are Your Fine-Tuned Models Reliable? Evaluating Prompt Robustness and Alignment

Hi, I’m Bernardo. I earned my PhD at Duke University, where I studied the economics of innovation. That work drew me into the practical challenges of data—how to make pipelines reliable, how to integrate validation naturally, and more recently, how these tools can be combined with AI.
- Know Your Data(Frame) with Paguro: Declarative and Composable Validation and Metadata using Polars

Carl Kadie leads the FaST-LMM open-source Python project for genomics. He also contributes to other Python and Rust projects, including a visualizer for the Turing Machine bbchallenge.org website. Previously, Carl was a Principal Applied Scientist at Microsoft and Microsoft Research, where he worked in machine learning, statistics, and genomics, with publications in Science and Nature.
(On the side, he writes fun articles about Python, Rust, and scientific programming on Medium.)
- Explore Solvable and Unsolvable Equations with SymPy

My name is Carlos Garcia Jurado Suarez, and I’m a Software and Machine Learning Engineering Consultant helping research organizations.
I have over 25 years of experience as an engineer, applied scientist, and manager at organizations of all sizes: Big Tech (Microsoft Research, Meta), early and growth stage startups and academia. My expertise and passion are in Machine Learning and Scientific Computing, and in particular bridging the research and engineering worlds.
I hold master's degrees in Computer Science and in Applied Mathematics, both from the University of Washington, as well as a bachelor's degree in Physics from ITESM, in Monterrey, Mexico.
- Wrangling Internet-scale Image Datasets

Catherine Nelson is an experienced data scientist and ML engineer, and the author of two O'Reilly books: Software Engineering for Data Scientists (2024) and Building Machine Learning Pipelines (2020). Previously, she was a Principal Data Scientist at SAP Concur, where she deployed NLP models to production and created innovative features including ML-powered carbon emissions analytics. She is currently consulting for startups on AI evaluation and developer relations. Catherine holds a PhD in Geophysics from Durham University and a Masters in Earth Sciences from Oxford University.
Robert Masson is a Senior Principal Data Scientist at Atlassian using data to inform strategic decisions at the company. He previously worked 11 years as a data scientist at Meta and 3 years as a quant at a hedge fund. Robert has a PhD in Mathematics from University of Chicago.
- Going From Notebooks to Production Code

R. Conner Howell is a software engineer at Eventual, Inc. where he works on Daft SQL. Prior to Eventual, he worked at Amazon on AWS CloudWatch Logs Insights and Amazon PartiQL – an open source SQL dialect for semi-structured data. His work interests include SQL, PL, and DBMS architectures. He is based in Seattle and enjoys cycling, climbing, and skiing the many mountains of the Pacific Northwest & BC.
- Why Models Break Your Pipelines (and How to Make Them First-Class Citizens)

Lecturer at the University of British Columbia and Data Science Educator at Posit, PBC
- LLMs, Chatbots, and Dashboards: Visualize and Analyze Your Data with Natural Language

I am CEO and co-founder of Expanso and the Bacalhau Project, helping deploy and organize our community building the next generation of the Internet.
Previously, I was co-director of Research Development at Protocol Labs, led Open Source Machine Learning Strategy at Azure, product management for Kubernetes on behalf of Google, launched Google Kubernetes Engine, and co-founded the Kubeflow project and the SAME project. I have also worked at Microsoft, Amazon and Chef and co-founded three startups.
When not spending too much time in service of electrons, I can be found on a mountain (on skis), traveling the world (via restaurants) or participating in kid activities, of which there are a lot more than I remember from when I was that age.
- Taming the Data Tsunami: An Open-Source Playbook to Get Ready for ML

Devin Petersohn is a Software Engineer at Snowflake, focusing on dataframes and distributed systems. Prior to working at Snowflake, Devin did a PhD at UC Berkeley, where he created a dataframe project called Modin, and wrote his thesis on dataframes. Devin is passionate about making complex distributed systems more accessible, and has contributed to multiple open source projects.
- We don't dataframe shame: A love letter to dataframes

Elliot Marx, Chalk co-founder, started his career at Affirm, where he built the early risk and credit data infrastructure system (the inspiration for Chalk). He then co-founded Haven Money, which Credit Karma acquired to power its banking products. He holds a B.S. and M.S. in Computer Science from Stanford University.
- Real-time ML: Accelerating Python for (< 5ms) inference at scale

I am a software engineer living in Seattle, with extensive experience at Amazon working on devices and cloud computing. At Oracle, I worked on developer tooling and novel compiler research for Java. Currently, I apply my classical computing and compiler expertise to quantum computing infrastructure challenges at Q-CTRL, where I oversee the technical direction of the quantum computing teams.
- Subgraph Isomorphism at Scale with data science tools

Fabiana Clemente is an entrepreneur and startup founder with a background in data science and AI. She has led the development of solutions that improve data quality and leverage synthetic data and generative AI to accelerate innovation. A current maintainer and active contributor to the open-source project ydata-profiling, one of the most widely adopted open-source EDA technologies, Fabiana is also an advocate for privacy-preserving approaches to data and AI.
She has authored research, received industry awards, and frequently speaks at international conferences. Her work bridges data engineering, machine learning, and responsible AI, with a focus on making advanced workflows, from document intelligence to multi-agent systems, more practical and reliable.
- Synthetic Data for LLMs and Multi-Agent Document Workflows

We are FTC 18225 High Definition, a 5x worlds qualifier robotics team that participates in the FIRST Tech Challenge. Since our founding in 2020, we've been focused on bringing STEM and robotics to as many communities as possible through various robotics clubs and STEM advocacy efforts. We are excited to host this workshop at PyData Seattle!
- FTC 18225 High Definition: Programming in Robotics Workshop

Ivan Perez Avellaneda is a researcher specializing in nonlinear systems, currently serving as a business analyst at Monaghan Medical Corporation. During his doctoral studies, he focused on data-driven reachability of nonlinear systems, a field with wide-ranging applications across various scientific and engineering domains, including economics. He brings extensive experience co-teaching numerous mathematics-related courses in higher education.
He holds a Ph.D. in electrical engineering (2023) from the University of Vermont in the US, an M.Sc in economics (2018), and a B.Sc in mathematics (2016), both obtained from the Pontifical Catholic University of Peru. Alongside his academic pursuits, he has working experience in the education, financial, and business sectors, where he leveraged his skills in data science.
In academia, his interests are vast, but currently his focus is on specific branches of optimization, such as nonconvex and constrained optimization, calculus of variations, and optimal control, and on the applications of these.
- The Problem of Address Matching: a Journey through NLP and AI

Jack Ye is a software engineer at LanceDB. He is a PMC member of Apache Iceberg and contributor to various open source projects in the data infra domain such as Apache Spark and Trino. Before joining LanceDB, Jack was a tech lead at AWS for products including SageMaker Lakehouse, S3 Tables, EMR and Athena integration with Iceberg and Delta Lake.
- Supercharging Multimodal Feature Engineering with Lance and Ray

I'm a software engineer with an interest in using AI to build better systems for data processing. Previously, I worked on ML training and inference systems at MosaicML and Databricks. Now at Sphinx, I'm working on creating AI copilots for data scientists.
- Accelerating Data Science Workflows with AI

Dr. Jim Dowling is the CEO and a co-founder of Hopsworks. He has previously worked at MySQL and as an Associate Prof at KTH Stockholm. Jim organizes the annual feature store summit and is a co-organizer of PyData Stockholm. Jim has written a book for O'Reilly called "Building ML systems with a feature store: batch, real-time, and LLM systems".
- Real-Time Context Engineering for Agents

Founder/CTO of Connoiter, producing liberally licensed open source DataMap tooling and driving the effort to have a widely useful DataMap data schema in order to promote interoperability and reduce bit rot.
- DIY DataMaps of embedding vectors via open source tooling

Justin started his Software Engineering career as a Web Development Boot Camp Instructor where he developed a passion for exciting others with new concepts and empowering individuals with the tools needed to excel in their own right. As an Advocate at Redis, Justin created numerous videos breaking down Data Structures into easy-to-understand, relatable examples with real-world use cases. Now at Elastic, he has expanded into the realm of enhanced search, monitoring, and observability capabilities.
In his spare time, Justin enjoys hiking around the Pacific Northwest, building hobby electronics, and collecting vintage music synthesizers. His love of hardware and software has led him into a deep exploration of IoT for practical applications as well as performance art!
- There and back again... by ferry or I-5?

I'm a software engineer working on model optimization techniques in the Keras team at Google. I spend my time writing code in OSS, publishing new issues of my newsletter, or making YouTube videos!
- Practical Quantization in Keras: Running Large Models on Small Devices

Khuyen Tran transforms how data scientists learn and work. She is the author of Production-Ready Data Science: From Prototyping to Production with Python, a comprehensive guide that helps data professionals bridge the gap between experimentation and deployment.
As founder of CodeCut, she publishes daily Python tips in her newsletter that reach over 10,000 views per month and has built a community of 110,000 LinkedIn followers.
Previously an MLOps Engineer and Senior Data Engineer at Accenture, she built enterprise data solutions for clients worldwide.
- Retail Demand Forecasting Made Simple with MLForecast

At Schemantic.io and Storytellers.ai, I oversee all aspects of data science, product, and engineering with more than 40 patent claims underlying our tech. An almost decade-long analytics veteran of Amazon and Expedia, I have led dozens of leaders across applied science, economics, analytics, data architecture, instrumentation, customer segmentation, customer retention, marketing operations and impact measurement at global scale.
- From Metadata to Meaning: Deterministic Foundations for Analytics and AI
- Accelerating Data Science Workflows with AI

Machine Learning Engineer with experience training billion-parameter generative models and building high-throughput data pipelines across 1000+ GPUs. Specializes in scalable PyTorch training, structured dataset curation, and distributed infra for large-scale multimodal systems.
- Wrangling Internet-scale Image Datasets

ML Engineer at Walmart
- From Research Topic to Podcast: Building Simple Agent-Based Workflows

Noor Aftab is the Global Program Lead at Amazon Web Services (AWS), where she drives strategic programs for Amazon S3, supporting some of the world’s most complex data, AI, and analytics workloads. With a foundation in software engineering and data science, she brings over a decade of experience building and scaling cloud-native solutions, AI/ML systems, and developer-focused programs.
She serves as Vice President of the Society of Women Engineers (SWE) Pacific Northwest section, championing technical leadership and mentoring initiatives across engineering communities. Noor is also Chair of the NumFOCUS Code of Conduct Working Group and User Group Leader for IBM Women in AI, where she fosters inclusive, resilient communities across 300+ open-source projects.
A frequent keynote speaker, Noor has presented at PyData Global, SciPy, ODSC, TEDx, IEEE, and 13+ global venues, delivering talks that connect technical depth with real-world adoption of AI and cloud. She has authored and led initiatives such as the IEEE Hour of Power AI training program, empowering engineers and professionals with practical AI skills.
Her contributions to technology and leadership have been recognized with awards, including the Australia Alumni Excellence Award and Asia Pacific HRM Congress Award, with media features in the BBC, Martha’s Vineyard Times, and Hindustan Times.
GitHub: aftabn81 | Website: www.nooraftab.com
- The Missing 78%: What We Learned When Our Community Doubled Overnight

Ojas A. Ramwala is a final-year Ph.D. candidate at the University of Washington, Seattle, in the Department of Biomedical Informatics and Medical Education, School of Medicine. His research focuses on enhancing the clinical translation of mammography-based deep learning algorithms for breast cancer screening. His work aims to explore how to validate the generalizability of AI models in large and diverse cohorts, establish explainability methods faithful to the AI model architecture to interpret algorithm predictions, and develop robust deep learning algorithms to predict challenging clinical outcomes.
As an inquisitive research enthusiast, his interests include developing and applying Artificial Intelligence and Deep Learning techniques for Biomedical Signal and Image Processing, Bioinformatics, and Genomics. He spent a year at New York University, studying Bioinformatics, where he pursued research at the NYU Center for Genomics and Systems Biology.
Previously, Ojas was at the National Institute of Technology - Surat, India, in the Electronics Engineering Department. He has been fortunate to work as a Research Intern at the Council of Scientific and Industrial Research (CSIR-CSIO), the Indian Space Research Organization (ISRO-IIRS), and the Indian Institute of Science (IISc).
- Explainable AI for Biomedical Image Processing

Dear Program Committee,
I am currently a Principal Data Scientist at AppOrchid, where I lead projects at the intersection of machine learning, econometrics, and applied research, with a strong focus on interpretable and trustworthy AI. Over the past 15+ years, I have built a career bridging industry and academia, delivering data-driven solutions at organizations such as FleetOps, Convoy, and ServiceNow (ElementAI). My academic contributions include 2,000+ citations and multiple peer-reviewed publications (Google Scholar profile).
As an Associate Professor, I taught in the Mathematics, Computer Science, and Business departments, designing and delivering courses in econometrics, statistical inference, and operational research. I also founded the Laboratory of Machine Learning in Finance and Organizations, mentoring more than 30 students and researchers on projects applying ML to finance, business, and social impact.
Beyond research and teaching, I am an experienced speaker and educator, known for communicating complex ideas in clear and engaging ways. Across conferences, lectures, and industry events, I have consistently emphasized explainability, transparency, and practical impact—principles that directly align with the growing demand for trustworthy AI.
With the rise of regulatory frameworks such as the U.S. AI Bill of Rights (2022) and the NIST AI Risk Management Framework, the need for interpretable models like Generalized Additive Models (GAMs) has never been greater. My session will demonstrate how GAMs provide a rare balance of performance, interpretability, and compliance, supported by real-world case studies and hands-on examples in Python.
I believe my background uniquely positions me to deliver a session that is both technically rigorous and directly relevant to today’s regulatory, business, and academic landscapes.
Sincerely,
Pedro Henrique Melo Albuquerque
- Generalized Additive Models: Explainability Strikes Back

👋 Hi everyone! I’m Rajesh, a Software Engineer based in Tempe, Arizona 🌵 with 7.5+ years of experience. Currently at Jenius Bank, I’ve been building AI/ML solutions for finance clients 💳🤖 over the past 3 years.
Always excited to chat about AI engineering and where the future of AI is headed 🚀✨. Let’s connect on LinkedIn! 👉 linkedin.com/in/rajeshsk
- Securing Retrieval-Augmented Generation: How to Defend Vector Databases Against 2025 Threats

Ramesh Oswal is a Senior Motion Planning Engineer at Aurora, with experience from Luminar and Noble.AI. He has expertise in AI/ML for Autonomous Systems and Education. He has also served as a review committee member for NeurIPS 2024, CNCF 2024, and CNCF 2023.
- Building Bazel Packages for AI/ML: SciPy, PyTorch, and Beyond

I’m currently a full-stack machine learning engineer at Walmart E-commerce, where I get to tackle exciting challenges in the world of online retail. Before that, I was a data scientist at Bank of America, building real-time fraud detection models using deep neural networks and big data – talk about high stakes!
My research interests lie in the fascinating areas of graph embedding, neural architecture search, and fast optimization methods for neural networks. I love pushing the boundaries of what’s possible with AI.
But my passion for technology extends beyond my day job. I’m also deeply invested in two side projects:
AI-Powered Vision for IoT: I’m exploring the potential of NVIDIA Jetson Nano to create innovative machine learning vision applications for the Internet of Things.
ML Design Patterns: I’m developing reusable design patterns to solve common machine learning problems, making AI development more efficient and accessible.
And when I need a break from the digital world, I head to my garden. I’m an avid grower of Cayenne peppers – the hotter, the better!
My journey to AI was paved with diverse experiences. Earlier in my career, I worked on NLP-based automated evaluation of text data, gaining valuable insights into the power of language processing. I hold a master’s degree in computer science from North Carolina State University – Raleigh (graduated in Spring 2016) and a bachelor’s degree in electronics and communication engineering.
- From Research Topic to Podcast: Building Simple Agent-Based Workflows

As a Data and Applied Scientist at Microsoft with 7 years of experience spanning multiple geographies, I specialize in harnessing the power of AI to transform products and user experiences. My work ranges from developing on-device AI models to implementing large language models that revolutionize how data science is practiced at scale. With a Master's degree in Computer Science and Artificial Intelligence from the University of Massachusetts Amherst, I bring both academic rigor and practical expertise to every challenge, consistently pushing the boundaries of what AI can achieve.
- Unplugged Intelligence: Building LLM-Powered Apps That Run Offline
- Going From Notebooks to Production Code

Roman Lutz is a Responsible AI Engineer on Microsoft's AI Red Team, specializing in the safety and security of generative AI and open source software. He is a maintainer of PyRIT, Microsoft’s open-source AI red teaming toolkit, and has helped shape projects like Fairlearn and the Responsible AI Dashboard. Roman’s work bridges technical rigor with a commitment to transparency and accountability, empowering practitioners to build more robust and ethical AI systems. He shares his projects and insights at romanlutz.github.io.
- Red Teaming AI: Getting Started with PyRIT for Safer Generative AI Systems

Sarah has spent most of her career developing technology in the lab, from virtual reality hardware to satellites. She got her PhD in Physics by starting plasma fires with lasers, Python, and Jupyter Notebooks. She has also written tech books for folks of all ages, including ABCs of Engineering and Learn Quantum Computing with Python and Q#. As a Cloud Developer Advocate for Python at Microsoft and a Python Software Foundation Fellow, she finds all kinds of new ways to build and break OSS tools for data science and machine learning. When not at her split ergo keyboard, she loves boating in the Seattle area, laser cutting everything, and playing with her German Shepherd, Chewie.
- There's no place like home: using AI agents in Jupyter notebooks

I'm currently focused on building a frictionless Machine Learning Platform at Outerbounds, where our mission is to let data scientists and ML engineers stay focused on AI/ML development—while we manage the infrastructure that powers it.
My background is in large-scale distributed systems, with experience spanning cloud infrastructure and identity/authorization systems. I've worked on infrastructure teams at Oracle Cloud and Outerbounds, and on IAM/authorization platforms at Atlassian and Databricks.
At Atlassian, I was part of the team that built a CQRS-based permissions system deployed across six AWS regions, handling 100K+ read requests with sub-3ms P99 latencies.
At Databricks, I founded and led a 6-engineer team focused on authorization. We transitioned the platform from a monolithic client-based model to a service-oriented architecture, integrating with ~35 internal services and achieving P99 latencies under 1 second for over 10K requests per second.
Outside of engineering, I enjoy spending time with my daughter, and I'm always up for a game of cricket or table tennis.
- Optimizing AI/ML Workloads: Resource Management and Cost Attribution

Seb is a Lead AI Engineer with a Master’s in Information Systems, originally from Germany and now US‑based. After beginning a PhD, he moved into consulting and served as Chief Product Officer at a major Austrian bank. He later pursued NLP research with MIT, co‑founded and exited a startup, and built AI/NLP systems in production. He has taught 20+ academic courses and published seven peer‑reviewed articles, known for translating complex concepts into practical solutions that bridge technical rigor with stakeholder needs.
- Evaluation is all you need
- Polars on Spark: Unlocking Performance with Arrow Python UDFs

Stephen Cheng is a software engineer at Parakeet Health, an AI-powered voice agent startup that serves medical providers, where he works on infrastructure and backend. He has also worked at Uber and Microsoft.
- Scaling Background Noise Filtration for AI Voice Agents

Principal Software Engineer @ NVIDIA.
- Parallel PyTorch Inference with Python Free-Threading

Weston is an open source software engineer at LanceDB. He is on the PMC for Apache Arrow and Substrait and has spent an unhealthy amount of time thinking about how best to read data from cloud storage. Recently he has been helping develop the Lance file and table formats and studying how random access, multimodal data, and search can be integrated into the modern data lake.
- Data Loading for Data Engineers