PyData Virginia 2025

Aaron Baker

Data Scientist and Statistician at CapTech Consulting with 8 years of experience in business. Coach, mentor, teacher, and expert in the field of data science, with a diverse love of learning across data science, psychology, business, and foreign literature.

  • Panel: Principles for Effective and Successful Data Scientists
Alec Gosse

Alec is a Senior Director of Data Science at S&P Global Market Intelligence, leading AI efforts toward internal productivity.

  • AI Ready Data
Alex Arsenovic

Alex has worked as a data scientist, library builder, and math enthusiast with over 12 years of experience. He holds a B.S. and Ph.D. in Electrical Engineering from the University of Virginia (2007, 2012), where he specialized in microwave systems and applied mathematics. He founded Eight Ten Labs (810lab.com) in 2016, and has developed and maintained two widely adopted open-source Python libraries, scikit-rf and clifford, has authored over 25 scientific papers, and holds a U.S. patent (No. 10459018) in electronic measurement systems.

  • What is Geometric Algebra and can it help me?
Andrea Hobby
  • Responsible AI with SciPy
Astha Puri

Astha is a Senior Data Scientist at CVS Health, where she leads the design of recommendation engines for digital platforms, helping customers discover the right products and enabling patients to access the appropriate health services and support. She specializes in home screen personalization, leveraging data-driven insights to enhance user experiences. With a strong background in the tech industry, she is now applying her expertise to transform and innovate within the healthcare sector.

  • Build Your Own Data Science AI Agents
Benjamin Bengfort

Dr. Benjamin Bengfort is the co-founder and CEO of Rotational Labs, where he orchestrates the integration of innovative machine learning techniques with advanced distributed computing systems. A seasoned expert in systems engineering, programming, and data science, he has a proven record of developing AI-driven solutions that support globally distributed data architectures and address the complex challenges of multi-region organizations. Under his leadership, Rotational has focused not just on implementation but also on the responsibility of participating in an AI-driven economy. A believer in open source, Dr. Bengfort pays special attention to the ethics and outcomes of AI, ensuring that humans are at the center of Rotational's solutions. He is the co-author of Applied Text Analysis with Python (2018, O’Reilly) and Data Analytics with Hadoop (2016, O’Reilly). Dr. Bengfort earned his Ph.D. from the University of Maryland, focusing on planetary-scale distributed systems.

  • Practical Multi Armed Bandits
Brian Richards

Brian Richards is a Senior Data Scientist at HelioCampus and works with data across the higher education student lifecycle to help colleges and universities better understand their students and support them through graduation. Brian also has an interest in exploring model evaluation techniques and helping end users better understand how their models work.

  • Machine Learning Pipelines in Higher Education: Lessons Learned Taking Models From Training to Production
Chris Fonnesbeck

Chris is a Principal Quantitative Analyst at PyMC Labs and an Adjoint Associate Professor at the Vanderbilt University Medical Center, with 20 years of experience as a data scientist in academia, industry, and government. He is interested in computational statistics, machine learning, Bayesian methods, and applied decision analysis. He hails from Vancouver, Canada, and received his Ph.D. from the University of Georgia.

  • A Beginner's Guide to Variational Inference
Christopher N. Eichelberger

Chris has more than 30 years of experience in the field, from analytics to senior management. He learned long ago that it is people skills, not his technical ability or his fashion sense, that would help him make an impact. He remains glad that every day is a surprise.

  • Panel: Bridging the Gap: Collaborative Approaches to Data Science
Chuxin Liu

Chuxin Liu holds a PhD in economics and currently works as a senior quantitative associate at JPMorgan. Chuxin has worked with leading companies in the field and has been an active member of the data community. She is the chapter lead for AICamp and an ambassador for Women in Data Science (WiDS). Chuxin's teaching experience spans institutions such as the City University of New York and public conferences including PyData.

  • Build Your Own Data Science AI Agents
Cory Eicher

Cory Eicher is the founder of Eichcorp, a software consulting and implementation practice based in Charlottesville, Virginia... Developer/Mapper/Reader/Biker/Hiker/Skier/Soccer-er

  • Using Python to Unlock Insights from OpenStreetMap Data at Scale
Cynthia Ukawu

Cynthia is a geospatial software engineer with a passion for teaching and making technical concepts approachable. Currently working as a backend software engineer, she develops innovative geospatial solutions that solve real-world problems. Cynthia has a strong background in Python and data science, with experience mentoring students in data analytics at Springboard and teaching Python to beginners at Masterschool.

In addition to her professional work, Cynthia is an experienced public speaker. She’s presented at PyTexas and at Arlington Code-The-Curb on her “Park and Stride” project—a web app that helps commuters integrate walking into their daily routines. Her approachable teaching style combines hands-on learning with practical insights.

Outside of work, Cynthia is passionate about graph theory, computer vision, and geospatial data. She’s currently exploring the intersection of LiDAR technology and urban mobility. When she’s not coding or mentoring, Cynthia enjoys dancing samba and blogging about ways beginners can break into tech on her website, cynscode.com.

  • From Pandas to PySpark
Dan Loehr

Dan Loehr earned his bachelor's in Computer Science from Cornell and a master's and PhD in Computational Linguistics from Georgetown. He has 30 years of experience leading large organizations in the R&D and application of machine learning, AI, natural language and speech processing, and related fields. He has numerous publications and extensive experience teaching these topics at the graduate level. He's currently teaching a course on AI & Climate Change in Georgetown's Master of Science in Data Science and Analytics program.

  • Addressing Climate Change with AI
David Der

Chief AI Officer, computer scientist

  • Real-Time Fitness Leaderboards with Open-Source Moose
  • Panel: Principles for Effective and Successful Data Scientists
Dmitry Petrov

Creator of the open-source tool DVC. Ex-Data Scientist at Microsoft. PhD in Computer Science. Now co-founder of datachain.ai.

  • Versioning Multimodal Data: Metadata & Beyond
Dr. Kimberly Deas

With over 10 years of real-world data (RWD) experience in Informatics, Biostatistics, Data Science, and Epidemiology, and over 20 years as a Scientist, Dr. Kimberly Deas is currently a Principal Analytics Research Scientist Consultant. Her work experience and specializations include healthcare informatics, health disparities, chemical and cancer informatics, and computational toxicology. Dr. Deas is a passionate data educator, teaching data science, healthcare analytics, and data visualization at the collegiate level, primarily through coding webinars. In her spare time, Dr. Deas enjoys golf, crocheting, walking, and reading for leisure.

  • Data Viz in Python as a Tool to Study HIV Health Disparities
Dr. Michele Claibourn

As the Director of Equitable Analysis, Michele Claibourn leads the UVA Equity Center’s community-engaged data science work in support of a more equitable and just region. Michele also works to connect the developing data expertise of UVA students with the community through her faculty appointment in the Batten School of Leadership and Public Policy, where she teaches courses on Imagining Equitable Policy and Public Interest Data: Ethics and Practice, and through a courtesy appointment in the School of Data Science, where she helped launch a Community Data Fellows program.

  • Exploring Eviction Trends in Virginia
Greg Michaelson

Greg Michaelson is Cofounder and Chief Product Officer at Zerve, a young, stealthy startup that’s rethinking the data science development experience. Previously, Greg was an early employee at DataRobot, where he played many roles, including Chief Customer Officer. Prior to that, he worked as a data scientist in the financial sector after earning a PhD in applied statistics from the University of Alabama. In his spare time, Greg manufactures a line of flavored breakfast cereal toppings called Cerup. He lives in Spring Creek, Nevada, with his wife, four children, and two Clumber Spaniels.

  • Saving Lives with Data Science: How data science shortened the COVID-19 pandemic by 2 months
Hamish Brookeman

Hamish Brookeman – VP – Enterprise Data Architecture, S&P Global Enterprise Data Organization, S&P Global

Hamish leads Enterprise Data Architecture, which is responsible for the overall design of managed data structures, including strategies for data implementation, acquisition, and maintenance, and for evaluating data sources for adherence to quality standards and ease of integration. The group's specific role is to capture data requirements clearly, completely, and correctly, and to represent them formally and visually through data models. It also ensures that data integration is based on a common metadata framework and that the integrated data is presented to the business as valid information.

Hamish previously served in a similar role for S&P Global Market Intelligence. Hamish joined S&P Global in 2015 via the SNL Financial acquisition where he had served as Head of Data Architecture since 2006.

Hamish has 25+ years of experience in technology leadership, large abstract datasets, and highly engineered information systems. He has extensive knowledge of structured, semi-structured, and unstructured data strategies. Hamish attended Princeton University, where he studied Economics and Politics.

  • AI Ready Data
John Berryman

John Berryman is the founder and principal consultant of Arcturus Labs, where he specializes in AI application development (Agency and RAG). As an early engineer on GitHub Copilot, John contributed to the development of its completions and chat functionalities, working at the forefront of AI-assisted coding tools. John is coauthor of "Prompt Engineering for LLMs" (O'Reilly).

Before his work on Copilot, John's focus was search technology. His diverse experience includes helping to develop a next-generation search system for the US Patent Office, building search and recommendations for Eventbrite, and contributing to GitHub's code search infrastructure. John is also coauthor of Relevant Search (Manning), a book that distills his expertise in the field.

John's unique background, spanning both cutting-edge AI applications and foundational search technologies, positions him at the forefront of innovation in LLM applications and information retrieval.

  • Mastering LLMs: From Prompt Engineering to Agentic AI
Josh Fairchild

Josh brings a background in non-profit organizational leadership to data and analytics consulting. He is passionate about helping teams thrive by implementing best practices when it comes to change management, data governance, and process development. He is an alumnus of the University of Virginia, with a B.S. in Computer Engineering.

  • The Secret Sauce of Customer Satisfaction: Turning Data Pipelines into Data Products
Krishna Rekapalli

Krishna is a Senior Data Scientist at IBM's Watsonx.ai Solution Architecture Center of Excellence, specializing in designing and implementing enterprise-scale LLM-powered AI solutions and agentic workflows. With over 7 years of experience building machine learning applications, they bring extensive expertise in hybrid cloud architectures, geospatial data analysis, and artificial intelligence. At IBM, they work directly with clients to architect and deploy production-ready AI solutions, focusing on practical implementation challenges and scalable architectures.

  • Building Rich RAG Systems with Docling: Unlock Information from Tables, Images, and Complex Documents
Lane Rasberry

Lane Rasberry is Wikimedian-in-residence at the School of Data Science at the University of Virginia. His interests include popular science, consumer protection, civic engagement, access to health information, clinical research, the Open Movement, data science, LGBT history, and Wikimedia projects.

  • Introduction to Wikidata
Liam Agnew

Liam brings a background in multidisciplinary product R&D and project management to approach data engineering challenges with creative, yet structured, solutions. He has experience with Python, MongoDB, C++, Java, JavaScript, and mobile app development. He is an alumnus of the University of Virginia, with a B.S. in Mechanical Engineering and minor in Materials Science and Engineering.

  • The Secret Sauce of Customer Satisfaction: Turning Data Pipelines into Data Products
MacKenzye Leroy

MacKenzye Leroy is a Lead Data Scientist within S&P Global's newly formed MI Enterprise Technology & Internal Productivity Team, where he focuses on developing enterprise AI solutions to transform business operations. Working closely with stakeholders across Sales, Commercial, Legal, and Marketing, he implements AI-powered productivity solutions.

MacKenzye combines his M.S. in Data Science from the University of Virginia with his physics background to solve complex business challenges. His expertise spans artificial intelligence, machine learning, data pipeline development, anomaly detection, statistical analysis, and full-stack data science implementation, from initial concept through production deployment.

When not working with data, MacKenzye can be found exploring mountain trails by foot, bike, or snowboard, reading, or cheering on his beloved New York Mets.

  • Evaluating LLMs at S&P Global: Building a Robust Evaluation Framework for GenAI Productivity Tools
Manikandarajan Shanmugavel

With more than two decades of experience in information technology, I have worked with groundbreaking technologies such as cloud computing and machine learning and witnessed their impact on business and society. I am currently an Associate Director of Software Development at S&P Global, one of the leaders in financial services, where I lead a team that contributes to S&P Global's AI initiatives. I also hold a master's degree in Data Science from UVA.

LinkedIn: https://www.linkedin.com/in/mani-shanmugavel/

Medium: https://medium.com/@manikrajan

  • Panel: Bridging the Gap: Collaborative Approaches to Data Science
Matt Litz

Matt Litz is a Data Science Engineer at BWX Technologies in Lynchburg, VA. He earned his Master's in Data Science from the University of Virginia in 2023. Primary research interests include computer vision and innovative approaches to implementing Large Language Models.

  • Tutorial on Image Classification using Scikit-Image, Scikit-learn, and PyTorch
Matt Topol

Hailing from the faraway land of Brentwood, NY and currently residing in the rolling hills of Connecticut, Matt Topol has always been passionate about software. After graduating from Brooklyn Polytechnic (now NYU-Poly), he joined FactSet Research Systems, Inc. in 2009, developing financial software. Since then, Matt has worked in infrastructure and application development, has led development teams, and has architected large-scale distributed systems for processing analytics on financial data. Matt is a PMC member for the Apache Arrow project, frequently enhancing the Golang library among other contributions and helping to grow the Arrow community. Recently, Matt wrote the first and only book on Apache Arrow, "In-Memory Analytics with Apache Arrow," and joined Voltron Data to work on the Apache Arrow libraries full time and grow the Arrow Golang community.

In his spare time, Matt likes to bash his head against a keyboard, develop/run delightfully demented games of fantasy for his victims--er--friends, and share his knowledge with anyone interested who'll listen to his rants.

  • Author Chat & Book Signing
  • Practical Applications of Apache Arrow
Mauricio Mathey

Mauricio is the leader of the Advanced Analytics Group at Asarco LLC, a subsidiary of Grupo Mexico, where he leverages AI/ML to drive improvements in costs, productivity, and safety. With over 7 years of experience across Latin America and the US, Mauricio has a proven track record in consulting and applying advanced analytics to solve complex business challenges. Prior to joining Asarco, he led commercial strategy analytics projects at EY-Parthenon. Mauricio holds an M.S. in Data Science and an MBA from the University of Virginia, as well as a B.S. in Industrial Engineering from the University of Lima.

  • Using Changepoint and Bayesian Analysis to Drive Safety Improvements in Mining
Michelle Rojas
  • Build Your Own Data Science AI Agents
Mike McCarty

Mike is a Senior Software Engineering Manager at NVIDIA working on RAPIDS where he manages teams working on RAPIDS Cloud and HPC deployments, build infrastructure and packaging, and PyData projects. He has also contributed to open source software projects in the PyData ecosystem such as Dask and Intake. He holds two bachelor’s degrees in computer science and physics, and has over 20 years of experience in software engineering and scientific computing in astronomy, computational sciences, data science, machine learning, and enterprise products.

  • Getting Started with RAPIDS: GPU-Accelerated Data Science for PyData Users
Nathan Day

I dance in vector space.

  • Maximizing Multimodal: Exploring the search frontier of text-to-image models to improve visual find-ability for creatives
Naty Clementi

Naty is a Senior Software Engineer at NVIDIA. She is a former academic with a Master's in Physics and a PhD in Mechanical and Aerospace Engineering. She is currently contributing to RAPIDS, but in the past has also contributed to and maintained other open source projects such as Ibis and Dask. She is also an active member of PyLadies and an active volunteer and organizer of Women and Gender Expansive Coders DC meetups.

  • Getting Started with RAPIDS: GPU-Accelerated Data Science for PyData Users
Niharika Krishnan

Niharika is a Machine Learning Engineer in NYC, working at the intersection of Quant Finance and AI. As a PyLadies organizer and WiDS ambassador, she fosters a community of women in NYC to collaborate and grow in the field of AI.

  • Build Your Own Data Science AI Agents
Rajkumar Venkatesan

Rajkumar Venkatesan is the Ronald Trzcinski and John Tyler Professor of Business Administration, and the Co-Academic Director of the LaCross Institute for Ethical AI in Business at the Darden Business School at the University of Virginia. Raj has written about and taught Quantitative Digital Marketing to MBA and executive education students worldwide. His teaching experience and research at Darden translated into the books Cutting Edge Marketing Analytics, published by Pearson Education in 2014, and AI Marketing Canvas in 2021. He has published extensively in the Journal of Marketing, Journal of Marketing Research, Marketing Science, Journal of the Academy of Marketing Science, International Journal of Research in Marketing, Harvard Business Review, and California Management Review. He serves as an Associate Editor for the Journal of the Academy of Marketing Science. He is a recipient of several awards, including the Long-Term Impact in B2B Marketing award from ISBM and the Wells Fargo Award for course materials development. More than 450,000 individuals have participated in his courses on Coursera.

Venkatesan has consulted with large enterprises and startups in the technology, retailing, media, industrial goods, financial services, and life sciences industries. He has developed custom executive education programs and data analytics software for Capital One, CFA Institute, Dr. Reddy's Labs, DFW Airports, Explore Learning, ExxonMobil, General Electric, General Dynamics, HBO, IBM, Johnson & Johnson, MAS Holdings, Navy Federal Credit Union, Pitney Bowes, Rosetta Stone, SAP, Teradata, State Farm, Tata Sons, and TEG Analytics.

  • Keynote: Building AI-First Organizations
Ralph Liu

Ralph is currently a software engineer at NVIDIA, working on GPU-accelerated graph libraries (cuGraph, nx-cugraph) as a part of RAPIDS.

  • Zero Code Change GPU-Powered Graph Analytics with NetworkX and cuGraph
Renee Teate

Renee Teate is the Senior Director of Data Science at higher ed tech company HelioCampus and author of SQL for Data Scientists (Wiley). Many people know her as the host of the Becoming a Data Scientist Podcast, or as "Data Science Renee" on Bluesky (previously becomingdatasci on Twitter).

Renee lives in Harrisonburg, VA, and is a graduate of JMU and UVA. She has worked with data her entire career, as a database designer, data analyst, data scientist, and director. She enjoys chatting with people looking to "break into" data careers, or looking to build their data science network.

  • Panel: Bridging the Gap: Collaborative Approaches to Data Science
  • Author Chat & Book Signing
  • Panel: Principles for Effective and Successful Data Scientists
Robert Shelton

Robert is an Applied AI Engineer at Redis, where he specializes in vector search and AI applications, supporting the development of the redisvl package and collaborating with a wide range of customers, from startups to enterprise organizations. His expertise spans diverse use cases, including financial chat applications, e-commerce recommendation systems, and more. Prior to Redis, Robert honed his skills as a Data Scientist and Full-Stack Engineer in the logistics industry, leading innovative projects that bridged software development and the complexities of physical goods movement.

When he's not diving into AI and data challenges, you can find Robert enjoying the great outdoors—most likely savoring some camp stove ramen along the Appalachian Trail in his native Virginia.

  • Blazing the AI Trail: Using LangGraph to Conquer the Oregon Trail
Robin Isadora Brown

Researcher

  • Introduction to Wikidata
Samantha Toet

Samantha grew up in Charlottesville and earned her bachelor's degree in social psychology research from UVA. She previously worked as a Solutions Engineer and Partnership Manager at RStudio, with a focus on open-source technology advocacy. She believes that everyone should be able to make informed, data-driven decisions regardless of their means, and is passionate about enabling her community with tools to support more equitable and accessible analytics. She currently serves as a Data Scientist at the Virginia Equity Center.

  • Exploring Eviction Trends in Virginia
Sihang Jiang

Sihang Jiang is a PhD candidate in systems engineering at the University of Virginia, and his research interests include Bayesian machine learning, Markov chain Monte Carlo, AI for health, and natural language processing.

  • Bayesian Risk Analysis For Large Multi-Modal Data
Siwen Liao

Siwen Liao is a second-year undergraduate student at the University of Virginia studying statistics and physics. Her academic interests focus on applying data science and quantitative methods to medicine and healthcare.

  • The Art of Brain Data in ASD Subjects: Celebrating Neurodiversity Through Aesthetic Data Visualization
Srijith Rajamohan

Dr. Srijith Rajamohan currently leads AI Research at Redis, building efficient and scalable retrieval systems with GenAI. Prior to this role, he led the data science effort for Sage Copilot and also led the team that created and deployed domain-specific LLMs to address the deficiencies of off-the-shelf models for accounting. He also had stints at Databricks, where he led the data science developer advocacy efforts, and at NerdWallet as a data scientist. Before making the switch to the tech sector, he spent about six years in academia as a computational scientist at Virginia Tech.

  • Fine tuning embeddings for semantic caching
Suhas Pai

Suhas Pai is an NLP researcher and co-founder/CTO at Hudson Labs, a Toronto-based, Y Combinator-backed startup. He is the author of the book 'Designing Large Language Model Applications', published by O'Reilly Media. He has contributed to the development of several open-source LLMs, including serving as co-lead of the Privacy working group at BigScience as part of the BLOOM LLM project. Suhas is active in the ML community and has served as Chair of the TMLS (Toronto Machine Learning Summit) conference since 2021. He is also a frequent speaker at AI conferences worldwide and hosts regular seminars discussing the latest research in NLP.

  • Author Chat & Book Signing
  • Making the most of test-time compute in LLMs
Thomas Loeber

Thomas is a senior machine learning engineer at GoHealth, where he builds and productionizes GenAI models. Previously, he worked in consulting and at a technology startup, focusing on MLOps adoption. He originally came from the statistics and data science side, but has also worked in software and data engineering, searching for lessons from these more mature disciplines for how to create maintainable and scalable software systems. Now, Thomas is passionate about integrating these diverse insights to build robust ML systems.

  • Panel: Bridging the Gap: Collaborative Approaches to Data Science
Tyler Hutcherson

Tyler leads the Applied AI Engineering group at Redis, working hands-on with customers and partners on real-time GenAI and ML workloads. Previously, Tyler led ML engineering at early-stage e-commerce startups, building novel search and recommendation systems. He graduated from the University of Virginia with a BS in Physics and an MS in Data Science. His passions include MLOps system design and working with LLMs to solve real problems. He also enjoys dispelling myths and building bridges in the tech community through knowledge and resource sharing.

Tyler and his wife Cynthia reside in Richmond, VA where they enjoy hosting friends, family, and soaking in the city's history, landmarks, nature, food and creative scene.

  • Fine tuning embeddings for semantic caching
Vivek Dhand

Vivek Dhand uses his background in pure mathematics to address complex real-world problems. He has led and contributed to several applied research projects involving data fusion, computer vision, and natural language processing. He strives to develop robust and explainable systems with transparency and accountability, in order to minimize bias and protect individual privacy.

Vivek received his Ph.D. in mathematics from Northwestern University. His research interests include representation theory, category theory, algebraic combinatorics, and visualizations of mathematical structures.

  • Visualization of higher-dimensional feature spaces during model training
Waris Gill

I am a final-year PhD student in the Computer Science department at Virginia Tech. Currently, I am interning at Redis as a Machine Learning Engineer.

  • Fine tuning embeddings for semantic caching
Will Angel

Will Angel is a Data Solution Architect at Excella, leading data teams that help clients solve data problems. Will is the author of Virtual Power: The Future of Energy Flexibility, an organizer of the Data Visualization and Data Engineers DC Meetups, and the executive director of Data Community DC, a 501(c)(3) nonprofit dedicated to data education in the national capital area. In his free time, Will enjoys wildlife photography, gardening, reading, cooking, art, DIY electronics, and traveling.

  • Data wrangling with DuckDB
Will Ayd

Will Ayd is the author of the Pandas Cookbook, Third Edition, and has served as a maintainer of the pandas project since 2018. Will is also a Committer to the Apache Arrow project, and has helped improve countless more open source data libraries.

In his day job, Will helps clients in the Retail and Apparel spaces optimize cloud data platforms in AWS and GCP, while also providing strategy and training around the use of open source technology in enterprise settings.

  • Author Chat & Book Signing
  • Practical Applications of Apache Arrow