PyData Vermont 2025
Data are only as powerful as the trust they carry and the communities they serve. In this talk, I’ll explore how building authentic partnerships transforms not just what data we collect, but how they are interpreted and acted upon. At a time when data shape decisions across every sector, it has never been more important for us to use data responsibly. When grounded in trust and collaboration, data have the power to illuminate our most complex challenges and break down silos to drive meaningful change.
This session explores the early design and back-end development of the Vermont Data Collaborative, an open source data dashboard for the state. Participants will engage in a collaborative discussion on incorporating feedback from community partners, with an emphasis on design considerations, data access, prototyping with the final project in mind, and potential pitfalls. The aim is both to provide a practical framework for your open source data project and to improve that framework itself.
Do you know where your data is? It's time to unlock the real power of location with Python. This talk is your practical guide to the open geospatial ecosystem, designed for data practitioners who are ready to turn location data into meaningful insights.
We'll cover the end-to-end workflow: from acquiring open data to performing powerful spatial joins and creating compelling maps. You will learn to use core libraries like GeoPandas and Shapely, and finally demystify one of the trickiest parts of geospatial work: coordinate reference systems.
This session is for data scientists, analysts, and engineers familiar with pandas who want to add a powerful new dimension to their work. You’ll leave with a clear, actionable roadmap for integrating geospatial analysis into your projects.
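As a taste of the workflow described above, here is a minimal sketch of an open-data spatial join with GeoPandas; the file name, columns, and coordinates are hypothetical placeholders, not materials from the talk.

```python
# Hypothetical sketch: join point records to the polygons that contain them.
import geopandas as gpd

# Load an open polygon dataset (e.g., town boundaries) from any common format.
towns = gpd.read_file("vt_town_boundaries.geojson")  # placeholder file

# Build a GeoDataFrame of points from plain longitude/latitude values.
points = gpd.GeoDataFrame(
    {"site_id": [1, 2]},
    geometry=gpd.points_from_xy([-73.212, -72.576], [44.476, 44.260]),
    crs="EPSG:4326",  # WGS84 longitude/latitude
)

# Reproject so both layers share a coordinate reference system before joining.
points = points.to_crs(towns.crs)

# Spatial join: attach the attributes of the containing town to each point.
joined = gpd.sjoin(points, towns, how="left", predicate="within")
print(joined.head())
```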
We will discuss fundamental linguistics and data science concepts that underpin the ability to extract signal from text. This talk brings theoretical context to general data science and NLP approaches. Topics will include the linguistic grounding of large language models (LLMs), basic NLP methods, and common pitfalls in textual analysis. We will also present some tools developed by our lab that can act as powerful lenses for textual data. Some examples we will use to approach these topics include: word frequency and distributions, Zipf’s law, the Distributional Hypothesis, allotaxonometry, sentiment, time series, and scale.
Takeaways from this talk will be theoretical background and tools that support a holistic approach to extracting signal from text, empowering attendees to engage critically with NLP applications in the wild and to deploy NLP approaches responsibly and creatively.
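To give a flavor of two of the listed topics, word frequency and Zipf's law, here is a small self-contained illustration; the sample text is a placeholder rather than a corpus from the talk.

```python
# Rank words by frequency and compare the observed counts against the
# roughly 1/rank decay that Zipf's law predicts.
from collections import Counter

text = "the quick brown fox jumps over the lazy dog the fox sleeps"
counts = Counter(text.lower().split())

ranked = counts.most_common()
top_freq = ranked[0][1]
for rank, (word, freq) in enumerate(ranked, start=1):
    zipf_estimate = top_freq / rank  # Zipf's law: frequency proportional to 1/rank
    print(f"{rank:>2}  {word:<6}  observed={freq}  zipf~{zipf_estimate:.1f}")
```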
A 45-minute talk walking through a decision tree for choosing Python environment and package management tools geared toward scientific computing, with context on why Python environment management is so difficult -- spoiler: it's difficult for the same reasons Python is so popular, its flexibility and extensibility.
Apache Iceberg is an open table format that brings transactional guarantees, schema evolution, and broad interoperability across query engines. Learn what it is, how it works, and how it is used at BETA Technologies to empower analytics at scale.
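As a hedged sketch of what working with an Iceberg table from Python can look like, here is a minimal PyIceberg example; the catalog name, namespace, table, and filter column are invented for illustration and are not from the talk.

```python
# Read an Iceberg table with PyIceberg; catalog configuration is assumed to
# exist (e.g., in .pyiceberg.yaml).
from pyiceberg.catalog import load_catalog

catalog = load_catalog("default")                          # placeholder catalog name
table = catalog.load_table("analytics.flight_telemetry")   # placeholder table

# Push a filter down into the table scan, then materialize the result in pandas.
df = table.scan(row_filter="battery_temp_c > 45").to_pandas()
print(df.head())
```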
This hands-on workshop explores "small data" through the creation of physical and analog data visualizations. Inspired by projects like Giorgia Lupi's "Dear Data", participants will form small groups and collaborate to represent and humanize data from one of five pre-selected datasets. We will explore how these physical representations can foster community collaboration and new ways of seeing, hearing and sharing data.
Messy and inconsistent data is the curse of any analytic or modeling workflow. This talk uses the example of working with address data and demonstrates how natural-language-based approaches can clean and normalize addresses at scale. The presentation will showcase the results of several methods, ranging from naive regular-expression rules to third-party APIs, open-source address parsers, scalable LLM embeddings with vector search, and custom text embeddings.
Attendees will leave knowing when to choose each method and how to balance cost, speed, and precision.
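For context, here is a minimal sketch of the "naive regular expression rules" end of the spectrum; the rules and example address are illustrative and not the presenter's actual code.

```python
# Normalize a few common street-suffix and whitespace variants with regexes.
import re

SUFFIXES = {r"\bst\b\.?": "street", r"\bave\b\.?": "avenue", r"\brd\b\.?": "road"}

def normalize_address(raw: str) -> str:
    addr = raw.strip().lower()
    addr = re.sub(r"\s+", " ", addr)  # collapse repeated whitespace
    for pattern, replacement in SUFFIXES.items():
        addr = re.sub(pattern, replacement, addr)
    return addr

print(normalize_address("123  Main St., Burlington,VT"))
# -> "123 main street, burlington,vt"
# Rules like these break down quickly, which is why the talk also covers
# parsers and embedding-based methods.
```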
In this introductory hands-on tutorial, participants will learn how to accelerate their data workflows with RAPIDS, an open-source suite of libraries designed to leverage the power of NVIDIA GPUs for end-to-end data pipelines. Using familiar PyData APIs like cuDF (GPU-accelerated pandas) and cuML (GPU-accelerated machine learning), attendees will explore how to seamlessly integrate these tools into their existing workflows with minimal code changes, achieving significant speedups in tasks such as data processing and model training.
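A minimal sketch of the "minimal code changes" idea follows: the pandas-style call sites stay largely the same and only the imports change. The file name and column names are hypothetical, and an NVIDIA GPU with RAPIDS installed is assumed.

```python
import cudf                      # GPU-accelerated, pandas-like DataFrame library
from cuml.cluster import KMeans  # GPU-accelerated, scikit-learn-style estimator

# Load and prepare data on the GPU with familiar pandas-style calls.
df = cudf.read_csv("measurements.csv")   # placeholder file
features = df[["x", "y"]].dropna()       # placeholder columns

# Fit and predict on the GPU.
model = KMeans(n_clusters=4, random_state=0)
features["cluster"] = model.fit_predict(features)
print(features["cluster"].value_counts())
```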
If you have worked with AI in any capacity, you'll know that AI is only as valuable as the data it can leverage. Data is the cornerstone of AI, and developers need better ways to transform complex documents into structured data ready for model training and inference.
In this 90-minute, code-along workshop, you'll learn how to turn common, real-world documents and scans into structured data for search and RAG using Docling, an open-source toolkit for advanced document conversion that helps you bring your data into AI workflows more effectively. We'll complete three labs: Conversion, Chunking, and RAG, and you'll leave with runnable notebooks from a public GitHub repo.
Audience: Python practitioners shipping document-centric apps.
Prereqs: basic Python/Jupyter.
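As a preview of the Conversion lab, here is a minimal sketch using the Docling package; the sample PDF URL is a placeholder and is not the workshop repository.

```python
# Convert a document (PDF, DOCX, images, ...) into a structured representation.
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("https://arxiv.org/pdf/2408.09869")  # placeholder source

# Export the parsed, layout-aware document as Markdown, ready for chunking and RAG.
print(result.document.export_to_markdown()[:500])
```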
Learn to extend Claude's capabilities by building a Model Context Protocol (MCP) server that connects Claude Desktop to external APIs. This hands-on tutorial guides participants through creating a simple MCP server using Python and conda, demonstrating how to enable Claude to access real-time data from the New York Times Books API.
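For orientation, here is a hedged sketch of the kind of MCP server the tutorial builds, using the FastMCP helper from the official MCP Python SDK; the NYT endpoint details and the NYT_API_KEY environment variable are assumptions for illustration, not the tutorial's exact code.

```python
# Minimal MCP server exposing one tool that queries the NYT Books API.
import os
import httpx
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("nyt-books")

@mcp.tool()
def best_sellers(list_name: str = "hardcover-fiction") -> list[dict]:
    """Return the current NYT best-seller list as title/author pairs."""
    resp = httpx.get(
        f"https://api.nytimes.com/svc/books/v3/lists/current/{list_name}.json",
        params={"api-key": os.environ["NYT_API_KEY"]},  # assumed env variable
        timeout=30,
    )
    resp.raise_for_status()
    return [
        {"title": book["title"], "author": book["author"]}
        for book in resp.json()["results"]["books"]
    ]

if __name__ == "__main__":
    mcp.run()  # Claude Desktop launches this process via its MCP server config
```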