<?xml version='1.0' encoding='utf-8' ?>
<!-- Made with love by pretalx v2026.1.0.dev0. -->
<schedule>
    <generator name="pretalx" version="2026.1.0.dev0" />
    <version>0.40</version>
    <conference>
        <title>PyData Virginia 2025</title>
        <acronym>virginia2025</acronym>
        <start>2025-04-18</start>
        <end>2025-04-19</end>
        <days>2</days>
        <timeslot_duration>00:05</timeslot_duration>
        <base_url>https://cfp.pydata.org</base_url>
        
        <time_zone_name>US/Eastern</time_zone_name>
        
        
    </conference>
    <day index='1' date='2025-04-18' start='2025-04-18T04:00:00-04:00' end='2025-04-19T03:59:00-04:00'>
        <room name='Auditorium 5' guid='877b7c2e-8a54-5f91-a994-f14717c5c3be'>
            <event guid='52b6ed57-61f4-59de-a1f0-b30957c8504e' id='77456' code='DZTLEW'>
                <room>Auditorium 5</room>
                <title>Keynote: Building AI-First Organizations</title>
                <subtitle></subtitle>
                <type>Talk</type>
                <date>2025-04-18T09:15:00-04:00</date>
                <start>09:15</start>
                <duration>00:45</duration>
                <abstract>As businesses strive to become AI-first, the pivotal role of AI practitioners extends beyond technical implementation to encompass strategic stewardship. This transition necessitates a profound understanding of organizational goals, data governance, and ethical considerations. By aligning AI initiatives with business objectives, fostering cross-functional collaboration, and addressing challenges such as data privacy and employee adaptation, AI professionals can drive effective transformation. This keynote explores the essential competencies and approaches required for AI practitioners to lead their organizations successfully into an AI-centric future.</abstract>
                <slug>virginia2025-77456-keynote-building-ai-first-organizations</slug>
                <track></track>
                
                <persons>
                    <person id='78557'>Rajkumar Venkatesan</person>
                </persons>
                <language>en</language>
                <description>In the quest to become AI-first, organizations face the imperative of aligning technological innovation with strategic business objectives. This transformation requires AI practitioners to evolve into strategic stewards who not only possess technical expertise but also deeply understand organizational goals and the multifaceted challenges of AI implementation. Key considerations include:

- **Strategic Alignment:** AI initiatives must be closely integrated with the organization&apos;s overarching goals. This entails identifying areas where AI can drive significant value, such as enhancing operational efficiency, improving customer experiences, or enabling data-driven decision-making. A clear strategic vision ensures that AI projects are purpose-driven and aligned with business priorities. 
- **Data Management:** Treating data as a strategic asset is fundamental. This involves more than establishing robust data governance frameworks that ensure data quality, privacy, and security. Strategic data management practices enable leaders to realize the monetary value of the organization&#8217;s data, build reliable AI models, and foster trust among stakeholders.
- **Targeted AI Investment:** Organizations should focus AI development in domains where human capabilities are limited, allowing AI to complement human strengths. Conversely, in areas where humans excel and AI falls short&#8212;such as tasks requiring deep creativity, empathy, or complex judgment&#8212;investment should prioritize human expertise. This strategic allocation ensures that AI serves as an effective tool without encroaching upon domains where human skills are paramount. 
- **Human-AI Interaction Design:** Insights from research on human-machine interaction are vital for designing AI systems that are intuitive and user-friendly. Emphasizing the human-in-the-loop approach ensures that AI tools augment human capabilities, leading to more effective and ethical AI implementations. 
- **Ethical Considerations:** Addressing ethical challenges such as data privacy, bias, and regulatory compliance is crucial. Implementing AI responsibly involves proactive measures to mitigate risks and uphold ethical standards, thereby maintaining public trust and safeguarding the organization&apos;s reputation. 
- **Change Management:** Transitioning to an AI-first organization necessitates effective change management strategies. This includes reskilling and upskilling employees, managing cultural shifts, and addressing potential resistance to change. Empowering employees to work alongside AI technologies fosters a culture of innovation and continuous improvement.

This keynote delves into these critical aspects, offering insights into how AI practitioners can become effective stewards of AI strategy. By embracing a holistic approach that encompasses strategic alignment, robust data practices, ethical considerations, and proactive change management, organizations can successfully navigate the complexities of AI adoption and thrive in an AI-centric future.</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/DZTLEW/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/DZTLEW/feedback/</feedback_url>
            </event>
            <event guid='18417d5c-039d-58b4-aa88-59cc504b814e' id='77258' code='3YQQ8N'>
                <room>Auditorium 5</room>
                <title>Making the most of test-time compute in LLMs</title>
                <subtitle></subtitle>
                <type>Talk</type>
                <date>2025-04-18T10:20:00-04:00</date>
                <start>10:20</start>
                <duration>00:35</duration>
                <abstract>Reasoning models like OpenAI&apos;s o3 and DeepSeek&apos;s R1 herald a new paradigm that leverages test-time compute to solve tasks requiring reasoning. These models represent a departure from traditional LLMs, upending long-held assumptions about them. In this session, we will discuss the different dimensions along which test-time compute can be expended and scaled. We will showcase best practices for prompting reasoning models as well as how to direct test-time compute towards achieving desired results. Finally, we will demonstrate how to train our own reasoning models specific to our domain or use case.</abstract>
                <slug>virginia2025-77258-making-the-most-of-test-time-compute-in-llms</slug>
                <track></track>
                
                <persons>
                    <person id='78596'>Suhas Pai</person>
                </persons>
                <language>en</language>
                <description>The objectives of this session are to:
1. Highlight differences between mainstream LLMs and reasoning models.
2. Understand test-time compute and the different dimensions along which it can be scaled.
3. Demonstrate experimental results with reasoning models from DeepSeek and OpenAI.
4. Learn how to prompt reasoning models effectively.
5. Showcase how to leverage test-time compute at the application level to achieve good results.</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/3YQQ8N/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/3YQQ8N/feedback/</feedback_url>
            </event>
            <event guid='813b3fc4-e2f0-5af4-be6f-5f8564d07224' id='77060' code='JNHA9R'>
                <room>Auditorium 5</room>
                <title>Evaluating LLMs at S&amp;P Global: Building a Robust Evaluation Framework for GenAI Productivity Tools</title>
                <subtitle></subtitle>
                <type>Talk</type>
                <date>2025-04-18T10:55:00-04:00</date>
                <start>10:55</start>
                <duration>00:35</duration>
                <abstract>Discover how S&amp;P Global built an enterprise-grade evaluation framework that transformed our GenAI deployment process. Through automated monitoring, expert validation, &amp; continuous testing, we&#8217;ve streamlined the document integration step of our RAG tools, while ensuring our AI tools maintain consistent quality and reliability.</abstract>
                <slug>virginia2025-77060-evaluating-llms-at-s-p-global-building-a-robust-evaluation-framework-for-genai-productivity-tools</slug>
                <track></track>
                
                <persons>
                    <person id='78293'>MacKenzye Leroy</person>
                </persons>
                <language>en</language>
                <description>In this talk, we will provide an in-depth look at how S&amp;P Global built a comprehensive and reliable evaluation framework for our Generative AI (GenAI)-powered internal productivity tools, with a focus on our Market Intelligence (MI) Sales Assistant application.

We will begin by discussing the unique challenges of evaluating large language models (LLMs) and the importance of a robust evaluation strategy, especially for Retrieval Augmented Generation (RAG)-based systems. We&#8217;ll then dive into the key components of our framework:

&#8226; Metrics: We thoughtfully combine traditional statistical metrics like accuracy, precision, and latency with LLM-specific metrics such as answer relevance, faithfulness to source, and hallucination detection. We&#8217;ll explain each metric and its role in assessing model performance and talk about how custom metrics are often necessary in LLM applications.

&#8226; Question-Answer Pair Generation: We&#8217;ll share our process for generating diverse and representative question-answer pairs, including the models used, quality control measures, and lessons learned around promoting diversity in evaluation data.

&#8226; Ground Truth Creation: Our framework heavily involves subject matter experts (SMEs) to create and validate ground truth data. We&#8217;ll detail our process for engaging SMEs, documenting and versioning ground truth, and maintaining high standards.

&#8226; Evaluation Implementation: We&#8217;ll provide a technical overview of our framework, built using the MLflow library. We&#8217;ll cover our daily sampling process for continuous monitoring, our comprehensive testing triggered by new releases and document updates, and cost considerations. We will also talk broadly about other tools available outside of MLflow.
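As a toy illustration of the custom-metric point above (the function below is invented for this sketch and is not part of our framework or of MLflow), a naive faithfulness proxy could score the fraction of answer tokens that also appear in the source document:

```python
def faithfulness_proxy(answer: str, source: str) -> float:
    """Fraction of answer tokens that also appear in the source text.
    A crude stand-in; production faithfulness metrics typically use an LLM judge."""
    answer_tokens = answer.lower().split()
    source_tokens = set(source.lower().split())
    if not answer_tokens:
        return 0.0
    grounded = sum(1 for t in answer_tokens if t in source_tokens)
    return grounded / len(answer_tokens)
```

Scores near 1.0 suggest an answer stays close to its source; real metrics are far more nuanced, which is exactly why custom metrics matter.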

Throughout the talk, we&#8217;ll share real-world results and concrete lessons learned, such as effective strategies for question generation, SME engagement, and scaling evaluation processes. We&#8217;ll demonstrate our MI Sales Assistant and evaluation dashboard to illustrate the framework in action.

Attendees will come away with a clear understanding of what it takes to implement a robust evaluation framework for a real-world GenAI application. They&#8217;ll learn proven best practices and potential pitfalls, equipping them to ensure their own AI systems consistently deliver value.</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/JNHA9R/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/JNHA9R/feedback/</feedback_url>
            </event>
            <event guid='8bb982ce-ef2a-5c5e-ba3c-ec15cf25e88e' id='77386' code='FHY93D'>
                <room>Auditorium 5</room>
                <title>Maximizing Multimodal: Exploring the search frontier of text-to-image models to improve visual find-ability for creatives</title>
                <subtitle></subtitle>
                <type>Talk</type>
                <date>2025-04-18T11:30:00-04:00</date>
                <start>11:30</start>
                <duration>00:35</duration>
                <abstract>Text-to-Image models, like CLIP, have brought us into a new frontier of visual search. Whether it&apos;s searching by circling a section of a photo or powering image generators like DALL-E, the gap between pixels and tokens has never been smaller. This talk discusses how we are improving search and empowering designers with these models at Eezy, a stock art marketplace.</abstract>
                <slug>virginia2025-77386-maximizing-multimodal-exploring-the-search-frontier-of-text-to-image-models-to-improve-visual-find-ability-for-creatives</slug>
                <track></track>
                
                <persons>
                    <person id='78372'>Nathan Day</person>
                </persons>
                <language>en</language>
                <description>Objective:
Describe where and how we have improved the search experience in our product with open-source multi-modal models and libraries. We&apos;ll share real-world examples from the things we have shipped (and decided not to ship) to production.

Outline:
1. Cover the architecture of open source hybrid search stack at Eezy (Elasticsearch, FAISS, PyTorch)
2. Demo the capabilities and limitations of openCLIP for retrieval embeddings
3. Highlight meaningful stops on our product roadmap from the last 2 years of deploying features into production.
4. Describe notable missteps and surprises uncovered along the way, so people see it&apos;s not all roses in the AI powered future.
5. Demo of BORGES, a novel search framework that allows users to search with multiple queries for a nuanced navigation of the catalog to find exactly what they need
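At its core, the hybrid stack in point 1 blends a lexical relevance score with an embedding similarity before ranking. The weighting below is an illustrative stand-in, not Eezy&apos;s production ranking logic:

```python
def hybrid_score(keyword_score, vector_score, alpha=0.5):
    """Blend a lexical (e.g., Elasticsearch) score with a vector (e.g., FAISS) similarity.
    alpha weights the keyword side; real systems usually normalize each score first."""
    return alpha * keyword_score + (1 - alpha) * vector_score

def rank(docs, alpha=0.5):
    """docs: list of (doc_id, keyword_score, vector_score); returns ids best-first."""
    return [d for d, _, _ in sorted(docs, key=lambda t: -hybrid_score(t[1], t[2], alpha))]
```

True to the talk&apos;s promise, the math really is just multiplication and addition.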

Audience:
- Anyone curious about real-world results we have extracted from AI
- Search practitioners developing hybrid search applications
- PyTorch and transformers enthusiasts interested in applications in vector space
- This talk is not overtly technical and does not require a background in ML/search/AI. The most math required is some multiplication and division; if you can handle that, jump in.</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/FHY93D/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/FHY93D/feedback/</feedback_url>
            </event>
            <event guid='4eb1b35f-738d-5e39-b31c-6878661537ff' id='77075' code='XEBBH7'>
                <room>Auditorium 5</room>
                <title>Fine tuning embeddings for semantic caching</title>
                <subtitle></subtitle>
                <type>Talk</type>
                <date>2025-04-18T12:05:00-04:00</date>
                <start>12:05</start>
                <duration>00:30</duration>
                <abstract>Large Language Models (LLMs) have opened new frontiers in natural language processing but often come with high inference costs and slow response times in production. In this talk, we&#8217;ll show how semantic caching using vector embeddings&#8212;particularly for frequently asked questions&#8212;can mitigate these issues in a RAG architecture. We&#8217;ll also discuss how we used contrastive fine-tuning methods to boost embedding model performance to accurately identify duplicate questions. Attendees will leave with strategies for reducing infrastructure costs, improving RAG latency, and strengthening the reliability of their LLM-based applications. *Basic familiarity with NLP or foundation models is helpful but not required.*</abstract>
                <slug>virginia2025-77075-fine-tuning-embeddings-for-semantic-caching</slug>
                <track></track>
                
                <persons>
                    <person id='78245'>Tyler Hutcherson</person><person id='78612'>Srijith Rajamohan</person><person id='78538'>Waris Gill</person>
                </persons>
                <language>en</language>
                <description># Who Should Attend?
This talk is designed for AI engineers and researchers interested in building with LLMs in production. Attendees with a basic understanding of NLP and RAG systems will benefit most, but the concepts and demonstrations will be approachable for a general technical audience.

# Why It&#8217;s Interesting?
As organizations incorporate LLMs into real-world products, they grapple with inference compute demands and sluggish response times. Semantic caching offers a pragmatic solution: once you identify frequently asked questions (or recurring queries), you can serve results from a cache rather than running a fresh, computationally expensive inference every time. This lowers cost and latency. Moreover, using various fine-tuning methods on the retrieval models improves the accuracy of &#8220;question deduplication,&#8221; ensuring cache hits are matched reliably.
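A minimal sketch of this lookup-before-inference idea (the toy embedding and all names below are invented for illustration; a real system would use a trained embedding model and a vector database):

```python
import math
import string

def toy_embed(text):
    # Stand-in embedding: letter counts. A real cache would use a trained model.
    return [text.lower().count(c) for c in string.ascii_lowercase]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class SemanticCache:
    """Serve a cached answer when a query's embedding is close to a past query's."""
    def __init__(self, embed, threshold=0.9):
        self.embed = embed          # embedding function
        self.threshold = threshold  # minimum cosine similarity for a cache hit
        self.entries = []           # (embedding, answer) pairs

    def put(self, query, answer):
        self.entries.append((self.embed(query), answer))

    def get(self, query):
        q = self.embed(query)
        best_sim, best = -1.0, None
        for emb, answer in self.entries:
            sim = cosine(q, emb)
            if sim > best_sim:
                best_sim, best = sim, answer
        return best if best_sim >= self.threshold else None
```

On a miss (`None`), the application falls back to a full LLM call and stores the result; contrastive fine-tuning of the embedding model is what makes the hit/miss decision reliable.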

# Key Takeaways
- Semantic Caching Fundamentals: How to design and implement a caching layer tailored for question-answering or conversational systems (RAG).
- Embedding Fine-Tuning: An overview of contrastive methods to improve embedding models&#8217; ability to detect near-duplicate or semantically similar queries.
- Practical Insights: Best practices for integrating semantic caching in production, along with tips for monitoring performance and keeping infrastructure costs down.
- Real world examples.

# Background Knowledge
- Minimal NLP/ML Knowledge: Familiarity with embeddings, vector similarity, and basic model inference is helpful.
- Basic Software Engineering: Familiarity with productionizing ML workflows will help contextualize the caching strategy.

# Talk Outline (30 minutes)
1. Introduction to LLM challenges in production (high inference cost, slow responses) with real world examples.
2. Overview of semantic caching: concepts, benefits, and common pitfalls.
3. Improving cache hit rates with contrastive fine-tuning: what it is and how it enhances embedding models.
4. Demo of improving duplicate question detection.
5. Recap and system architecture review.
6. Share resources for further learning (GitHub links, additional reading, etc.)

By the end of this session, attendees will have a clear roadmap for employing semantic caching and contrastive fine-tuning to reduce costs and improve performance in LLM-powered applications. We look forward to sharing our experiences and answering your questions!</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/XEBBH7/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/XEBBH7/feedback/</feedback_url>
            </event>
            <event guid='87a9e581-6319-599d-be45-a5fa0db6d639' id='77520' code='NEKHFV'>
                <room>Auditorium 5</room>
                <title>Panel: Principles for Effective and Successful Data Scientists</title>
                <subtitle></subtitle>
                <type>Talk</type>
                <date>2025-04-18T13:35:00-04:00</date>
                <start>13:35</start>
                <duration>01:00</duration>
                <abstract>What truly makes a data scientist effective in their job and career? Come hear from our panel of data scientists, each with their unique pathway into data science, discuss the principles that matter: pathways to data science, translating business problems, and what technical expertise means for data science. Grow your insight into becoming the kind of data scientist people trust to solve the right problems, the right way.</abstract>
                <slug>virginia2025-77520-panel-principles-for-effective-and-successful-data-scientists</slug>
                <track></track>
                
                <persons>
                    <person id='78248'>Aaron Baker</person><person id='78344'>Renee Teate</person><person id='78320'>David Der</person>
                </persons>
                <language>en</language>
                <description>This conversational panel brings together experienced data science professionals to explore what truly matters for success in the field beyond what&apos;s typically learned in educational settings.

Our panelists will share insights on:
* The &quot;real world&quot; skills critical to data science that aren&apos;t typically taught in academic programs
* Foundations of data science: a core understanding of data, the mechanics of models, and the importance of considering MLOps as a Data Scientist
* How to stand out in data science job opportunities, and the pathways into and through data science
* Practical advice for students, job seekers, and career changers looking to enter or advance in data science

This session will be valuable for students, early-career data scientists, those interviewing for data science roles, professionals seeking promotions, and individuals looking to transition from other fields into data science.

The panel will include time for audience Q&amp;A, allowing attendees to ask specific questions about each major discussion point.</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/NEKHFV/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/NEKHFV/feedback/</feedback_url>
            </event>
            <event guid='c50e1cc6-3bca-5232-817b-3bc38b10a35b' id='77117' code='HKZH7C'>
                <room>Auditorium 5</room>
                <title>Addressing Climate Change with AI</title>
                <subtitle></subtitle>
                <type>Talk</type>
                <date>2025-04-18T14:55:00-04:00</date>
                <start>14:55</start>
                <duration>00:35</duration>
                <abstract>This talk will survey how AI is currently used to address climate change, and describe possible future use cases.  This high-level overview will touch on various aspects of climate change (e.g. energy, transportation, land use), of AI (e.g. image processing, reinforcement learning, LLMs), and of their intersection.  The talk will conclude with resources for learning more about this area, and suggestions for contributing to current and future efforts.</abstract>
                <slug>virginia2025-77117-addressing-climate-change-with-ai</slug>
                <track></track>
                
                <persons>
                    <person id='78319'>Dan Loehr</person>
                </persons>
                <language>en</language>
                <description>Overview: 
AI is profoundly shaping society.  An equally forceful phenomenon is climate change; humanity is already feeling the impacts, and temperatures and greenhouse gas emissions keep rising.  The goal of this talk is to briefly survey the many ways AI is and can be used to address climate change, and to provide pointers to anyone interested in contributing to the effort.  The intended audience is anyone with an interest in this intersection of AI and climate change.

Climate Change: 
We&#8217;ll briefly discuss aspects of climate change which AI is tackling, such as mitigating emissions from the five most carbon-intensive sectors: energy, manufacturing, land use, transportation, and buildings / infrastructure.  We&#8217;ll also look at AI&#8217;s application to other areas such as climate modeling, carbon capture, climate finance, and reducing the carbon footprint of AI itself.  

AI: 
We&#8217;ll see how a number of AI methods can be used to address climate change, including: various neural net architectures (e.g. convolutional, recurrent, graph), LLMs, reinforcement learning, generative AI, neural operators, causality, and natural language processing.

Their intersection: 
We&#8217;ll display a matrix of climate change domains and selected AI methods that can address them, as a guide to tractable areas to tackle.  We&#8217;ll look at unsolved climate-related areas where AI could potentially help.  We&#8217;ll conclude by providing resources for anyone wishing to learn more about this intersection, and for technologists wanting to plug into an existing community to contribute to this effort.</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/HKZH7C/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/HKZH7C/feedback/</feedback_url>
            </event>
            <event guid='6b34a585-a19c-5924-9a95-19863f8ef782' id='77427' code='SF7WAK'>
                <room>Auditorium 5</room>
                <title>Real-Time Fitness Leaderboards with Open-Source Moose</title>
                <subtitle></subtitle>
                <type>Talk</type>
                <date>2025-04-18T15:30:00-04:00</date>
                <start>15:30</start>
                <duration>00:35</duration>
                <abstract>Ever wished you could power live leaderboards for fitness challenges or dynamically award wellness badges in real time? Traditional OLTP systems often buckle under the pressure of continuous writes and aggregate reads. In this talk, we&#8217;ll explore how Moose, an open-source OLAP platform, enables rapid ingestion and lightning-fast queries on health and workout data. We&#8217;ll walk through a demo of creating real-time fitness leaderboards, awarding achievement badges, and using Python-based tools for data ingestion and visualization. Attendees will learn how an OLAP approach streamlines the architecture for modern wellness and health applications.</abstract>
                <slug>virginia2025-77427-real-time-fitness-leaderboards-with-open-source-moose</slug>
                <track></track>
                <logo>/media/virginia2025/submissions/SF7WAK/cyberpunk-bike_FiPcEpP_WZ9eRbK.png</logo>
                <persons>
                    <person id='78320'>David Der</person>
                </persons>
                <language>en</language>
                <description>What &amp; Why
Health and fitness applications produce constant streams of data, from workout logs and step counts to heart-rate measurements and sleep metrics. Crafting a dynamic, user-facing experience&#8212;like up-to-the-minute leaderboards or automated badge award systems&#8212;requires real-time data access and frequent aggregations. Traditional OLTP databases can stall under heavy reads and writes, making it tough to maintain a snappy user experience.

Enter Moose, an open-source analytics engine built around a columnar architecture. With Moose, developers and data teams can:

- Ingest large volumes of real-time data from wearables, apps, and sensors.
- Run near-instantaneous aggregations to power live dashboards or personal health insights.
- Scale analytics cost-effectively thanks to Moose&#8217;s open-source foundation and Python-friendly ecosystem.

Practical Use Case: Real-Time Fitness Leaderboards
We&#8217;ll demonstrate how to build a workout leaderboard that updates in real time as users complete activities. We&#8217;ll also show how to apply custom rules for awarding achievement badges, ensuring that your application can both process and surface analytics-driven insights at scale.
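The aggregation behind such a leaderboard can be sketched in a few lines (illustrative only; Moose executes this kind of rollup as columnar OLAP queries rather than in-process Python):

```python
from collections import defaultdict

def leaderboard(events, top_n=3):
    """events: iterable of (user, points) workout records.
    Returns the top_n users by total points, best-first."""
    totals = defaultdict(int)
    for user, points in events:
        totals[user] += points
    return sorted(totals.items(), key=lambda kv: -kv[1])[:top_n]
```

The challenge in production is running this rollup continuously over a firehose of events, which is where a columnar OLAP engine earns its keep.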

Who Should Attend
- Data &amp; Analytics Engineers: Seeking solutions to handle large volumes of health/wellness data with frequent aggregations.
- Developers/Architects: Building real-time or near-real-time consumer apps that rely on fast analytics.
- Product Managers &amp; Tech Leads: Interested in creating engaging features like live dashboards and automatic badge systems within their wellness offerings.
- Health &amp; Fitness Enthusiasts: Looking to understand how data architecture can enhance user engagement and personalized metrics.

A basic understanding of databases, Python data tools, and event streams (e.g., from wearable devices) is helpful but not required.</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/SF7WAK/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/SF7WAK/feedback/</feedback_url>
            </event>
            <event guid='7db160f5-abf9-5dd4-855c-3ae4448339d8' id='77506' code='D3Z7XN'>
                <room>Auditorium 5</room>
                <title>Panel: Bridging the Gap: Collaborative Approaches to Data Science</title>
                <subtitle></subtitle>
                <type>Talk</type>
                <date>2025-04-18T16:05:00-04:00</date>
                <start>16:05</start>
                <duration>01:00</duration>
                <abstract>During this expert panel, we&apos;ll explore the critical intersections of data science, engineering, and stakeholder engagement in today&apos;s organizations. This discussion will address how to break down silos between technical disciplines, establish effective collaboration models, create rapid experimentation frameworks, and successfully transition projects from exploration to production. Our panelists bring diverse perspectives on building integrated teams that balance innovation with enterprise standards while delivering real value.</abstract>
                <slug>virginia2025-77506-panel-bridging-the-gap-collaborative-approaches-to-data-science</slug>
                <track></track>
                
                <persons>
                    <person id='78273'>Thomas Loeber</person><person id='78251'>Manikandarajan Shanmugavel</person><person id='78344'>Renee Teate</person><person id='78305'>Christopher N. Eichelberger</person>
                </persons>
                <language>en</language>
                <description>This panel brings together practitioners and leaders to discuss the evolving landscape of data science collaboration and implementation. As organizations face increasing pressure to derive value from AI/ML initiatives, the traditional boundaries between disciplines are being reexamined and redefined.

Our panelists will explore:

- Breaking down isolation between data scientists, MLOps engineers, developers, and other stakeholders
- Creating effective frameworks for rapid experimentation that balance innovation with enterprise standards
- Establishing robust handoff processes for transitioning models from exploration to production
- Bridging cultural divides between the explorative nature of data science and the engineering mindset of MLOps
- Practical strategies for cross-functional collaboration that leverages complementary skills
- Managing stakeholder expectations and improving communication with non-technical audiences

This discussion is designed for data professionals at all levels&#8212;from individual contributors to team leaders and executives&#8212;who are navigating the challenges of modern data science implementation. The panel will address both technical and organizational aspects of successful data science teams.

The session will include time for audience Q&amp;A, allowing attendees to engage directly with panelists about their specific challenges in building collaborative data science environments.</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/D3Z7XN/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/D3Z7XN/feedback/</feedback_url>
            </event>
            
        </room>
        <room name='Auditorium 4' guid='a06ff8ae-8e5b-5105-83c2-7600e2120b6c'>
            <event guid='b5129226-a1da-535f-89e9-30d2e7d85d7e' id='77050' code='RBYY9R'>
                <room>Auditorium 4</room>
                <title>Practical Applications of Apache Arrow</title>
                <subtitle></subtitle>
                <type>Talk</type>
                <date>2025-04-18T10:20:00-04:00</date>
                <start>10:20</start>
                <duration>00:35</duration>
                <abstract>Data system interoperability remains a significant challenge in open source ecosystems, with high costs in development time and resources when moving data across complex infrastructures. The Apache Arrow project offers a standardized solution to reduce these integration challenges.

Will Ayd (Apache Arrow Committer and pandas maintainer) and Matt Topol (Apache Arrow PMC Member and author of &quot;In Memory Analytics with Apache Arrow&quot;) will discuss how Apache Arrow is changing the data landscape. They will give a brief overview of the Arrow standards and review real-world implementations where the Arrow specification has driven down the cost of data interoperability.</abstract>
                <slug>virginia2025-77050-practical-applications-of-apache-arrow</slug>
                <track></track>
                
                <persons>
                    <person id='78593'>William Ayd</person><person id='78594'>Matthew Topol</person>
                </persons>
                <language>en</language>
                <description>The Apache Arrow project has been drastically improving the way analytical tools perform, interoperate, and scale. However, as Arrow is primarily used by developers, many of those improvements happen &quot;behind the scenes,&quot; leaving many uninformed as to what exactly Apache Arrow is.

In this talk, we will provide a more formal definition of Apache Arrow, and discuss its various components that collectively are helping to revolutionize the data landscape. We will also take some time to explore how popular Python packages like pandas, polars, and pantab have been leveraging Apache Arrow for interoperability between utilities, while also having an open discussion as to what can still be done.

By the end of this talk, users will have an appreciation of how Apache Arrow is powering their Python (and non-Python!) libraries today, and how it will shape the data landscape going forward. Topics like Arrow Flight, Arrow Flight SQL, Arrow ADBC, and nanoarrow will be discussed, and attendees will gain a deeper understanding of how these technologies are evolving the way data is used in embedded environments, relational databases, HTTP exchanges, AI applications, and more.</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/RBYY9R/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/RBYY9R/feedback/</feedback_url>
            </event>
            <event guid='18bc421a-9ad8-5e7d-bc2f-ea7f1c51c6a4' id='77070' code='8GSQPK'>
                <room>Auditorium 4</room>
                <title>Data wrangling with DuckDB</title>
                <subtitle></subtitle>
                <type>Talk</type>
                <date>2025-04-18T10:55:00-04:00</date>
                <start>10:55</start>
                <duration>00:35</duration>
                <abstract>Learn how to wrangle data in Python with DuckDB, a fast, open source, in-process analytical SQL database!</abstract>
                <slug>virginia2025-77070-data-wrangling-with-duckdb</slug>
                <track></track>
                
                <persons>
                    <person id='78544'>Will Angel</person>
                </persons>
                <language>en</language>
                <description>Learn how to use DuckDB to process data in Python! In the era of &quot;big data,&quot; many data practitioners immediately reach for distributed computing solutions when facing large datasets. Modern hardware capabilities combined with efficient tools like DuckDB make this much less necessary than a few years ago. This talk will demonstrate how to effectively wrangle data using DuckDB in Python, offering a powerful alternative to Pandas and Spark for the majority of data science workflows.

This session will cover:

- Understanding DuckDB&apos;s architecture and its integration with the Python ecosystem
- Practical examples of migrating from pandas to DuckDB
- Performance benchmarks comparing DuckDB against pandas and other popular Python data processing methods
- Real-world scenarios where DuckDB shines, including handling larger-than-memory datasets
- Discussion of the &quot;shrinking size&quot; of big data and when to consider DuckDB versus distributed computing solutions

This talk is aimed at Python data practitioners who regularly work with medium to large datasets (100MB-100GB) and are looking to optimize their data processing workflows. The presentation will include both conceptual explanations and hands-on code examples.</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/8GSQPK/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/8GSQPK/feedback/</feedback_url>
            </event>
            <event guid='8fa4404b-5508-5a40-9e9c-49ce8ee1762f' id='77522' code='UDQZBM'>
                <room>Auditorium 4</room>
                <title>Zero Code Change GPU-Powered Graph Analytics with NetworkX and cuGraph</title>
                <subtitle></subtitle>
                <type>Talk</type>
                <date>2025-04-18T11:30:00-04:00</date>
                <start>11:30</start>
                <duration>00:35</duration>
                <abstract>Graphs are a fundamental form of storing data, because **everything is connected!** Hence, graphs are very useful for modeling and solving a wide variety of real-world problems.

While NetworkX is amazing for getting started with graphs, the library encounters performance bottlenecks at scale.

Is there a solution for users who want more performance from NetworkX, and for open-source developers who want to implement fast algorithms? Yes! Thanks to the magic of dispatching.

NetworkX now supports dispatching to various backends, including the GPU-accelerated cuGraph library by NVIDIA RAPIDS.

Attend this talk to learn how you can use nx-cugraph &#8211; the cuGraph-powered backend for NetworkX &#8211; and how it unlocks exciting new possibilities for solving real-world graph analytics problems.</abstract>
                <slug>virginia2025-77522-zero-code-change-gpu-powered-graph-analytics-with-networkx-and-cugraph</slug>
                <track></track>
                <logo>/media/virginia2025/submissions/UDQZBM/Screenshot_2025-04-19_a_Fxttayq.png</logo>
                <persons>
                    <person id='78343'>Ralph Liu</person>
                </persons>
                <language>en</language>
                <description>This talk showcases a GPU-accelerated graph backend developed by NVIDIA in partnership with the NetworkX community, and demonstrates how GPUs are well suited to solving graph problems at large scale.

The talk is intended for Python developers who are interested in using GPUs in their workflows and for data scientists interested in graph analytics.

During the talk, we intend to cover the following:

1. Brief introduction to graphs and why graph analytics is so powerful.

2. Introducing NetworkX &#8211; why is it so popular? What are its limitations?

3. Example showcasing the magic of dispatching: the design philosophy and how it benefits both users and open-source developers.

4. Real-world example on the Pokec (social network) dataset: how to do community detection on a large graph using Louvain (with Zero Code Change)!

5. Finally, how we aim to work with the community to add new algorithm implementations and contribute to upstream NetworkX.

6. Q&amp;A!
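
As a flavor of the zero-code-change idea (an illustrative sketch, not material from the talk), the same NetworkX call runs on the CPU by default and can be dispatched to the GPU once nx-cugraph is installed:

```python
import networkx as nx

G = nx.karate_club_graph()

# Plain NetworkX call. With nx-cugraph installed, this very same line can be
# dispatched to the GPU, e.g. by setting NETWORKX_BACKEND_PRIORITY=cugraph in
# the environment, or by passing backend="cugraph" explicitly -- otherwise
# the code does not change at all.
communities = nx.community.louvain_communities(G, seed=42)
```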

===

Learn more:

 - [Project page](https://rapids.ai/nx-cugraph/)!
 - [GitHub](https://github.com/rapidsai/nx-cugraph)!

I&apos;d love to connect with you and discuss ideas of applying Graph analytics to *your* work.

Reach out via [LinkedIn](https://www.linkedin.com/in/ralph-liu/)</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/UDQZBM/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/UDQZBM/feedback/</feedback_url>
            </event>
            <event guid='86397d18-d4a9-54d2-adf4-adfbfaf9d4c0' id='77170' code='XRXKDK'>
                <room>Auditorium 4</room>
                <title>Practical Multi Armed Bandits</title>
                <subtitle></subtitle>
                <type>Talk</type>
                <date>2025-04-18T12:05:00-04:00</date>
                <start>12:05</start>
                <duration>00:30</duration>
                <abstract>Multi-armed bandits are a reinforcement learning tool often used in environments where the costs or rewards of different choices are unknown or may change over time. The good news is that bandits are surprisingly easy to implement; in practice, the difficulty comes from defining a reward function that best targets your specific use case. In this talk, we will discuss how to use bandit algorithms effectively, taking note of practical strategies for experimental design and deployment of bandits in your applications.</abstract>
                <slug>virginia2025-77170-practical-multi-armed-bandits</slug>
                <track></track>
                <logo>/media/virginia2025/submissions/XRXKDK/Gemini_Generated_Image_JRnoubv.jpeg</logo>
                <persons>
                    <person id='78317'>Benjamin Bengfort</person>
                </persons>
                <language>en</language>
                <description>Imagine a row of slot machines (often called one-armed bandits because of the lever on the side and the fact that they take your money) -- you know that one of them will pay out more than the others over time, but how do you figure out which one? This is the premise of the multi-armed bandit (MAB) problem, which has become a vital reinforcement learning technique used to balance the exploration-exploitation dilemma (e.g., at what point do you start exploiting the best choice to maximize your rewards instead of exploring for better options).

Multi-armed bandits are straightforward to implement: define your choices and assign each of them a probability distribution for selection. Each time a choice is made, the probability distribution for that choice is updated based on the outcome of a reward function. Easy, right? The trick is in designing both your choices and your reward function in such a way that you capture the dynamics of your experimental environment, often a live one that involves user behavior and other irregularities!
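
To make the loop above concrete, here is a minimal epsilon-greedy sketch (illustrative only, with made-up payout probabilities; not the implementation covered in the talk):

```python
import random

def epsilon_greedy(pull, n_arms, steps=10_000, epsilon=0.1, seed=0):
    """Minimal epsilon-greedy bandit: keep a running mean reward per arm."""
    rng = random.Random(seed)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for _ in range(steps):
        if epsilon > rng.random():
            arm = rng.randrange(n_arms)                      # explore
        else:
            arm = max(range(n_arms), key=means.__getitem__)  # exploit
        reward = pull(arm, rng)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]    # incremental mean
    return means, counts

# Hypothetical Bernoulli payout probabilities stand in for the unknown environment.
probs = [0.2, 0.5, 0.7]
means, counts = epsilon_greedy(
    lambda arm, rng: 1.0 if probs[arm] > rng.random() else 0.0, n_arms=3
)
```

The estimated means converge toward the true payout rates, and the best arm accumulates the vast majority of pulls.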

Things get more complicated when you have multiple agents - each of them with their own probability distributions. Here, you need to design the reward functions such that your desired behavior emerges from the collective interactions of each individual agent. The best type of complexity arises globally from many simple local interactions! 

In this talk, we will learn how to implement multi-armed bandits and reward functions for three use cases: ordering a news feed, prioritizing tasks for a team in a sprint, and minimizing cloud costs for a distributed system. We&apos;ll focus on practical strategies for designing reward functions and dealing with change. At the end of this talk you should be ready and excited to implement bandit algorithms for your own data science problems!</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/XRXKDK/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/XRXKDK/feedback/</feedback_url>
            </event>
            <event guid='33a1881e-24b9-5167-8049-aa27ec81d382' id='77497' code='AFZSVT'>
                <room>Auditorium 4</room>
                <title>Using Python to Unlock Insights from OpenStreetMap Data at Scale</title>
                <subtitle></subtitle>
                <type>Talk</type>
                <date>2025-04-18T14:55:00-04:00</date>
                <start>14:55</start>
                <duration>00:35</duration>
                <abstract>Geospatial data can unlock valuable insights. OpenStreetMap includes electric power and telecommunication infrastructure geospatial data, and it is already &#8220;open&#8221;. This presentation will demonstrate how to use Python to &#8220;unlock the insights&#8221; available in OSM power and telecommunications geospatial data.</abstract>
                <slug>virginia2025-77497-using-python-to-unlock-insights-from-openstreetmap-data-at-scale</slug>
                <track></track>
                <logo>/media/virginia2025/submissions/AFZSVT/Screenshot_2025-02-13_1_jm3KaOo.png</logo>
                <persons>
                    <person id='78392'>Cory Eicher</person>
                </persons>
                <language>en</language>
                <description>Commercial real estate organizations are avid consumers of geospatial data. These organizations have identified the particular value of power and telecommunications infrastructure spatial data for making business decisions. Examples of these data include the locations of power plants, transmission lines, fiber backbone cables, and submarine fiber cables.

One rich source for these datasets is OpenStreetMap (OSM); however, OSM does not natively streamline access to data, especially at scale. Because OSM data are open, we can use Python to query, download, and transform OSM power and telecommunications spatial data for use within open-source and commercial Geographic Information Systems (GIS) software applications, models built in Python and other languages, and any other tools and processes that can read GIS data.

This presentation will give a high-level overview of the overall data flow, and then dive into the individual steps and how each was implemented in Python. Examples will be provided, and maps and analyses based on the resulting spatial data will be demonstrated. The presentation will also explain one approach to downloading very large OSM datasets, for example data spanning continents and including many different themes. Along the way, it will touch on how to avoid &#8220;gotchas&#8221; and how this approach could be adapted to different types of OSM data supporting other use cases and business requirements.</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/AFZSVT/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/AFZSVT/feedback/</feedback_url>
            </event>
            <event guid='664a9276-41cc-5125-bdef-f6acac6ebcb0' id='77496' code='ECJWAP'>
                <room>Auditorium 4</room>
                <title>Versioning Multimodal Data: Metadata &amp; Beyond</title>
                <subtitle></subtitle>
                <type>Talk</type>
                <date>2025-04-18T15:30:00-04:00</date>
                <start>15:30</start>
                <duration>00:35</duration>
                <abstract>The team behind DVC has spent years tackling data versioning challenges. With the rise of AI, we&#8217;ve seen new complexities emerge - especially with multimodal datasets like images, video, audio, and text. This talk shows why multimodal data versioning is different and how Pydantic provides a powerful way to structure and integrate metadata.</abstract>
                <slug>virginia2025-77496-versioning-multimodal-data-metadata-beyond</slug>
                <track></track>
                <logo>/media/virginia2025/submissions/ECJWAP/icon_mEQ90e7_ki3Nw6z.png</logo>
                <persons>
                    <person id='78413'>Dmitry Petrov</person>
                </persons>
                <language>en</language>
                <description>The team behind DVC has spent years tackling data versioning challenges. With the rise of AI, we&#8217;ve seen new complexities emerge - especially with multimodal datasets like images, video, audio, and text. Simply tracking files is no longer enough: metadata, including bounding boxes, poses, text annotations, and embeddings, is now central to dataset management, and using LLMs for auto-annotation is becoming a daily routine. This talk shows why multimodal data versioning is different, how Pydantic provides a powerful way to structure and integrate metadata, and how this approach is implemented in the open-source library DataChain.

We&#8217;ll also cover efficient dataset operations at scale: computing diffs across millions of files, managing expensive GPU-based metadata computations like embeddings, and performing incremental dataset updates. The audience will learn practical tricks for building scalable, high-performance AI workflows with modern dataset management techniques.</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/ECJWAP/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/ECJWAP/feedback/</feedback_url>
            </event>
            <event guid='b8821bfe-9be1-56a1-964f-4675e2630489' id='77137' code='8M9ZJN'>
                <room>Auditorium 4</room>
                <title>AI Ready Data</title>
                <subtitle></subtitle>
                <type>Talk</type>
                <date>2025-04-18T16:05:00-04:00</date>
                <start>16:05</start>
                <duration>00:35</duration>
                <abstract>In today&#8217;s AI-first era, customers expect data products to be deeply interconnected, consumable with minimal effort, and widely available. The need for &#8216;AI Ready Data&#8217;, suitable for consumption directly by AI agents, has never been clearer.</abstract>
                <slug>virginia2025-77137-ai-ready-data</slug>
                <track></track>
                
                <persons>
                    <person id='78550'>Alec Gosse</person><person id='78540'>Hamish Brookeman</person>
                </persons>
                <language>en</language>
                <description>Customers have been clear that receiving &#8216;just&#8217; data is no longer sufficient. They expect data to be immediately accessible, usable, and understandable to both human and AI consumers with &#8220;zero ETL&#8221; (Extract, Transform, Load). We will discuss the direction S&amp;P is taking, explicitly aimed at serving this need and greatly increasing the insight available to customers. This includes providing machine-readable metadata at the column level for datasets. This metadata permits AI and ETL tools to automatically ingest delivered data and connect it to a customer&#8217;s own data, as well as automatically import that data into the customer&#8217;s data catalog.</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/8M9ZJN/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/8M9ZJN/feedback/</feedback_url>
            </event>
            <event guid='7c95b711-c6d9-59b9-84ca-0b7abd456cbc' id='77181' code='FMQ8PA'>
                <room>Auditorium 4</room>
                <title>Visualization of higher-dimensional feature spaces during model training</title>
                <subtitle></subtitle>
                <type>Talk</type>
                <date>2025-04-18T16:40:00-04:00</date>
                <start>16:40</start>
                <duration>00:35</duration>
                <abstract>Modern machine learning models typically utilize extremely high-dimensional feature spaces, which inhibits robustness and explainability. Finer-grained control over model training requires more powerful tools for observing and interacting with latent features as they evolve over time. In this talk, we give several examples of visualizations of nearest-neighbor graphs that illuminate common training pitfalls and provide practical insights for diagnosing model performance issues.</abstract>
                <slug>virginia2025-77181-visualization-of-higher-dimensional-feature-spaces-during-model-training</slug>
                <track></track>
                
                <persons>
                    <person id='78325'>Vivek Dhand</person>
                </persons>
                <language>en</language>
                <description>The goal of this talk is to provide machine learning practitioners with a few simple visualizations for more effective model training. These techniques have been developed through several years of real-world experience with model training, validation, deployment, and maintenance. Since the internal workings of large models are usually somewhat opaque, model trainers often ask themselves a familiar set of questions:  

When should I stop training my model? 

Which one of my saved model checkpoints is the &#8220;best&#8221;? 

What training data should I add (or remove) to achieve a given outcome? 

How do I know if my model is giving the right answer for the wrong reasons, or vice versa? 

How robust is my model to out-of-distribution data? 

Why is there performance drift in my deployed model? 

We argue that much greater emphasis on model observability and explainability is needed, and that the right sorts of visualizations can generate valuable insights and point toward specific improvements.</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/FMQ8PA/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/FMQ8PA/feedback/</feedback_url>
            </event>
            
        </room>
        <room name='Auditorium 3' guid='45d6d0a8-f4ad-507a-aa78-040a499ffab5'>
            <event guid='2af63c1d-f407-5ad1-93fd-ad241c83c0bb' id='77457' code='ZXYBV3'>
                <room>Auditorium 3</room>
                <title>Bayesian Risk Analysis For Large Multi-Modal Data</title>
                <subtitle></subtitle>
                <type>Talk</type>
                <date>2025-04-18T10:20:00-04:00</date>
                <start>10:20</start>
                <duration>00:35</duration>
                <abstract>In the era of big data, multi-modal data from multiple sources has become increasingly prevalent in fields such as healthcare. The National COVID Cohort Collaborative (N3C) provides researchers with abundant clinical data in different forms by aggregating and harmonizing Electronic Health Records (EHR) data across clinical organizations in the United States, making it convenient for researchers to analyze COVID-related topics and build models with large multi-modal data. Bayesian risk analysis has advantages in handling the complexities and heterogeneities of multi-modal healthcare data, specifically in cohort studies where researchers try to answer questions of interest in public health or medicine regarding COVID and Long COVID.</abstract>
                <slug>virginia2025-77457-bayesian-risk-analysis-for-large-multi-modal-data</slug>
                <track></track>
                
                <persons>
                    <person id='78278'>Sihang Jiang</person>
                </persons>
                <language>en</language>
                <description>This talk is based on research projects by UVA iTHRIV on the N3C platform. Its target audience includes data scientists, undergraduate and graduate students, researchers, and anyone interested in data science. The talk will consist of a brief introduction to the National COVID Cohort Collaborative (N3C), a database with multi-modal data sets; quantitative methods and models in Bayesian risk analysis; and some real-world applications of these methods, along with publications by our team. The talk will include a balanced mix of mathematical expressions and real-world applications, and the audience will learn more about quantitative methods for analyzing multi-modal data in N3C.</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/ZXYBV3/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/ZXYBV3/feedback/</feedback_url>
            </event>
            <event guid='cad6b9f3-8383-5bb5-9cdc-e89b4e2438df' id='77218' code='GLBTZD'>
                <room>Auditorium 3</room>
                <title>Saving Lives with Data Science: How data science shortened the COVID-19 pandemic by 2 months</title>
                <subtitle></subtitle>
                <type>Talk</type>
                <date>2025-04-18T10:55:00-04:00</date>
                <start>10:55</start>
                <duration>00:35</duration>
                <abstract>When every day counted during the COVID-19 pandemic, data science became an essential catalyst in accelerating the path to widespread vaccination. This talk delves into the data-driven strategies that enabled the U.S. government&#8217;s vaccine trials to move faster, cutting crucial weeks&#8212;6 to 8, by our estimates&#8212;off the timeline to deployment. Through sophisticated geospatial modeling, we identified and swiftly mobilized trial recruitment efforts in emerging hot zones, ensuring that each candidate pool was both numerically sufficient and demographically representative. Attendees will discover how advanced analytics, predictive modeling, and interdisciplinary collaboration converged to target the right communities at the right time, ultimately expediting vaccine availability. This behind-the-scenes look at rapid-response data science highlights not just the technical innovations, but the decisive cultural and operational shifts that turned real-time insights into life-saving action.</abstract>
                <slug>virginia2025-77218-saving-lives-with-data-science-how-data-science-shortened-the-covid-19-pandemic-by-2-months</slug>
                <track></track>
                
                <persons>
                    <person id='78255'>Greg Michaelson</person>
                </persons>
                <language>en</language>
                <description>This talk explores how data science accelerated COVID-19 vaccine trials, saving 6-8 weeks in deployment. Through geospatial modeling, we targeted diverse recruitment in emerging hot zones, ensuring efficient and representative trials. Attendees will discover how advanced analytics and collaboration turned insights into life-saving action.</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/GLBTZD/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/GLBTZD/feedback/</feedback_url>
            </event>
            <event guid='7821c06f-67ed-5379-99a1-bf7f8e47df5a' id='77122' code='PG9CKX'>
                <room>Auditorium 3</room>
                <title>The Art of Brain Data in ASD Subjects: Celebrating Neurodiversity Through Aesthetic Data Visualization</title>
                <subtitle></subtitle>
                <type>Talk</type>
                <date>2025-04-18T11:30:00-04:00</date>
                <start>11:30</start>
                <duration>00:35</duration>
                <abstract>In our project, we took MRI-derived brain data and reinterpreted it through an aesthetic lens. Using multidimensional scaling (MDS) to distill complex patterns in cortical anatomy, we transformed these insights into physical 3D-printed brain models. Each sculpture serves as a tangible narrative, celebrating both the subtle and striking differences between male and female brains, whether neurotypical or affected by ASD.</abstract>
                <slug>virginia2025-77122-the-art-of-brain-data-in-asd-subjects-celebrating-neurodiversity-through-aesthetic-data-visualization</slug>
                <track></track>
                
                <persons>
                    <person id='78536'>Siwen Liao</person>
                </persons>
                <language>en</language>
                <description>Historically, research has highlighted a notable disparity in ASD diagnoses&#8212;with males being diagnosed significantly more frequently than females. However, beneath these statistics lies a rich tapestry of neuroanatomical diversity that often goes unnoticed. Our work reimagines this disparity as a piece of art, where data becomes a sculptural medium inviting viewers to engage with and reflect on the intricacies of brain structure.

Drawing on over 300 3D brain surface models from the Autism Centers of Excellence (ACE) study, our approach blends advanced MRI neuroimaging, multivariate statistical analysis, and cutting-edge 3D printing technology. The result is an artful representation that not only quantifies but also visually and tangibly celebrates sex differences in brain morphology across both ASD and non-ASD populations.

This presentation will take you on a journey through our methodological and creative process&#8212;from the acquisition and analysis of complex neuroimaging data to the transformation of these insights into physical art. We will discuss the technical details of MRI scanning, the challenges and innovations in our multivariate analyses, and the craftsmanship behind the 3D printing process.

Designed for an audience spanning both scientific and artistic disciplines, this presentation aims to inspire new ways of thinking about data visualization. By embracing &quot;data as art,&quot; we encourage a more holistic understanding of neurodiversity&#8212;one that not only informs but also resonates on an emotional and aesthetic level. Join us for this presentation as we explore how the fusion of art and science can lead to innovative insights into the human brain, fostering a deeper appreciation for the nuanced interplay of sex differences in ASD and beyond.</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/PG9CKX/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/PG9CKX/feedback/</feedback_url>
            </event>
            <event guid='1ac5a959-e44d-5c6f-83e8-31e17ad280ba' id='77199' code='CF3VVT'>
                <room>Auditorium 3</room>
                <title>Exploring Eviction Trends in Virginia</title>
                <subtitle></subtitle>
                <type>Talk</type>
                <date>2025-04-18T12:05:00-04:00</date>
                <start>12:05</start>
                <duration>00:30</duration>
                <abstract>Where do landlords engage in more eviction actions? What characteristics of renters or landlords increase the practice of serial filing? There is widespread interest in using administrative data -- information collected by governments and agencies in the implementation of public programs -- to evaluate systems and promote more just outcomes. Working with the Civil Court Data Initiative of Legal Services Corporation, we use data collected from civil court records in Virginia to analyze the behavior of landlords. Expanding on our Virginia Evictors Catalog, we use data on court evictions to build additional data tools to support the work of legal and housing advocates and model key eviction outcomes to contribute to our understanding of landlord behavior.</abstract>
                <slug>virginia2025-77199-exploring-eviction-trends-in-virginia</slug>
                <track></track>
                <logo>/media/virginia2025/submissions/CF3VVT/evictions_filed_0VbNf8J_A7PSGEl.png</logo>
                <persons>
                    <person id='78277'>Samantha Toet</person><person id='78545'>Dr. Michele Claibourn</person>
                </persons>
                <language>en</language>
                <description>Virginia is home to 5 of the top 10 cities with the highest rates of eviction nationwide. Housing instability threatens the security of entire communities and burdens already limited social safety nets. Yet research shows that housing instability is rooted not in individual or community failures, but in policies of exclusion, displacement, disinvestment, and discrimination.

While collected to support programmatic goals, administrative data can also be used to shift the lens to those in power. In this work we first visualize eviction activity across the Commonwealth in an interactive Shiny app to address questions and needs of organizations providing legal, policy, and community advocacy. In addition, we estimate landlord actions &#8211; eviction filings and serial filings &#8211; as a function of community and landlord characteristics. Using a series of mixed-effects models, with data aggregated to ZIP codes nested in counties, we estimate the impact of community characteristics and landlord attributes on the likelihood of eviction filings and nuisance filings. Both the app and analysis speak to the larger causes and consequences of housing instability.</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/CF3VVT/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/CF3VVT/feedback/</feedback_url>
            </event>
            <event guid='661db1df-6324-552c-90c5-4b5ca9eb0473' id='77521' code='NNXPCL'>
                <room>Auditorium 3</room>
                <title>Author Chat &amp; Book Signing</title>
                <subtitle></subtitle>
                <type>Talk</type>
                <date>2025-04-18T12:35:00-04:00</date>
                <start>12:35</start>
                <duration>01:00</duration>
                <abstract>Lunchtime chat with data science authors, with some offering book giveaways and signing books!</abstract>
                <slug>virginia2025-77521-author-chat-book-signing</slug>
                <track></track>
                
                <persons>
                    <person id='78593'>William Ayd</person><person id='78594'>Matthew Topol</person><person id='78344'>Renee Teate</person><person id='78596'>Suhas Pai</person>
                </persons>
                <language>en</language>
                <description>Come meet the authors of some of your favorite data science books, or learn more about a book you&apos;re interested in but haven&apos;t purchased yet.

The authors listed below will be available during lunch for informal discussions, so drop in any time during the lunch break for a meet &amp; greet. Some authors will be signing books, so bring your books written by these authors if you want your copy autographed! (And check this schedule again before Friday, as we may have authors joining this session up until the day before the event.) Some limited copies may be available as giveaways.

Will Ayd: Pandas Cookbook, Third Edition (Packt)

Suhas Pai: Designing Large Language Model Applications (O&apos;Reilly)

Renee M. P. Teate: SQL for Data Scientists (Wiley)

Matt Topol: In-Memory Analytics with Apache Arrow (Packt)


Note that author John Berryman will be presenting a tutorial on Saturday, and will be available during lunchtime on Saturday to chat about his book &quot;Prompt Engineering for LLMs: The Art and Science of Building Large Language Model-Based Applications&quot; (O&apos;Reilly).</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/NNXPCL/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/NNXPCL/feedback/</feedback_url>
            </event>
            <event guid='ba6e91e6-f42c-5849-a477-6a6ea82d28d2' id='77498' code='L3GESN'>
                <room>Auditorium 3</room>
                <title>Using Changepoint and Bayesian Analysis to Drive Safety Improvements in Mining</title>
                <subtitle></subtitle>
                <type>Talk</type>
                <date>2025-04-18T14:55:00-04:00</date>
                <start>14:55</start>
                <duration>00:35</duration>
                <abstract>In the mining industry&apos;s pursuit of zero harm, distinguishing real safety improvements from random variation is crucial yet challenging. This talk demonstrates how classical changepoint analysis and Bayesian methods provide safety teams at Asarco LLC with rigorous tools to objectively evaluate progress towards our zero-harm goal. Using near miss reporting and lost time metrics, we will show how these statistical approaches help identify meaningful trends while avoiding misleading conclusions from natural variation. While the focus is on mining, these methods are applicable to other safety-critical and data-limited scenarios. No prior experience with changepoint analysis is required.</abstract>
                <slug>virginia2025-77498-using-changepoint-and-bayesian-analysis-to-drive-safety-improvements-in-mining</slug>
                <track></track>
                
                <persons>
                    <person id='78269'>Mauricio Mathey</person>
                </persons>
                <language>en</language>
                <description>The presentation will cover how changepoint analysis is implemented, how the insights generated are applied to improve the safety metrics, and the challenges we have faced in communicating the insights. It will be structured as follows:
&#8226;	Understanding variability in the process (5 min): How random variation impacts safety metrics and challenges in measuring zero-harm.
&#8226;	Changepoint analysis implementation (10 min): Introduction to changepoint analysis using the changepoint package in R and Bayesian changepoint detection using the RBeast package in Python.
&#8226;	Communicating the insights (10 min): Challenges in communicating the insights and presenting them in a way that is actionable for the safety team and executives.
&#8226;	Q&amp;A (5-10 min): Open discussion and audience questions.
Attendees will learn:
&#8226;	Why comparing absolute numbers might be misleading.
&#8226;	How to implement changepoint analysis to detect significant changes in safety metrics.
&#8226;	Strategies for communicating actionable findings to non-data-science teams and executives.
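
As a flavor of the core idea, here is a toy single-changepoint sketch in plain Python (illustrative only; the talk itself uses the changepoint and RBeast packages, and the numbers below are made up): it picks the split point that minimizes the total squared error of a two-segment constant-mean fit.

```python
def best_changepoint(series):
    # Sum of squared errors of a segment around its own mean.
    def sse(seg):
        if not seg:
            return 0.0
        mean = sum(seg) / len(seg)
        return sum((x - mean) ** 2 for x in seg)

    # Try every split point; keep the one with the lowest two-segment error.
    return min(range(1, len(series)),
               key=lambda i: sse(series[:i]) + sse(series[i:]))

# Hypothetical monthly incident counts: roughly 8/month, then roughly 3/month.
counts = [8, 9, 7, 8, 10, 8, 3, 4, 2, 3, 3, 4]
print(best_changepoint(counts))  # detects the shift at index 6
```

Real changepoint methods add penalties for the number of changepoints and, in the Bayesian case, report uncertainty over their locations, which is exactly what makes them useful for separating genuine safety improvements from random variation.
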
This session is ideal for data practitioners with a background in basic probability and statistics (e.g., understanding distributions and confidence intervals). No programming expertise is required, but references to Python libraries and code snippets will provide actionable insights for those looking to implement these techniques in their work.</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/L3GESN/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/L3GESN/feedback/</feedback_url>
            </event>
            <event guid='76ce2f38-66e0-5d21-8cb8-a1be04b02b65' id='77461' code='WRJYDF'>
                <room>Auditorium 3</room>
                <title>The Secret Sauce of Customer Satisfaction: Turning Data Pipelines into Data Products</title>
                <subtitle></subtitle>
                <type>Talk</type>
                <date>2025-04-18T15:30:00-04:00</date>
                <start>15:30</start>
                <duration>00:35</duration>
                <abstract>What comes to mind when you think of an exceptional customer experience? Whether it was a &quot;peak experience&quot; or a &quot;dumpster fire&quot;, it stuck with you! We recognize the importance of great customer experiences in industries like retail and hospitality&#8212;but what about in data? Does long-term success depend on creating exceptional customer experiences, or are client expectations just challenges to manage?

In this session we will share insights from a data and analytics project Elder Research is implementing for a Quick-Service Restaurant corporation. By prioritizing the customer experience and embracing a &quot;Data as a Product&quot; mindset, data teams can drive greater business value and build stronger, more sustainable client relationships.</abstract>
                <slug>virginia2025-77461-the-secret-sauce-of-customer-satisfaction-turning-data-pipelines-into-data-products</slug>
                <track></track>
                
                <persons>
                    <person id='78351'>Josh Fairchild</person><person id='78547'>Liam Agnew</person>
                </persons>
                <language>en</language>
                <description>Since 2023, Elder Research has partnered with a major U.S.-based Quick Service Restaurant corporation to enhance the effectiveness of their enterprise data &amp; analytics group. Our goal was to instill a &quot;Data as a Product&quot; mindset across six Data Portfolios, which support internal analytics teams by maintaining core data pipelines for critical business and customer-facing applications.

In this talk, we&#8217;ll share key insights from the work by our technical business analysts and data engineers on this project, highlight the business value delivered to our client, and explore how &quot;Data as a Product&quot; principles can strengthen client relationships for all of us.</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/WRJYDF/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/WRJYDF/feedback/</feedback_url>
            </event>
            <event guid='11132e35-4695-59cd-b850-ebbad8601334' id='77128' code='9BTPLD'>
                <room>Auditorium 3</room>
                <title>Machine Learning Pipelines in Higher Education: Lessons Learned Taking Models From Training to Production</title>
                <subtitle></subtitle>
                <type>Talk</type>
                <date>2025-04-18T16:05:00-04:00</date>
                <start>16:05</start>
                <duration>00:35</duration>
                <abstract>Building machine learning models with live, human-centric data is often a messy endeavor. However, by thinking about the entire machine learning pipeline and the lifecycle of the population being modeled we can prevent the model (and data scientist) from overpromising and underdelivering. Come learn about potential pitfalls that occur when working with human-centric data and what you can do to prevent it from ruining your model performance.</abstract>
                <slug>virginia2025-77128-machine-learning-pipelines-in-higher-education-lessons-learned-taking-models-from-training-to-production</slug>
                <track></track>
                
                <persons>
                    <person id='78354'>Brian Richards</person>
                </persons>
                <language>en</language>
                <description>In this talk, we will discuss some lessons learned working on human-centric data in higher education and the pitfalls you may encounter. The higher education student cycle begins with admissions, follows the student throughout the terms they attend, and ideally ends with graduation. Using this student lifecycle as a guide, we will dive into how the data available at each point of the student lifecycle and machine learning pipeline must be accounted for during training to prevent failures in production. We will also discuss how working with operational datasets places unique limits on our models and what to watch out for.

This talk is geared towards a general audience, though familiarity with machine learning will be helpful. 

Outline:

Introduction to the student lifecycle (5 min)

Introduction to machine learning pipelines (5 min)

Working with data from across the student lifecycle (10 min)

Working with operational datasets for a machine learning model (5 min)

Concluding thoughts and Q&amp;A (5 min)</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/9BTPLD/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/9BTPLD/feedback/</feedback_url>
            </event>
            <event guid='80d0599a-4e2f-5e9c-b1e6-649ab62c4c03' id='77411' code='RHCHVC'>
                <room>Auditorium 3</room>
                <title>What is Geometric Algebra and can it help me?</title>
                <subtitle></subtitle>
                <type>Talk</type>
                <date>2025-04-18T16:40:00-04:00</date>
                <start>16:40</start>
                <duration>00:35</duration>
                <abstract>An introduction to Geometric Algebra, with a focus on how it can (and can&apos;t) be used as a practical computational tool in Python. The discussion will present concrete examples which make use of the open source python library &#8216;Kingdon&#8217;. The audience should leave with a grasp of what GA is and what it isn&apos;t,  so that they can decide if it is a tool worthy of their cognitive investment.</abstract>
                <slug>virginia2025-77411-what-is-geometric-algebra-and-can-it-help-me</slug>
                <track></track>
                <logo>/media/virginia2025/submissions/RHCHVC/Screenshot_from_2025-02_wrkv5gr.png</logo>
                <persons>
                    <person id='78357'>Alex Arsenovic</person>
                </persons>
                <language>en</language>
                <description>Geometric Algebra (GA) is a mathematical language that has recently received significant attention from the computer graphics and engineering communities. Proponents of GA claim that it provides a geometrically intuitive interface, concise syntax, and the ability to unify several of the most important algebras. This talk will discuss the pros and cons of GA as a practical computational tool in Python data science. The first half of the talk will introduce the concepts of GA, and the second half will provide concrete demonstrations with the Kingdon library. 
While geared toward data scientists, this talk can be enjoyed by anyone interested in applied mathematics. A basic background in linear algebra will be helpful. Additionally, those using vector algebra, complex numbers, quaternions, rotation matrices, and the like will find it especially relevant. The audience should leave with a grasp of what GA is and what it isn&apos;t, so that they can decide if it is a tool worthy of their cognitive investment.</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/RHCHVC/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/RHCHVC/feedback/</feedback_url>
            </event>
            
        </room>
        
    </day>
    <day index='2' date='2025-04-19' start='2025-04-19T04:00:00-04:00' end='2025-04-20T03:59:00-04:00'>
        <room name='Room 120' guid='5037d7fa-4529-5052-8aef-8b35200900e0'>
            <event guid='c992a846-52e1-5621-98b9-079469851943' id='77106' code='7EUB8R'>
                <room>Room 120</room>
                <title>Mastering LLMs: From Prompt Engineering to Agentic AI</title>
                <subtitle></subtitle>
                <type>Tutorial</type>
                <date>2025-04-19T09:00:00-04:00</date>
                <start>09:00</start>
                <duration>01:30</duration>
                <abstract>This workshop will provide a comprehensive introduction to Large Language Models (LLMs), covering their capabilities, structure, and practical applications. Participants will learn prompt engineering techniques, retrieval-augmented generation (RAG), agentic AI design, fine-tuning strategies, and model evaluation methods. The session will conclude with a discussion on the future of AI-powered reasoning machines.</abstract>
                <slug>virginia2025-77106-mastering-llms-from-prompt-engineering-to-agentic-ai</slug>
                <track></track>
                
                <persons>
                    <person id='78394'>John Berryman</person>
                </persons>
                <language>en</language>
                <description>The rapid evolution of AI and Large Language Models (LLMs) has opened new possibilities for automation, content generation, and interactive agents. This hands-on workshop is designed for developers, researchers, and AI enthusiasts who want to deepen their understanding of LLMs and learn how to harness their full potential. Topics covered include:
- How LLMs work and the role of reinforcement learning in training
- The art and science of prompt engineering, including zero-shot and few-shot techniques
- Retrieval-Augmented Generation (RAG) for integrating external knowledge
- Agentic AI: Designing chatbots and workflow agents
- Fine-tuning models using LoRA for custom behaviors
- Evaluation methods for improving AI performance
- Future trends, including multimodal models and new interaction paradigms
Attendees will leave with practical skills, implementation strategies, and insights into the future of AI-powered applications.</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/7EUB8R/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/7EUB8R/feedback/</feedback_url>
            </event>
            <event guid='49fc7589-8ab0-538d-a75d-0d71572f5f39' id='77470' code='XPFPFE'>
                <room>Room 120</room>
                <title>Building Rich RAG Systems with Docling: Unlock Information from Tables, Images, and Complex Documents</title>
                <subtitle></subtitle>
                <type>Tutorial</type>
                <date>2025-04-19T11:00:00-04:00</date>
                <start>11:00</start>
                <duration>01:30</duration>
                <abstract>Traditional PDF extraction tools often struggle with complex layouts, tables, and images. Docling (an open-source Python library developed at IBM) excels at extracting structured information from these elements, enabling the creation of richer, more accurate vector databases. This hands-on tutorial will guide participants through building a Retrieval Augmented Generation (RAG) system using Docling, an open-source document processing library.


Participants will learn how to harness Docling&apos;s advanced capabilities to build superior RAG systems that understand and retrieve information from complex document elements traditional tools might miss: handling challenging documents, extracting structured information, and creating an efficient vector database for semantic search. The session will cover best practices for document parsing, chunking strategies, and integration with popular LLM frameworks.</abstract>
                <slug>virginia2025-77470-building-rich-rag-systems-with-docling-unlock-information-from-tables-images-and-complex-documents</slug>
                <track></track>
                <logo>/media/virginia2025/submissions/XPFPFE/Talk-flow-Diagram-V4_23_3dAtuJe.png</logo>
                <persons>
                    <person id='78338'>Krishna Rekapalli</person>
                </persons>
                <language>en</language>
                <description>### Overview and Objectives
This tutorial leverages Docling (https://ds4sd.github.io/docling/), a powerful open-source library designed for advanced document processing and AI integration. The session aims to equip data scientists and ML engineers with practical skills for building robust RAG systems by utilizing Docling&apos;s comprehensive feature set. We will work through scenarios such as processing multi-page tables, parsing research papers while preserving multi-column layouts and equations, and managing technical documentation with code blocks and diagrams. Through these examples, you&apos;ll gain practical experience in building robust document processing pipelines that outperform traditional extraction tools.

Participants will learn how to:
- Process and parse various document formats (PDF, DOCX, HTML) using Docling
- Extract structured information including tables, formulas, and images
- Implement effective text chunking strategies for optimal retrieval
- Create vector databases for semantic search
- Integrate the pipeline with LLM frameworks for end-to-end RAG solutions
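
As a taste of the chunking step, here is a toy fixed-size, overlapping word chunker in plain Python (a simplified stand-in; the tutorial will use Docling&apos;s own structure-aware chunking rather than this naive approach):

```python
def chunk_text(text, size=50, overlap=10):
    # Split into word-based chunks of about `size` words, each overlapping
    # its predecessor by `overlap` words so retrieval does not cut context.
    words = text.split()
    step = size - overlap
    starts = range(0, max(len(words) - overlap, 1), step)
    return [" ".join(words[s:s + size]) for s in starts]
```

Naive chunkers like this one split tables and multi-column layouts apart mid-structure, which is precisely the failure mode that structure-aware parsing helps avoid.
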

### Target Audience
This tutorial is designed for:
- Data scientists and ML engineers working on document processing and LLM applications
- Software developers implementing RAG systems
- Anyone interested in building production-ready document processing pipelines

**Experience Level:** Intermediate

**Prerequisites:**
- Basic Python programming knowledge
- Familiarity with basic NLP concepts
- Understanding of LLMs and vector databases (basic level)

### Technical Requirements
Participants should have:
- Python 3.10 or 3.11 installed
- A code editor or IDE
- Ability to install Python packages via pip
- 4GB+ of free disk space for models and dependencies

### Detailed Outline (90 minutes)

1. Introduction and Setup (15 minutes)
   - RAG system architecture overview
   - Setting up the development environment
   - Installing Docling and dependencies


2. Document Processing with Docling (25 minutes)
   - Understanding Docling&apos;s document processing capabilities
   - Comparing traditional PDF extraction vs. Docling&apos;s advanced parsing
   - Advanced extraction of tables, images, and complex layouts
   - Hands-on exercise: Processing sample documents with rich content


3. Building the RAG Pipeline (25 minutes)
   - Creating rich vector embeddings that preserve document structure
   - Integration with LLM frameworks
   - Hands-on exercise: Building a complete RAG pipeline


4. Best Practices and Production Considerations (15 minutes)
   - Performance optimization techniques
   - Using accelerators 
   - Docling-serve (https://github.com/docling-project/docling-serve) for deploying Docling as an API service
   - Creating effective evaluations



5. Q&amp;A and Interactive Problem Solving (10 minutes)
   - Addressing participant questions
   - Troubleshooting common issues
   - Discussion of real-world applications


### Materials
https://github.com/KrishnaRekapalli/docling-rag-tutorial-pydata-2025

### Pre-work
Make sure that you have a Hugging Face access token or Replicate API key for LLM inference. You can get some free inference credit on both platforms without a credit card. Another option is running Ollama locally. For more details, check https://github.com/KrishnaRekapalli/docling-rag-tutorial-pydata-2025

### Key Takeaways
Participants will leave the tutorial with:
- Practical experience in building RAG systems
- Understanding of document processing best practices
- Ability to extract and utilize information from complex document elements
- Hands-on experience comparing traditional vs. advanced extraction methods
- Knowledge of common pitfalls and how to avoid them
- Strategies for handling tables and images in RAG systems</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/XPFPFE/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/XPFPFE/feedback/</feedback_url>
            </event>
            <event guid='bf321c2c-fbdb-5eba-ab54-1a21925ed30a' id='77472' code='3JXT7N'>
                <room>Room 120</room>
                <title>Build Your Own Data Science AI Agents</title>
                <subtitle></subtitle>
                <type>Tutorial</type>
                <date>2025-04-19T13:30:00-04:00</date>
                <start>13:30</start>
                <duration>01:30</duration>
                <abstract>Now that &#8220;AI agent&#8221; has become a buzzword, have you ever wondered: what exactly is an AI agent? What is a multi-agent system? And how can you use the power of AI agents in your **day-to-day data science workflow**? In this hands-on tutorial, I will introduce AI agents and demonstrate how to design, build, and manage a multi-agent system for your data science workflows. Participants will learn how to break down complex tasks, assign AI agents to collaborate effectively, and ensure accuracy and reliability in their outputs. We will also discuss the trade-offs, limitations, and best practices for incorporating AI agents into data science projects.</abstract>
                <slug>virginia2025-77472-build-your-own-data-science-ai-agents</slug>
                <track></track>
                
                <persons>
                    <person id='78617'>Niharika Krishnan</person><person id='78361'>Chuxin Liu</person><person id='78379'>Astha Puri</person><person id='78537'>Michelle Rojas</person>
                </persons>
                <language>en</language>
                <description>**Prerequisite**: 
1. An OpenAI developer API key. If you do not have one, here is a video showing how to create an account and generate an OpenAI API key: https://www.youtube.com/watch?v=JuAOOO18ycg
2. LangSmith API: https://smith.langchain.com/


**Tutorial Materials**: available at this Google Drive link: https://drive.google.com/drive/folders/1keoQYO6iEm_b9olxxcWgOfmpipProaPJ?usp=drive_link

This hands-on tutorial will guide participants through designing, building, and deploying AI agents to streamline data science tasks.

**What You&#8217;ll Learn**
This tutorial will provide a deep dive into AI agents and multi-agent systems, covering:
- The role of AI agents in automating data science tasks such as data preprocessing, feature engineering, model selection, and evaluation.
- How to design a multi-agent system that efficiently distributes tasks while ensuring reliability and accuracy.
- Strategies for incorporating AI agents into everyday workflows to save time and enhance productivity.
- Common challenges, trade-offs, and best practices when using AI agents in data science.

**Tutorial Structure**
1. Introduction to AI Agents in Data Science (15 minutes)
- What are AI agents, and how do they fit into data science workflows?
- Examples of AI-driven automation in data science.
- Overview of multi-agent collaboration for data-related tasks.
2. Setting Up the Development Environment (10 minutes)
- Tools and frameworks for building AI agents in data science.
- Accessing tutorial materials (Google Drive).
3. Building an AI-Driven Data Science Workflow (40 minutes)
- Hands-on implementation: Automating exploratory data analysis (EDA), data preprocessing, model training, and evaluation with AI agents.
- Orchestrating agent collaboration for complex workflows.
- Ensuring accuracy, reliability, and interpretability in AI-assisted data tasks.
4. Challenges, Trade-offs, and Best Practices (15 minutes)
5. Q&amp;A and Wrap-Up (10 minutes)
- Discussion on real-world applications and industry adoption.
- Key takeaways and next steps for implementing AI agents in data projects.
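
To make the orchestration idea concrete before the session, here is a toy hand-off in plain Python (no LLM calls; the hands-on portion uses the OpenAI API and LangSmith, and every name below is made up): each agent is a function that updates a shared task state, and a minimal orchestrator routes the task through them in order.

```python
def cleaning_agent(task):
    # Drop missing values and record the step.
    task["data"] = [x for x in task["data"] if x is not None]
    task["log"].append("cleaned")
    return task

def summary_agent(task):
    # Summarize the cleaned data with a simple mean.
    task["summary"] = sum(task["data"]) / len(task["data"])
    task["log"].append("summarized")
    return task

def run_pipeline(task, agents):
    # Minimal orchestrator: pass the shared state through each agent in turn.
    for agent in agents:
        task = agent(task)
    return task

result = run_pipeline({"data": [1, None, 3], "log": []},
                      [cleaning_agent, summary_agent])
print(result["summary"], result["log"])  # 2.0 ['cleaned', 'summarized']
```

Real multi-agent frameworks layer LLM-driven routing, tool use, and error recovery on top of this same pattern of agents passing shared state.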

**Who Should Attend?**
This tutorial is designed for data analysts, data scientists, machine learning practitioners, and AI engineers looking to integrate AI agents into their workflows. Attendees should have a basic understanding of Python and machine learning concepts. 

**Prerequisites &amp; Materials**
- Skill Level: Intermediate (basic Python and ML knowledge recommended).
- Resources: A Google Colab environment for hands-on execution (no local installation required).

By the end of this tutorial, participants will have a practical framework for using AI agents to automate and optimize data science workflows, improving efficiency and scalability in their projects.</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/3JXT7N/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/3JXT7N/feedback/</feedback_url>
            </event>
            <event guid='41d015f9-1411-5abd-a0b0-2a7e3002e004' id='77491' code='LMBBBF'>
                <room>Room 120</room>
                <title>Blazing the AI Trail: Using LangGraph to Conquer the Oregon Trail</title>
                <subtitle></subtitle>
                <type>Tutorial</type>
                <date>2025-04-19T15:30:00-04:00</date>
                <start>15:30</start>
                <duration>01:30</duration>
                <abstract>Agents have become one of the most talked-about topics in the AI community, but much of the discussion focuses on their potential impact rather than practical implementation. This hands-on workshop will guide data scientists and engineers through building a complete workflow using LangGraph, and will show how to define custom tools, implement vector retrieval, leverage semantic caching, incorporate allow/block list routing, and structure model output for downstream consumption. In order to participate, attendees will need to have Python (&gt;=3.11), Docker, an OpenAI API key, and the starter code for the project cloned.

**Starter code**: https://github.com/redis-developer/agents-redis-lang-graph-workshop

**Note**: participants can test their environment setup ahead of time by following the Readme and running `python test_setup.py` before the workshop.</abstract>
                <slug>virginia2025-77491-blazing-the-ai-trail-using-langgraph-to-conquer-the-oregon-trail</slug>
                <track></track>
                
                <persons>
                    <person id='78314'>Robert Shelton</person>
                </persons>
                <language>en</language>
                <description>Despite the growing excitement around AI agents, many practitioners lack clear guidance on how to implement them effectively. This workshop aims to bridge that gap by providing a structured, hands-on approach to building AI agent workflows with LangGraph. Participants will create an agent capable of playing the Oregon Trail and making in-game decisions, illustrating in a fun way not only how to implement agents but also when, why, and for what sorts of problems. 

Session outline:
1. **Understanding Agent Workflows (10 min)**
    - Overview of agentic workflows and their importance
    - When and why to build agent workflows
2. **Building a Basic LangGraph Agent (20 min)**
    - Setting up the LangGraph framework
    - Defining discrete operations with custom tools
3. **Enhancing Agent Capabilities (20 min)**
    - Structuring output for API interactions
    - Implementing vector retrieval for RAG to improve contextual responses
4. **Optimizing for Performance and Control (25 min)**
    - Creating a semantic cache to reduce LLM latency and cost
    - Implementing allow/block list routing for controlled execution
5. **Review and Discuss (15 min)**
    - Review what was just accomplished and why
    - Discuss any design challenges or open debugging questions
    - Open Q&amp;A for questions related to best practice
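The semantic-caching idea in step 4 can be illustrated with a toy, in-memory version; the workshop itself uses Redis, and a real system would use an embedding model rather than the hand-written vectors below:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class SemanticCache:
    """Return a cached response when a new query's embedding is close enough."""
    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def get(self, embedding):
        best = max(self.entries, key=lambda e: cosine(e[0], embedding), default=None)
        if best and cosine(best[0], embedding) >= self.threshold:
            return best[1]
        return None  # cache miss: caller falls through to the LLM

    def put(self, embedding, response):
        self.entries.append((embedding, response))

cache = SemanticCache()
cache.put([1.0, 0.0], "Ford the river.")
print(cache.get([0.98, 0.05]))  # near-duplicate query hits the cache
```

The payoff is that semantically similar prompts skip the LLM call entirely, cutting both latency and cost.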

This workshop has been tested with participants at a variety of levels and typically takes ~60 minutes to complete if environment setup has been confirmed as noted above.</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links>
                    <link href="https://github.com/redis-developer/agents-redis-lang-graph-workshop">Starter code</link>
                </links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/LMBBBF/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/LMBBBF/feedback/</feedback_url>
            </event>
            
        </room>
        <room name='Room 130' guid='b0c76a60-b6a7-590f-8dfa-5cd4772a399c'>
            <event guid='3091d893-4c2b-5e86-920f-4066244f9840' id='77069' code='HNWLPV'>
                <room>Room 130</room>
                <title>Responsible AI with SciPy</title>
                <subtitle></subtitle>
                <type>Tutorial</type>
                <date>2025-04-19T09:00:00-04:00</date>
                <start>09:00</start>
                <duration>01:30</duration>
                <abstract>SciPy is a powerful library for scientific and technical computing in Python. The primary objectives of this presentation are to explore the core concepts of Responsible AI and to demonstrate these concepts with SciPy.</abstract>
                <slug>virginia2025-77069-responsible-ai-with-scipy</slug>
                <track></track>
                
                <persons>
                    <person id='78261'>Andrea Hobby</person>
                </persons>
                <language>en</language>
                <description>The tutorial provides an introduction to Responsible AI using SciPy.

The session will begin with an overview of Responsible AI concepts and of SciPy&apos;s core features, followed by a hands-on walkthrough of implementing those concepts with SciPy. 

The following items will be covered during the tutorial. 

- Data Processing and Validation 
- Bias Detection and Mitigation 
- Sensitivity Analysis 
- Explainability and Transparency 

Each topic will be demonstrated with examples, including links to extended tutorials featuring real-world applications from the healthcare industry.
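For instance, bias detection might use a contingency-table test from `scipy.stats` to check whether outcome rates differ across groups; the numbers below are synthetic, for illustration only:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows: group A, group B; columns: favorable, unfavorable outcome (synthetic data)
table = np.array([[90, 10],
                  [70, 30]])
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p:.4f}")  # a small p-value suggests outcome rates differ by group
```

A significant result like this would prompt a closer look at the pipeline producing the disparity before any mitigation step.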

By the end of this session, attendees will have a solid understanding of how to use SciPy for Responsible AI Applications. Additionally, they will be able to apply these concepts to their own projects immediately.</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/HNWLPV/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/HNWLPV/feedback/</feedback_url>
            </event>
            <event guid='73f7add2-3522-5c16-b8e6-b18d88d48dad' id='77489' code='RQCCPA'>
                <room>Room 130</room>
                <title>Data Viz in Python as a Tool to Study HIV Health Disparities</title>
                <subtitle></subtitle>
                <type>Tutorial</type>
                <date>2025-04-19T11:00:00-04:00</date>
                <start>11:00</start>
                <duration>01:30</duration>
                <abstract>Health disparities remain a critical challenge in public health, demanding innovative approaches to uncover inequities and drive actionable change. This tutorial will demonstrate how Python can serve as a powerful tool for creating data visualizations that illustrate the unequal burden of HIV across different populations. Participants will learn how Python&#8217;s popular libraries, such as Matplotlib, Seaborn, and Plotly, can transform complex datasets into accessible, impactful visuals.
Using an HIV dataset containing demographic, geographic, and clinical variables, this session will guide attendees through a series of practical examples. From creating heatmaps and geospatial maps to analyzing temporal trends, the tutorial emphasizes how to identify and communicate key social determinants related to race, gender, socioeconomic status, and access to care. Through hands-on demonstrations, attendees will see how Python&#8217;s capabilities streamline data analysis and visualization workflows.
Key takeaways from the session include identifying regions and communities in Texas, disproportionately affected by HIV, uncovering intersectional factors influencing health outcomes, and leveraging visual tools to inform policy and resource allocation. Special attention will be given to designing visuals that resonate with non-technical audiences, ensuring findings are actionable for public health professionals and policymakers.</abstract>
                <slug>virginia2025-77489-data-viz-in-python-as-a-tool-to-study-hiv-health-disparities</slug>
                <track></track>
                <logo>/media/virginia2025/submissions/RQCCPA/Dr._Kimberly_Deas_PyCon_qp84OZx.png</logo>
                <persons>
                    <person id='78240'>Dr. Kimberly Deas</person>
                </persons>
                <language>en</language>
                <description>Description: Data Viz in Python as a Tool to Study Health Disparities

Targeted to the intermediate Python user, this session will begin with a brief overview of the tools and libraries that will be used, such as Pandas, Matplotlib, Seaborn, Plotly, and GeoPandas. Participants will do hands-on coding, exploring how to transform secondary data into practical, professional visuals. Key coding topics include:
1. **Data Preprocessing and Exploration**
    - Advanced techniques in Pandas for cleaning and reshaping datasets, including handling missing data and filtering key variables.
    - Conducting exploratory data analysis (EDA) to uncover trends and patterns related to HIV disparities.

2. **Building Complex Visualizations**
    - Heatmaps with Seaborn to visualize correlations between demographic factors and health outcomes.
    - Geospatial maps using GeoPandas and Plotly to pinpoint regions with high HIV prevalence and disparities in care access.
    - Bar plots, stacked charts, and histograms to analyze outcomes across intersectional demographics.
    - Time series plots using Matplotlib and Seaborn to explore temporal changes in HIV rates and interventions.

3. **Next Steps**
    - **Share Findings with Stakeholders:** Present the visualizations and key insights to relevant stakeholders, such as public health officials, policymakers, healthcare providers, and community organizations, using clear and actionable language.
    - **Develop Targeted Interventions:** Use the insights from the analysis to design and propose interventions aimed at addressing identified disparities, such as community outreach programs, resource allocation strategies, or policy changes.
    - **Monitor and Evaluate Impact:** Implement a plan to track the effectiveness of interventions using measurable outcomes, such as reductions in infection rates or improvements in access to care, and iterate on strategies based on the results.
    - **Build Collaborative Partnerships:** Partner with community organizations, research institutions, and funding agencies to amplify efforts, secure resources, and ensure sustained action to address health disparities over time.
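As a small sketch of the reshape-then-visualize pattern used for the heatmaps, here is a tiny synthetic table pivoted into heatmap-ready form; the regions, groups, and rates are made up, and the actual rendering call is left commented for the session:

```python
import pandas as pd

# Synthetic rates per 100,000 by region and demographic group (illustrative values)
df = pd.DataFrame({
    "region": ["Houston", "Houston", "Dallas", "Dallas"],
    "group": ["A", "B", "A", "B"],
    "rate": [38.0, 22.5, 31.0, 18.0],
})
pivot = df.pivot(index="region", columns="group", values="rate")
print(pivot)
# With seaborn/matplotlib installed, the heatmap is one call:
# import seaborn as sns; sns.heatmap(pivot, annot=True)
```

The same pivot shape works for real surveillance data once it has been cleaned and filtered in Pandas.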

This session will emphasize practical, hands-on coding, and participants are encouraged to follow along to develop scripts they can apply to their own datasets. By the end of the tutorial, attendees will have a deeper understanding of how to use Python for data visualization and actionable insights in public health.</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/RQCCPA/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/RQCCPA/feedback/</feedback_url>
            </event>
            <event guid='2497cab6-ed2e-508d-8e4f-b3009808eaf1' id='77215' code='WAWAHD'>
                <room>Room 130</room>
                <title>Getting Started with RAPIDS: GPU-Accelerated Data Science for PyData Users</title>
                <subtitle></subtitle>
                <type>Tutorial</type>
                <date>2025-04-19T13:30:00-04:00</date>
                <start>13:30</start>
                <duration>01:30</duration>
                <abstract>In this introductory hands-on tutorial, participants will learn how to accelerate their data workflows with [RAPIDS](https://rapids.ai/), an open-source suite of libraries designed to leverage the power of [NVIDIA](https://www.nvidia.com/) GPUs for end-to-end data pipelines. Using familiar PyData APIs like **cuDF** (GPU-accelerated pandas) and **cuML** (GPU-accelerated machine learning), attendees will explore how to seamlessly integrate these tools into their existing workflows with minimal code changes, achieving significant speedups in tasks such as data processing and model training.</abstract>
                <slug>virginia2025-77215-getting-started-with-rapids-gpu-accelerated-data-science-for-pydata-users</slug>
                <track></track>
                
                <persons>
                    <person id='78259'>Naty Clementi</person><person id='78260'>Mike McCarty</person>
                </persons>
                <language>en</language>
                <description>[NVIDIA](https://www.nvidia.com/) GPUs offer unmatched speed and efficiency for data processing and model training, significantly reducing the time and cost associated with these tasks. The appeal of GPUs becomes even stronger with zero-code-change libraries and plugins, allowing you to take advantage of GPU acceleration without having to rewrite your existing code. With [RAPIDS](https://rapids.ai/), you can use popular PyData libraries like **pandas**, **polars**, and **networkx** while reaping the performance benefits of GPUs.

This tutorial provides an introduction to **RAPIDS**, an open-source suite of libraries that accelerates data science and machine learning workflows using GPU technology. Aimed at data scientists and machine learning practitioners of all experience levels, the session will focus on how RAPIDS can be seamlessly integrated into existing data pipelines to achieve substantial performance improvements with minimal code changes.

Through hands-on coding exercises, attendees will explore the RAPIDS ecosystem, including **cuDF** (GPU-accelerated pandas) and **cuML** (GPU-accelerated machine learning), and learn how to integrate these tools into their workflows to accelerate tasks like data processing and model training. By the end of this tutorial, they&apos;ll understand how RAPIDS integrates with the PyData ecosystem and how it can significantly speed up their workflows.
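The zero-code-change workflow looks roughly like this: the pandas code is unchanged, and on a machine with an NVIDIA GPU and RAPIDS installed, loading `cudf.pandas` first routes it to the GPU (the snippet below makes no GPU assumption and runs on plain pandas):

```python
# In Jupyter:            %load_ext cudf.pandas
# From the command line: python -m cudf.pandas my_script.py
# Either way, the pandas code itself stays exactly as written:
import pandas as pd

df = pd.DataFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
out = df.groupby("key")["value"].sum()
print(out.to_dict())
```

Operations cuDF supports run on the GPU; anything unsupported falls back to CPU pandas, which is what makes the accelerator safe to drop into existing pipelines.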

The target audience for this tutorial is data scientists and machine learning practitioners. No prior GPU knowledge is required, but participants should have some experience with Python, pandas, and scikit-learn.</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/WAWAHD/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/WAWAHD/feedback/</feedback_url>
            </event>
            <event guid='8bb77f1d-2e5f-5c86-be49-f12b558378aa' id='77487' code='B9RT3L'>
                <room>Room 130</room>
                <title>From Pandas to PySpark</title>
                <subtitle></subtitle>
                <type>Tutorial</type>
                <date>2025-04-19T15:30:00-04:00</date>
                <start>15:30</start>
                <duration>01:30</duration>
                <abstract>Tired of waiting for massive datasets to load on your local machine? In this beginner-friendly tutorial, we&#8217;ll explore how to scale your data analysis skills from pandas to PySpark using a real-world anime dataset. We&#8217;ll walk through the basics of distributed computing, discuss why Spark was created, and demonstrate the benefits of working with PySpark for big data tasks&#8212;including reading, cleaning, and transforming millions of records with ease. By the end of this workshop, you&#8217;ll understand how PySpark harnesses cluster computing to handle large-scale data and you&#8217;ll be comfortable applying these techniques to your own projects.

Participant Requirements:
- A laptop (any OS) with an internet connection
- A Google account (to access Colab notebooks and slides)
- Familiarity with Python and pandas

Here&apos;s the link to the Google Colab to follow along &#128071;&#127998;
https://colab.research.google.com/drive/1fi0cTQ1NIE5kDEH0ynp2sqDuVeiBJJWU?usp=sharing

Here are the slides &#128071;&#127998;
https://drive.google.com/file/d/11JIih1VzLxTJ9O6PeGzqD_e8vumTZQmw/view?usp=sharing</abstract>
                <slug>virginia2025-77487-from-pandas-to-pyspark</slug>
                <track></track>
                <logo>/media/virginia2025/submissions/B9RT3L/chris-curry-GYpsSWHslHA_gF0xXH5.jpg</logo>
                <persons>
                    <person id='78615'>Cynthia Ukawu</person>
                </persons>
                <language>en</language>
                <description>This tutorial aims to close the gap between small-scale data analysis and big data processing. If you&#8217;ve ever tried to load a multi-gigabyte CSV into pandas or Excel, you know the frustration of crashing programs and endless waits. This tutorial shows how to level up your data skills using PySpark&#8217;s distributed DataFrame API.

We&#8217;ll do more than just introduce Spark concepts&#8212;we&#8217;ll work through a lively anime dataset full of ratings, genres, and user insights, so you can see how PySpark handles real-world tasks (like filtering, grouping, and joining) at scale. You&#8217;ll get comfortable with Spark&#8217;s architecture and learn how it uses lazy evaluation, cluster computing, and in-memory operations to achieve speedups. One highlight of the workshop is its hands-on approach: all exercises will be run in Google Colab. That means zero friction in setup&#8212;no cluster installation or environment wrangling. We&#8217;ll walk through the entire pipeline: loading massive CSV files, performing transformations that mirror pandas operations, and drawing insights through SQL-like queries.

Expect a fast-paced but accessible look at Spark&#8217;s key features, practical code examples, and best practices to keep your big data workflows efficient and transparent.
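To preview the pandas-to-Spark mapping, here is a pandas aggregation with its approximate PySpark counterpart in comments; the column names are illustrative, not taken from the anime dataset:

```python
import pandas as pd

df = pd.DataFrame({"genre": ["action", "drama", "action"], "rating": [8.1, 7.5, 9.0]})
out = df.groupby("genre")["rating"].mean()  # eager: computed immediately
print(out.to_dict())

# Roughly equivalent PySpark (assumes an active SparkSession `spark`):
# from pyspark.sql import functions as F
# sdf = spark.createDataFrame(df)
# sdf.groupBy("genre").agg(F.avg("rating")).show()  # lazy until .show() triggers it
```

The key difference the tutorial dwells on: the pandas version executes line by line, while Spark builds a plan and only runs it when an action like `.show()` is called.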

Tutorial Outline
- Why Spark?: A short overview of Hadoop MapReduce and how Spark rose to address its shortcomings.
- Distributed Data 101: Breaking down Spark&#8217;s architecture, executors, and lazy evaluation.
- Hands-On Setup: Launching PySpark in Google Colab so everyone can follow along in real time.
- Exploring the Anime Dataset: Reading data from CSV, structuring DataFrames, and performing data cleaning.
- Common Operations at Scale: Filtering, grouping, and aggregating millions of rows with PySpark.
- Comparisons to Pandas: Mapping familiar DataFrame operations to their Spark counterparts.
- Final Thoughts: Discussion of where Spark fits into modern data stacks, plus pointers for advanced usage (MLlib, streaming, cluster optimization).</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/B9RT3L/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/B9RT3L/feedback/</feedback_url>
            </event>
            
        </room>
        <room name='Room 140' guid='79090300-67a8-5f5d-9aa9-14d91d353fad'>
            <event guid='b9bc684d-259e-5757-9a84-5934277b839c' id='77502' code='SHTFQY'>
                <room>Room 140</room>
                <title>Tutorial on Image Classification using Scikit-Image, Scikit-learn, and PyTorch</title>
                <subtitle></subtitle>
                <type>Tutorial</type>
                <date>2025-04-19T09:00:00-04:00</date>
                <start>09:00</start>
                <duration>01:30</duration>
                <abstract>Tutorial on building an image segmentation and classification pipeline for binary or multiclass classification using the popular packages scikit-learn, scikit-image and PyTorch.</abstract>
                <slug>virginia2025-77502-tutorial-on-image-classification-using-scikit-image-scikit-learn-and-pytorch</slug>
                <track></track>
                <logo>/media/virginia2025/submissions/SHTFQY/image_class_tutorial_LY_2jsFsVZ.png</logo>
                <persons>
                    <person id='78382'>Matt Litz</person>
                </persons>
                <language>en</language>
                <description>Welcome to the exciting world of computer vision and machine learning!  This tutorial presents foundational computer vision operations to prepare you to build your first successful classification pipeline.  My goal is to help guide you past potential pitfalls and present topics for consideration as you embark on your machine learning journey.

1. Computer Vision Basics
   * The Basics
   * Software and Packages
2. Image Segmentation
   * Preprocessing (histograms, filters)
   * Thresholding
   * Morphological Operators
   * Advanced Segmentation
3. Feature Extraction
   * Textures
      * GLCM
      * LBP
4. Model Development - scikit-learn
   * Gaussian Process
5. Feature Importance
   * Shapley
6. Neural Networks - PyTorch
7. Model Development - PyTorch
    * CNN
    * Transfer Learning
8. Model Performance
   * Tensorboard
   * Saliency map
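As a taste of the scikit-learn model-development step, here is a minimal Gaussian process classifier fit on a small slice of scikit-learn&apos;s built-in digits dataset, standing in for the tutorial&apos;s own images:

```python
from sklearn.datasets import load_digits
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.model_selection import train_test_split

# Small subset scaled to [0, 1] to keep GP training fast (GPs scale cubically)
X, y = load_digits(return_X_y=True)
X, y = X[:300] / 16.0, y[:300]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

clf = GaussianProcessClassifier(random_state=0).fit(X_tr, y_tr)
print(f"test accuracy: {clf.score(X_te, y_te):.2f}")
```

In the full pipeline, the raw pixels here would be replaced by segmentation-derived texture features such as GLCM or LBP.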

Notebooks will be available prior to the start of the tutorial.  Please come prepared with the following Python packages installed:
* numpy
* pandas
* scikit-learn
* scikit-image 
* torch
* torchvision
* tensorboard</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links>
                    <link href="https://github.com/mattlitz/pydata-virginia2025-image-clf.git">Github repo for tutorial files</link>
                </links>
                <attachments>
                    <attachment href="https://cfp.pydata.org/media/virginia2025/submissions/SHTFQY/resources/pydata-virgi_iF5fcnB.pptx">Presentation</attachment>
                </attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/SHTFQY/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/SHTFQY/feedback/</feedback_url>
            </event>
            <event guid='e7eb0b37-2575-50df-850d-45501f6056c9' id='77114' code='GYFR7G'>
                <room>Room 140</room>
                <title>A Beginner&apos;s Guide to Variational Inference</title>
                <subtitle></subtitle>
                <type>Tutorial</type>
                <date>2025-04-19T11:00:00-04:00</date>
                <start>11:00</start>
                <duration>01:30</duration>
                <abstract>When Bayesian modeling scales up to large datasets, traditional MCMC methods can become impractical due to their computational demands. **Variational Inference (VI) offers a scalable alternative**, trading exactness for speed while retaining the essence of Bayesian inference. 

In this tutorial, we&#8217;ll **explore how to implement and compare VI techniques in PyMC**, including Automatic Differentiation Variational Inference (ADVI) and the cutting-edge Pathfinder algorithm.

Starting with simple models like linear regression, we&#8217;ll gradually introduce more **complex, real-world applications**, comparing the performance of VI against Markov Chain Monte Carlo (MCMC) to understand the trade-offs in speed and accuracy. 

**This tutorial will arm participants with practical tools to deploy VI in their workflows** and help answer pressing questions, like &quot;*What do I do when MCMC is too slow?*&quot;, or &quot;*How does VI compare to MCMC in terms of approximation quality?*&quot;.</abstract>
                <slug>virginia2025-77114-a-beginner-s-guide-to-variational-inference</slug>
                <track></track>
                
                <persons>
                    <person id='78265'>Chris Fonnesbeck</person>
                </persons>
                <language>en</language>
                <description>## Description

This tutorial is **for data scientists, statisticians, and machine learning practitioners who are comfortable with Python and basics of probability**.

We&#8217;ll break down the mechanics of VI and its application in PyMC in an approachable way, starting with intuitive explanations and building up to practical examples.

Participants will learn how to apply ADVI and Pathfinder in PyMC and evaluate their results against MCMC, gaining insights into when and why to choose VI.

### Takeaways

Participants will leave understanding:

- The fundamentals of VI and how it differs from MCMC.
- How to implement ADVI and Pathfinder in PyMC.
- Practical considerations when selecting and evaluating inference methods.

### Background Knowledge Required

- Basic understanding of probability and Bayesian inference.
- Familiarity with Python. Prior PyMC experience is helpful but not required.

### Materials Distribution

All materials, including notebooks and datasets, will be available on GitHub.

## Outline

1. **Introduction: Why Variational Inference?** (10 min)
- The limitations of MCMC for large datasets.
- Overview of VI: How it works and why it&#8217;s faster.

2. **Variational Inference Basics** (20 min)
- Key concepts: Evidence Lower Bound (ELBO), optimization, and approximation families.
- Intuitive explanation of ADVI and Pathfinder.
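For reference, the Evidence Lower Bound that VI maximizes can be written as

$$\mathrm{ELBO}(q) = \mathbb{E}_{q(z)}\big[\log p(x, z)\big] - \mathbb{E}_{q(z)}\big[\log q(z)\big] = \log p(x) - \mathrm{KL}\big(q(z)\,\|\,p(z \mid x)\big),$$

so maximizing the ELBO over an approximating family is equivalent to minimizing the KL divergence between the approximation and the true posterior.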

3. **Implementing VI with PyMC** (15 min)
- Step-by-step walkthrough of VI with a linear model.
- Comparing ADVI, Pathfinder, and MCMC.

4. **Evaluating VI Approximations** (10 min)
- How to measure the quality of VI approximations (ELBO, simulation-based calibration, etc.).
- Practical trade-offs between speed and accuracy.

5. **Scaling Up: Complex Models and Real-World Applications** (25 min)
- Applying VI to hierarchical and large-scale models.
- Tips for debugging and optimizing VI workflows.

6. **Open Discussion and Q&amp;A** (10 min)
- Address audience-specific use cases and questions.</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/GYFR7G/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/GYFR7G/feedback/</feedback_url>
            </event>
            <event guid='45db58a9-e251-5eb3-bc7a-73d0cc823807' id='77476' code='WZKH8G'>
                <room>Room 140</room>
                <title>Introduction to Wikidata</title>
                <subtitle></subtitle>
                <type>Tutorial</type>
                <date>2025-04-19T13:30:00-04:00</date>
                <start>13:30</start>
                <duration>01:30</duration>
                <abstract>We will review Wikipedia, introduce Wikidata, then demonstrate queries to access wiki content.</abstract>
                <slug>virginia2025-77476-introduction-to-wikidata</slug>
                <track></track>
                <logo>/media/virginia2025/submissions/WZKH8G/Wikipedia-logo-v2-wordm_aK7DOJf.png</logo>
                <persons>
                    <person id='78374'>Lane Rasberry</person><person id='78387'>Robin Isadora Brown</person>
                </persons>
                <language>en</language>
                <description>Wikipedia is the general reference source for humans to read. Wikidata is its interconnected, structured data complement, accessible through queries. We will consider Wikidata&apos;s purpose, scope, and editorial community, then query for interesting results in pop culture, science, civics, and more. Attendees will learn how to access sample queries, including through Jupyter Notebooks.</description>
                <recording>
                    <license></license>
                    <optout>false</optout>
                </recording>
                <links></links>
                <attachments></attachments>

                <url>https://cfp.pydata.org/virginia2025/talk/WZKH8G/</url>
                <feedback_url>https://cfp.pydata.org/virginia2025/talk/WZKH8G/feedback/</feedback_url>
            </event>
            
        </room>
        
    </day>
    
</schedule>
