PyData Virginia 2025

Build Your Own Data Science AI Agents
04-19, 13:30–15:00 (US/Eastern), Room 120

Now that “AI agent” has become a buzzword, have you ever wondered what exactly an AI agent is? What is a multi-agent system? And how can you use AI agents in your day-to-day data science workflow? In this hands-on tutorial, I will introduce AI agents and demonstrate how to design, build, and manage a multi-agent system for your data science workflows. Participants will learn how to break down complex tasks, assign AI agents to collaborate effectively, and ensure accuracy and reliability in their outputs. We will also discuss the trade-offs, limitations, and best practices for incorporating AI agents into data science projects.


Prerequisites:
1. An OpenAI developer API key. If you do not have one, this video shows how to create an account and generate a key: https://www.youtube.com/watch?v=JuAOOO18ycg
2. A LangSmith API key: https://smith.langchain.com/

Tutorial Materials (Google Drive): https://drive.google.com/drive/folders/1keoQYO6iEm_b9olxxcWgOfmpipProaPJ?usp=drive_link

This hands-on tutorial will guide participants through designing, building, and deploying AI agents to streamline data science tasks.

What You’ll Learn
This tutorial will provide a deep dive into AI agents and multi-agent systems, covering:
- The role of AI agents in automating data science tasks such as data preprocessing, feature engineering, model selection, and evaluation.
- How to design a multi-agent system that efficiently distributes tasks while ensuring reliability and accuracy.
- Strategies for incorporating AI agents into everyday workflows to save time and enhance productivity.
- Common challenges, trade-offs, and best practices when using AI agents in data science.

Tutorial Structure
1. Introduction to AI Agents in Data Science (15 minutes)
- What are AI agents, and how do they fit into data science workflows?
- Examples of AI-driven automation in data science.
- Overview of multi-agent collaboration for data-related tasks.
2. Setting Up the Development Environment (10 minutes)
- Tools and frameworks for building AI agents in data science.
- Accessing tutorial materials (Google Drive).
3. Building an AI-Driven Data Science Workflow (40 minutes)
- Hands-on implementation: Automating exploratory data analysis (EDA), data preprocessing, model training, and evaluation with AI agents.
- Orchestrating agent collaboration for complex workflows.
- Ensuring accuracy, reliability, and interpretability in AI-assisted data tasks.
4. Challenges, Trade-offs, and Best Practices (15 minutes)
5. Q&A and Wrap-Up (10 minutes)
- Discussion of real-world applications and industry adoption.
- Key takeaways and next steps for implementing AI agents in data projects.

Who Should Attend?
This tutorial is designed for data analysts, data scientists, machine learning practitioners, and AI engineers looking to integrate AI agents into their workflows. Attendees should have a basic understanding of Python and machine learning concepts.

Prerequisites & Materials
- Skill Level: Intermediate (basic Python and ML knowledge recommended).
- Resources: A Google Colab environment for hands-on execution (no local installation required).

By the end of this tutorial, participants will have a practical framework for using AI agents to automate and optimize data science workflows, improving efficiency and scalability in their projects.


Prior Knowledge Expected

No previous knowledge expected

Astha is a Senior Data Scientist at CVS Health, where she leads the design of recommendation engines for digital platforms, helping customers discover the right products and enabling patients to access the appropriate health services and support. She specializes in home screen personalization, leveraging data-driven insights to enhance user experiences. With a strong background in the tech industry, she is now applying her expertise to transform and innovate within the healthcare sector.

Chuxin Liu holds a PhD in economics and currently works as a senior quantitative associate at JPMorgan. Chuxin has worked with leading companies in the field and is an active member of the data community. She is the chapter lead for AICamp and an ambassador for Women in Data Science (WiDS). Chuxin's teaching experience ranges from institutions such as the City University of New York to public conferences including PyData.

Niharika is a Machine Learning Engineer in NYC, working at the intersection of Quant Finance and AI. As a PyLadies organizer and WiDS ambassador, she fosters a community of women in NYC to collaborate and grow in the field of AI.