PyData Virginia 2025

Machine Learning Pipelines in Higher Education: Lessons Learned Taking Models From Training to Production
04-18, 16:05–16:40 (US/Eastern), Auditorium 3

Building machine learning models with live, human-centric data is often a messy endeavor. However, by thinking about the entire machine learning pipeline and the lifecycle of the population being modeled, we can prevent the model (and the data scientist) from overpromising and underdelivering. Come learn about the pitfalls that arise when working with human-centric data and what you can do to keep them from ruining your model's performance.


In this talk, we will discuss lessons learned from working with human-centric data in higher education and the pitfalls you may encounter. The higher education student lifecycle begins with admissions, follows the student through the terms they attend, and ideally ends with graduation. Using this lifecycle as a guide, we will dive into how the data available at each point of the student lifecycle and the machine learning pipeline needs to be accounted for during training to prevent failures in production. We will also discuss how working with operational datasets imposes unique limits on our models and what to watch out for.
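
As a rough illustration of accounting for data availability during training, the sketch below restricts a training row to the fields that would already exist at the point in the lifecycle where a prediction is made. It is a minimal sketch and not the speaker's method: the stage names, the feature catalog, and the helpers features_known_at and training_row are all hypothetical, chosen only to mirror the admissions-to-graduation lifecycle described above.

```python
# Hypothetical lifecycle stages, in the order a student passes through them.
STAGES = ["admissions", "first_term", "later_terms", "graduation"]

# Hypothetical feature catalog: feature name -> stage at which it is first recorded.
FEATURE_AVAILABLE_AT = {
    "high_school_gpa": "admissions",
    "intended_major": "admissions",
    "first_term_credits": "first_term",
    "first_term_gpa": "first_term",
    "cumulative_gpa": "later_terms",
}


def features_known_at(prediction_stage):
    """Return the features that already exist when predicting at this stage."""
    cutoff = STAGES.index(prediction_stage)
    return sorted(
        name
        for name, stage in FEATURE_AVAILABLE_AT.items()
        if STAGES.index(stage) <= cutoff
    )


def training_row(student_record, prediction_stage):
    """Keep only fields that would be available in production at this stage."""
    allowed = features_known_at(prediction_stage)
    return {name: student_record[name] for name in allowed if name in student_record}


# A historical record contains every field, including ones recorded later.
record = {
    "high_school_gpa": 3.4,
    "intended_major": "Biology",
    "first_term_gpa": 2.9,
    "cumulative_gpa": 3.1,
}

# An admissions-time model must not see term-level data during training.
print(training_row(record, "admissions"))
# {'high_school_gpa': 3.4, 'intended_major': 'Biology'}
```

Filtering by stage at training time keeps the model from learning on fields, such as term GPA, that would be empty when the same model scores a newly admitted student.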

This talk is geared towards a general audience, though familiarity with machine learning will be helpful.

Outline:

Introduction to the student lifecycle (5 min)

Introduction to machine learning pipelines (5 min)

Working with data from across the student lifecycle (10 min)

Working with operational datasets for a machine learning model (5 min)

Concluding thoughts and Q&A (5 min)


Prior Knowledge Expected

No previous knowledge expected

Brian Richards is a Senior Data Scientist at HelioCampus and works with data across the higher education student lifecycle to help colleges and universities better understand their students and support them through graduation. Brian also has an interest in exploring model evaluation techniques and helping end users better understand how their models work.