2025-09-25, Voyager
In this talk, we will cover how to write effective test cases for machine learning (ML) libraries that serve hundreds of thousands of users on a regular basis. Despite being essential for trust and reliability, tests often get deprioritized. This can later wreak havoc on large codebases, greatly increasing the likelihood of breaking changes and other unpleasant surprises. This talk presents our approach to testing ML libraries at this scale. We will cover a wide range of topics, from the mindset behind minimal-yet-sufficient testing all the way to practical examples of end-to-end test suites.
Why revisit an established topic?
- How do ML libraries differ from regular Python libraries and how does it impact their testing?
Types of ML Libraries
- Platform-level libraries (PyTorch, JAX, etc.)
- Modeling libraries (Transformers, Diffusers, etc.)
- Utility libraries (PEFT, TorchAO, etc.)
- Data-related libraries (Torchvision, Datasets)
A brief overview of how testing and CI are approached for the modeling and utility libraries at Hugging Face
Best practices from the wild
- Python version coverage – are we covering all Python versions, or is there a minimum requirement?
- Operating system distribution coverage – are we only targeting Linux?
- Should code coverage be approached in the same way it’s approached for regular software?
- Benchmarking tests – do model forward passes take the same amount of time after a new feature lands? If there's an increase, can we explain it?
- Conditional accelerator tests (e.g., certain changes trigger GPU tests)
- Approaching regression tests – with each new version of the library, outputs shouldn’t change without a plausible justification
- Dealing with known failures – should we test known failures?
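To make the benchmarking idea above concrete, here is a minimal sketch of a latency guard: time a forward pass and flag slowdowns against a stored baseline. The baseline value, tolerance, and `dummy_forward` are all illustrative stand-ins, not any library's actual benchmark suite.

```python
# A minimal sketch of a benchmark guard for forward-pass latency.
# BASELINE_SECONDS and dummy_forward are illustrative; real suites record
# baselines per CI machine and per release.
import time

BASELINE_SECONDS = 0.050  # illustrative baseline recorded at a prior release
TOLERANCE = 1.20          # allow up to 20% slowdown before flagging

def dummy_forward(x):
    # Stand-in for a model forward pass.
    return [v * 2 for v in x]

def time_forward(fn, *args, repeats=5):
    # Best-of-N timing reduces scheduler noise.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

def test_forward_latency():
    elapsed = time_forward(dummy_forward, list(range(1024)))
    # If this fails, the slowdown needs an explanation before merging.
    assert elapsed <= BASELINE_SECONDS * TOLERANCE
```

A failure here is a prompt for discussion, not an automatic rejection: the point is that any measured increase gets an explanation.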
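The regression-test bullet can be sketched as a snapshot comparison: freeze a slice of model output at a release and assert that later versions reproduce it within a tolerance. `fake_forward` is a deterministic stand-in for a real forward pass, and the frozen values are illustrative.

```python
# A minimal sketch of a numerical regression test: compare a model's output
# slice against values frozen at a previous release. fake_forward stands in
# for a real model forward pass; the expected values are illustrative.
import math

def fake_forward(x):
    # Deterministic stand-in for a model forward pass.
    return [math.tanh(0.5 * v) for v in x]

# Frozen at release N; any drift beyond the tolerance needs a justification.
EXPECTED_SLICE = [0.2449186624, 0.4621171573, 0.6351489524]

def test_forward_regression():
    out = fake_forward([0.5, 1.0, 1.5])
    assert all(
        math.isclose(a, b, abs_tol=1e-6)
        for a, b in zip(out, EXPECTED_SLICE)
    )
```

When a legitimate change (e.g., a numerics fix) shifts the outputs, the frozen slice is updated in the same PR with the justification recorded.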
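Conditional accelerator tests and known-failure handling can both be expressed with standard test markers. This sketch uses stdlib `unittest`; the `gpu_available` helper is a hypothetical stand-in (real libraries usually check `torch.cuda.is_available()` instead).

```python
# A minimal sketch of conditional GPU tests and known-failure handling.
# gpu_available is a hypothetical helper for illustration only.
import shutil
import unittest

def gpu_available() -> bool:
    # Hypothetical stand-in: treat the presence of nvidia-smi as "GPU present".
    return shutil.which("nvidia-smi") is not None

class ForwardPassTests(unittest.TestCase):
    @unittest.skipUnless(gpu_available(), "requires a GPU")
    def test_forward_pass_on_gpu(self):
        ...  # run the accelerator-only checks here

    @unittest.expectedFailure
    def test_known_broken_case(self):
        # Keep the known failure visible in reports without breaking CI;
        # if it unexpectedly starts passing, the run flags it.
        self.fail("known failure, tracked upstream")
```

Marking known failures as expected (rather than deleting the tests) keeps them visible, and an unexpected pass signals that the underlying bug may have been fixed.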
By the end of this talk, the audience will have a good understanding of effective approaches to testing and supporting modern ML libraries.
Machine Learning Engineer at Hugging Face
Sayak works on diffusion models at Hugging Face. His day-to-day includes training and babysitting diffusion models for images and videos, working on the diffusers library, and collaborating on applied research ideas. Outside of work, he likes to binge-watch Suits and ICML tutorials.