2025-09-25, Voyager
In this talk, we will cover how to write effective test cases for machine learning (ML) libraries that serve hundreds of thousands of users on a regular basis. Despite being essential for trust and reliability, tests often get deprioritized. This can later wreak havoc on large codebases, greatly increasing the likelihood of breaking changes and other unpleasant surprises. This talk presents our approach to testing ML libraries at this scale. We will cover a wide range of topics, from the mindset behind minimal-yet-sufficient testing all the way to practical examples of end-to-end test suites.
Why revisit an established topic?
- How do ML libraries differ from regular Python libraries and how does it impact their testing?
Types of ML Libraries
- Platform-level libraries (PyTorch, JAX, etc.)
- Modeling libraries (Transformers, Diffusers, etc.)
- Utility libraries (PEFT, TorchAO, etc.)
- Data-related libraries (Torchvision, Datasets)
A brief overview of how testing and CI are approached for the modeling and utility libraries at Hugging Face
Best practices from the wild
- Python version coverage – are we covering all Python versions, or is there a minimum requirement?
- Operating system distribution coverage – are we only targeting Linux?
- Should code coverage be approached in the same way it’s approached for regular software?
- Benchmarking tests – do model forward passes take the same amount of time after a new feature lands? If there's an increase, can we explain it?
- Conditional accelerator tests (e.g., certain changes trigger GPU tests)
- Approaching regression tests – with each new version of the library, outputs shouldn’t change without a plausible justification
- Dealing with known failures – should we test known failures?
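To make the benchmarking idea above concrete, here is a minimal sketch of a latency guard: time a forward pass and flag slowdowns against a stored baseline. The baseline value, tolerance, and `dummy_forward` are all illustrative stand-ins, not any library's actual benchmark suite.

```python
# A minimal sketch of a benchmark guard for forward-pass latency.
# BASELINE_SECONDS and dummy_forward are illustrative; real suites record
# baselines per CI machine and per release.
import time

BASELINE_SECONDS = 0.050  # illustrative baseline recorded at a prior release
TOLERANCE = 1.20          # allow up to 20% slowdown before flagging

def dummy_forward(x):
    # Stand-in for a model forward pass.
    return [v * 2 for v in x]

def time_forward(fn, *args, repeats=5):
    # Best-of-N timing reduces scheduler noise.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return best

def test_forward_latency():
    elapsed = time_forward(dummy_forward, list(range(1024)))
    # If this fails, the slowdown needs an explanation before merging.
    assert elapsed <= BASELINE_SECONDS * TOLERANCE
```

A failure here is a prompt for discussion, not an automatic rejection: the point is that any measured increase gets an explanation.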
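The regression-test bullet can be sketched as a snapshot comparison: freeze a slice of model output at a release and assert that later versions reproduce it within a tolerance. `fake_forward` is a deterministic stand-in for a real forward pass, and the frozen values are illustrative.

```python
# A minimal sketch of a numerical regression test: compare a model's output
# slice against values frozen at a previous release. fake_forward stands in
# for a real model forward pass; the expected values are illustrative.
import math

def fake_forward(x):
    # Deterministic stand-in for a model forward pass.
    return [math.tanh(0.5 * v) for v in x]

# Frozen at release N; any drift beyond the tolerance needs a justification.
EXPECTED_SLICE = [0.2449186624, 0.4621171573, 0.6351489524]

def test_forward_regression():
    out = fake_forward([0.5, 1.0, 1.5])
    assert all(
        math.isclose(a, b, abs_tol=1e-6)
        for a, b in zip(out, EXPECTED_SLICE)
    )
```

When a legitimate change (e.g., a numerics fix) shifts the outputs, the frozen slice is updated in the same PR with the justification recorded.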
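Conditional accelerator tests and known-failure handling can both be expressed with standard test markers. This sketch uses stdlib `unittest`; the `gpu_available` helper is a hypothetical stand-in (real libraries usually check `torch.cuda.is_available()` instead).

```python
# A minimal sketch of conditional GPU tests and known-failure handling.
# gpu_available is a hypothetical helper for illustration only.
import shutil
import unittest

def gpu_available() -> bool:
    # Hypothetical stand-in: treat the presence of nvidia-smi as "GPU present".
    return shutil.which("nvidia-smi") is not None

class ForwardPassTests(unittest.TestCase):
    @unittest.skipUnless(gpu_available(), "requires a GPU")
    def test_forward_pass_on_gpu(self):
        ...  # run the accelerator-only checks here

    @unittest.expectedFailure
    def test_known_broken_case(self):
        # Keep the known failure visible in reports without breaking CI;
        # if it unexpectedly starts passing, the run flags it.
        self.fail("known failure, tracked upstream")
```

Marking known failures as expected (rather than deleting the tests) keeps them visible, and an unexpected pass signals that the underlying bug may have been fixed.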
By the end of this talk, the audience will have a good understanding of effective approaches to testing and supporting modern ML libraries.
Machine Learning Engineer at Hugging Face
Sayak works on diffusion models at Hugging Face. His day-to-day includes training and babysitting diffusion models for images and videos, working on the diffusers library, and collaborating on applied research ideas. Outside of work, he likes to binge-watch Suits and ICML tutorials.