PyData Global 2025

Garbage In, Lawsuit Out: Building Compliant and Reproducible ML Pipelines
2025-12-11, General Track

Your model might pass every benchmark, but can it survive a subpoena? In the race to ship AI, most teams are building workflows that look great in dashboards but fall apart under legal, regulatory, or ethical pressure. The real liability doesn't live in your model weights; it's buried in your data.


This session is a reality check for anyone shipping machine learning in production. We’ll walk through the dark corners of modern ML pipelines: mutable datasets with no history, mystery data sources with missing labels, and a forgotten column of PII that’s just been shipped to production. Then we’ll show how to fix it—without turning your data team into compliance officers.

You’ll learn how to embed reproducibility, traceability, and policy enforcement into your pipeline without slowing it to a crawl: track every dataset change, version every experiment, validate against policy gates, and generate audit trails that actually mean something. Whether you’re dealing with GDPR, HIPAA, or just not wanting to get roasted by internal audit, this talk gives you the blueprint for ML you can defend in court—and still ship on time.
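To make "track every dataset change, validate against policy gates, and generate audit trails" a little more concrete, here is a minimal Python sketch that uses only the standard library. It is not the speaker's method or any specific tool's API. The denylist `PII_DENYLIST`, the helper names, and the audit-record fields are hypothetical. The sketch also assumes the dataset fits in memory as a list of JSON-serializable dicts.

```python
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical policy: column names that must never reach production.
PII_DENYLIST = {"email", "ssn", "phone", "full_name", "date_of_birth"}


def dataset_fingerprint(rows: list[dict]) -> str:
    """Return a SHA-256 hash of the rows, used as a dataset version ID.

    Keys are sorted within each row, so key order does not affect the hash;
    any change to a value, row, column, or row order yields a new hash.
    """
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def pii_gate(rows: list[dict]) -> list[str]:
    """Return the sorted denylisted column names present in the data."""
    columns = set()
    for row in rows:
        columns.update(row)
    # Case-insensitive match against the hypothetical denylist.
    return sorted(c for c in columns if c.lower() in PII_DENYLIST)


def audit_record(rows: list[dict], run_id: str) -> dict:
    """Build an audit-trail entry tying a run to the exact data it saw."""
    violations = pii_gate(rows)
    return {
        "run_id": run_id,
        "dataset_sha256": dataset_fingerprint(rows),
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "pii_violations": violations,
        "passed": not violations,
    }


if __name__ == "__main__":
    clean = [{"age": 41, "score": 0.87}, {"age": 29, "score": 0.42}]
    leaky = clean + [{"age": 35, "score": 0.5, "email": "a@example.com"}]

    for name, data in [("clean", clean), ("leaky", leaky)]:
        record = audit_record(data, run_id=f"demo-{name}")
        print(json.dumps(record))
        if not record["passed"]:
            print(f"blocked: {record['pii_violations']}")
```

Matching on column names only catches columns whose names give them away. In practice, a pipeline like this would more likely pair a data-versioning tool with a content-based PII scanner, keeping the same idea: every run records the exact data hash it saw and whether it cleared the gate.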


Prior Knowledge Expected: Yes

Itai is a seasoned software engineer, passionate about clean code and design and about simplifying what is complex. Whether the job calls for backend, full-stack, or mobile development, Itai does what's needed and enjoys creating well-crafted products.