PyData Berlin 2025

Data science in containers: the good, the bad, and the ugly
2025-09-02 , B05-B06

If we want to run data science workloads (e.g. using TensorFlow, PyTorch, and others) in containers (for local development or production on Kubernetes), we need to build container images. Doing that with a Dockerfile is fairly straightforward, but is it the best method?
In this talk, we'll take a well-known speech-to-text model (Whisper) and show various ways to run it in containers, comparing the outcomes in terms of image size and build time.


We'll demonstrate how to switch between versions DRY-style (without maintaining multiple Dockerfiles!), how to leverage newer techniques like BuildKit cache mounts, and discuss other important considerations such as using Alpine with Python, progressive image loading, and model loading strategies.
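To make the DRY version switching and cache mount ideas a bit more concrete, here is a minimal sketch (not material from the talk). It assumes a single Dockerfile parameterized by a hypothetical `PYTHON_VERSION` build argument, with pip's download cache kept in a BuildKit cache mount so it survives between builds without ending up in an image layer. The `build_commands` helper and the `whisper:py<version>` tag scheme are illustrative names only.

```python
"""Hypothetical sketch: one Dockerfile, several Python versions, no copies."""
import shlex

# Assumed Dockerfile contents. ARG before FROM lets the base image tag vary;
# the cache mount (a BuildKit feature) persists /root/.cache/pip across builds
# without storing it in the final image.
DOCKERFILE = """\
# syntax=docker/dockerfile:1
ARG PYTHON_VERSION=3.12
FROM python:${PYTHON_VERSION}-slim
RUN apt-get update && apt-get install -y --no-install-recommends ffmpeg \\
    && rm -rf /var/lib/apt/lists/*
RUN --mount=type=cache,target=/root/.cache/pip \\
    pip install openai-whisper
"""


def build_commands(python_versions: list[str], tag: str = "whisper") -> list[list[str]]:
    """Return one `docker build` argv per Python version, all sharing one Dockerfile."""
    return [
        [
            "docker", "build",
            "--build-arg", f"PYTHON_VERSION={version}",  # only this value changes
            "-t", f"{tag}:py{version}",                   # one tag per variant
            ".",                                          # build context with the Dockerfile
        ]
        for version in python_versions
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; save DOCKERFILE as ./Dockerfile first.
    for argv in build_commands(["3.11", "3.12"]):
        print(shlex.join(argv))
```

Adding another version means adding a list entry rather than copying and editing a second Dockerfile.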

Attendees will learn practical containerization techniques specifically tailored for data science workflows, with concrete examples using the Whisper model as our case study.


Expected audience expertise: Domain:

Novice

Prerequisites:

The talk is suitable for anyone who has already tried (or will soon need) to build or optimize container images for data science workloads. Beginners will get a sense of what techniques exist out there, and intermediate users will get some actionable tips to optimize their existing Dockerfiles or Containerfiles.

Abstract as a tweet (X) or toot (Mastodon):

Learn how to optimize Docker containers for data science workloads! We'll containerize a speech-to-text model using different approaches, compare image sizes & build times, and explore modern techniques like BuildKit cache mounts.

Jérôme was part of the team that built, scaled, and operated the dotCloud PaaS, before that company became Docker. He's now an independent consultant, and since he loves to share what he has learned, he continues to give many talks and demos on containers, Docker, and Kubernetes. He values diversity, and strives to be a good ally, or at least a decent social justice sidekick. He also collects musical instruments and can arguably play the theme of Zelda on a dozen of them.