PyData Amsterdam 2025

Resource Monitoring and Optimization with Metaflow
09-26, 14:55–15:30 (Europe/Amsterdam), Nebula

Metaflow is a powerful workflow management framework for data science, but optimizing its cloud resource usage still involves guesswork. We have extended Metaflow with a lightweight resource tracking tool that automatically monitors CPU, memory, GPU, and more, then recommends the most cost-effective cloud instance type for future runs. A single line of code can save you from the costs of overprovisioning or from painful job failures!


Metaflow empowers data scientists with reproducible workflows, versioned artifacts, and scalable execution on AWS Batch or Kubernetes. While data scientists can request CPU, memory, etc. with the @resources Python decorator, Metaflow doesn't track what's actually used at runtime -- leading to overprovisioning, wasted money, or jobs that crash due to underestimated memory/GPU constraints.
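
For context, a resource request in Metaflow looks like the minimal sketch below: the cpu and memory values are up-front guesses that the platform reserves but never validates against actual usage (the flow and step contents are illustrative).

```python
from metaflow import FlowSpec, resources, step


class TrainFlow(FlowSpec):
    # These numbers are guesses made before the run: AWS Batch or
    # Kubernetes reserves the full amount whether or not it is used.
    @resources(cpu=4, memory=16000)  # memory is specified in MB
    @step
    def start(self):
        # ... load data, train a model, etc.
        self.next(self.end)

    @step
    def end(self):
        pass


if __name__ == "__main__":
    TrainFlow()
```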

To address this, we built the @track_resources decorator, which automatically profiles CPU, memory, GPU, VRAM, network traffic, storage, and I/O usage at both the process and system level for each workflow step, whether running locally or in the cloud.
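
A minimal sketch of the integration, assuming the decorator is importable from the resource_tracker package (see the repository README for the exact import path and decorator order):

```python
from metaflow import FlowSpec, resources, step

from resource_tracker import track_resources  # assumed import path


class TrainFlow(FlowSpec):
    @track_resources  # the single added line: profiles this step's usage
    @resources(cpu=4, memory=16000)
    @step
    def start(self):
        # ... the tracked workload runs unchanged
        self.next(self.end)

    @step
    def end(self):
        pass


if __name__ == "__main__":
    TrainFlow()
```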

With zero effort and zero dependencies, this tool:
- Auto-generates Metaflow Cards with temporal resource charts, per-instance-type cost projections, and cloud server recommendations (see the note after this list on viewing them).
- Provides actionable insights for optimizing resource allocation.
- Requires just a single line of code to integrate.
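
The generated cards can be opened with Metaflow's standard card CLI. For example, assuming the sketch above is saved as flow.py, running `python flow.py run` followed by `python flow.py card view start` renders the report for the tracked step of the latest run.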

Check it out on GitHub: https://github.com/SpareCores/resource-tracker