PyData Seattle 2025

From Notebook to Cloud at Lightspeed: Accelerating ML Development with Ray
2025-11-09, Tutorial Track 2

Fast iteration is the backbone of machine learning innovation. I’ve been exploring how to enable ML engineers to prototype and scale training workloads with minimal friction and maximum flexibility, all without leaving the comfort of Python. This talk demonstrates how Ray can serve as a powerful framework for accelerating ML development workflows, using both standalone persistent Ray clusters and ephemeral per-job Ray clusters.


This talk demonstrates how Ray can be used as a powerful framework for accelerating ML development workflows. I’ll walk through how we:
1. Schedule and run remote training functions with fine-grained control over resources (see the sketch after this list)
2. Dynamically install Python dependencies at runtime
3. Inject user code directly into running environments
4. Coordinate distributed training workloads with minimal boilerplate
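As a flavor of what items 1, 2, and 4 might look like in practice, here is a minimal sketch using Ray's task API, assuming a cluster with GPU workers is reachable from the notebook; the `train_shard` function, its arguments, and the dependency list are hypothetical placeholders:

```python
import ray

# Connect to a Ray cluster (or start a local one if none is available).
# runtime_env installs extra pip packages on the workers when the job starts.
ray.init(runtime_env={"pip": ["torch"]})

# Resource annotations give fine-grained control over where each task runs.
@ray.remote(num_gpus=1, num_cpus=4)
def train_shard(shard_id: int, epochs: int = 1) -> dict:
    # A real training loop would load data and fit a model here.
    return {"shard": shard_id, "epochs": epochs, "status": "done"}

# Launch several training tasks in parallel and block until all finish.
futures = [train_shard.remote(i) for i in range(4)]
print(ray.get(futures))
```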

We’ll also explore patterns for dynamically provisioning compute - such as maintaining a hot standalone Ray cluster or spawning Ray clusters on demand - to support ephemeral, isolated training environments.
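As one illustration of the hot-cluster pattern, the sketch below submits an isolated job to a long-lived cluster via Ray's job submission API; the cluster address, entrypoint script, and dependencies are hypothetical, and the ephemeral variant would instead stand up and tear down a cluster per submission (for example with KubeRay):

```python
from ray.job_submission import JobSubmissionClient

# Hypothetical dashboard address of a long-lived ("hot") Ray cluster.
client = JobSubmissionClient("http://ray-head.example.internal:8265")

# Each submission gets its own runtime environment: working_dir uploads the
# local project (injecting user code), pip installs per-job dependencies.
job_id = client.submit_job(
    entrypoint="python train.py",
    runtime_env={
        "working_dir": "./",
        "pip": ["torch", "lightning"],
    },
)

print(client.get_job_status(job_id))
```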

The talk showcases how one can start designing a user-centric ML platform experience using Ray, bridging the gap between local notebooks and scalable, cloud-native training infrastructure.


Prior Knowledge Expected: Previous knowledge expected

Anindya is a Machine Learning Platform Engineer at Zoox, building scalable infrastructure for distributed training of LLMs and VLMs. Previously at Lyft, he led the development of Spark Notebooks on Kubernetes to accelerate ML prototyping. He has worked across LLMOps, MLOps, and data infrastructure, and has built systems for training, serving, and monitoring ML models at scale using Kubernetes, Spark, and modern ML tooling.
