Anindya Saha PyData Seattle 2025

Anindya Saha

Anindya is a Machine Learning Platform Engineer at Zoox, building scalable infrastructure for distributed training of LLMs and VLMs. Previously at Lyft, he led the development of Spark Notebooks on Kubernetes to accelerate ML prototyping. He has worked across LLMOps, MLOps, and data infrastructure, and has built systems for training, serving, and monitoring ML models at scale using Kubernetes, Spark, and modern ML tooling.

Sessions

11-08

10:55

45min

Scaling Image Captioning Workflows with Ray Data, Ray Data LLM and vLLM

Anindya Saha

Processing large-scale image datasets for captioning presents coordination challenges that often lead to complex, difficult-to-maintain systems. I've been exploring how Ray Data can simplify these workflows while improving throughput and reliability. This talk demonstrates how to build image captioning pipelines that combine Ray Data's batch processing capabilities, Ray Data LLM's batch inference capabilities, and vLLM for efficient model serving.

From Notebook to Cloud at Lightspeed: Accelerating ML Development with Ray

Anindya Saha

Fast iteration is the backbone of machine learning innovation. I’ve been exploring how to enable ML engineers to prototype and scale training workloads with minimal friction and maximal flexibility - all without leaving the comfort of Python. This talk demonstrates how Ray can be used as a powerful framework for accelerating ML development workflows, using either standalone persistent Ray clusters or ephemeral per-job Ray clusters.

Tutorial Track 2
