PyData Seattle 2025

Real-time ML: Accelerating Python for (< 5ms) inference at scale
2025-11-07, Talk Track 3

Real-time machine learning depends on features and data that by definition can’t be pre-computed. Detecting fraud, powering accurate chatbots, and serving product recommendations at scale require both processing events that emerged only seconds ago and building context from a multitude of data sources. How do we build an infrastructure platform that executes complex data pipelines end-to-end, on demand, in under 5 ms?

All while meeting data teams where they are: in Python, the language of ML. We’ll share how we built a Symbolic Python Interpreter that accelerates ML pipelines by transpiling Python into DAGs of static expressions. These expressions are optimized and executed at scale with Velox, an open-source (~4k stars) unified query engine written in C++ and started at Meta.
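To give a flavor of the idea, here is a minimal sketch of symbolic tracing in general: instead of executing a Python feature function on real values, you call it with placeholder objects that record every operation, yielding a static expression DAG that a vectorized engine such as Velox could then optimize and execute. This is an illustrative toy, not Chalk’s interpreter or Velox’s API; the names (`Expr`, `col`, `trace`, `txn_risk`) are invented for this example.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class Expr:
    """One node in the expression DAG: an operator and its inputs."""
    op: str
    args: Tuple[Any, ...] = ()

    # Operators build new DAG nodes instead of computing values.
    def __add__(self, other): return Expr("add", (self, lift(other)))
    def __sub__(self, other): return Expr("sub", (self, lift(other)))
    def __mul__(self, other): return Expr("mul", (self, lift(other)))
    def __truediv__(self, other): return Expr("div", (self, lift(other)))
    def __gt__(self, other): return Expr("gt", (self, lift(other)))


def lift(value: Any) -> Expr:
    """Wrap plain Python literals as constant expressions."""
    return value if isinstance(value, Expr) else Expr("const", (value,))


def col(name: str) -> Expr:
    """Symbolic reference to an input feature/column."""
    return Expr("col", (name,))


def trace(fn, *inputs: str) -> Expr:
    """Call a Python feature function on symbolic columns to capture its DAG."""
    return fn(*(col(name) for name in inputs))


# A feature written as ordinary Python: nothing Expr-specific in the body.
def txn_risk(amount, avg_amount_30d):
    ratio = amount / (avg_amount_30d + 1.0)
    return ratio > 3.0


if __name__ == "__main__":
    dag = trace(txn_risk, "amount", "avg_amount_30d")
    print(dag)
    # Expr(op='gt', args=(Expr(op='div', ...), Expr(op='const', args=(3.0,))))
    # A static DAG like this is what a query engine such as Velox can
    # optimize and evaluate over batches of rows, outside the Python interpreter.
```

The point of the sketch is the shape of the approach: the data team keeps writing plain Python, while the platform captures a static plan it can push down to a fast C++ execution engine.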


We'll cover what it takes to do dynamic real-time machine learning at scale! The secret sauce is Velox. Started at Meta and built explicitly for high-throughput (large, offline) workloads, Velox serves as a modern unified execution engine. We'll go over a number of optimizations and ways we've extended Velox to also handle low-latency (high-frequency) online workloads, so that data teams (ML Engineers, Data Engineers, and Data Scientists) can build, iterate, and deploy models from a single platform, quickly and reliably!


Prior Knowledge Expected:

No previous knowledge expected

Elliot Marx, Chalk co-founder, started his career at Affirm, where he built the early risk and credit data infrastructure system (the inspiration for Chalk). He then co-founded Haven Money, which Credit Karma acquired to power its banking products. He holds a B.S. and M.S. in Computer Science from Stanford University.