2025-12-09, Auditorium
In industries like energy and retail, forecasting often calls for local models because each time series has its own behavior. However, training and managing thousands of such models presents scalability and operational challenges. This talk shows how we scaled local models on Databricks by leveraging the pandas API on Spark, and shares practical lessons on storage, reuse, and scaling to make this approach efficient when it's truly needed.
Industries like energy, retail, and logistics often face a critical problem: forecasting demand or consumption for thousands of entities, each with unique patterns. Global models may miss local nuances, while training individual models can overwhelm traditional pipelines.
In this talk, we'll explore a practical and scalable solution built on Databricks, using the pandas API on Spark to train and predict with thousands of local models in parallel.
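To make the per-group training pattern concrete, here is a minimal sketch. It uses PySpark's `applyInPandas`, a close relative of the pandas-API-on-Spark approach described in the talk; the table name (`consumption_history`), column names (`group_id`, `t`, `y`), and the linear model are illustrative assumptions, not the speakers' actual pipeline.

```python
# Minimal sketch: train one model per group in parallel and return it as pickled bytes.
# Assumes a Databricks notebook where `spark` is the active SparkSession.
import pickle
import pandas as pd
from sklearn.linear_model import LinearRegression

def train_group(pdf: pd.DataFrame) -> pd.DataFrame:
    # Each call receives the full history of a single group as a pandas DataFrame.
    X = pdf[["t"]].values          # assumed numeric time/feature column
    y = pdf["y"].values            # assumed target column
    model = LinearRegression().fit(X, y)
    return pd.DataFrame({
        "group_id": [pdf["group_id"].iloc[0]],
        "model": [pickle.dumps(model)],   # serialized model as a binary payload
    })

models_df = (
    spark.table("consumption_history")    # hypothetical source table
         .groupBy("group_id")
         .applyInPandas(train_group, schema="group_id string, model binary")
)
```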
You’ll discover how to:
- Train and serialize ML models per group efficiently.
- Use binary model storage to overcome MLflow's requests-per-second (RPS) limits.
- Generate future data and produce forecasts asynchronously at scale (see the sketch after this list).
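Continuing the sketch above, one way to realize the storage and forecasting steps is to persist the pickled models as a binary column in a Delta table and apply them inline per group at prediction time. The table names (`forecast.local_models`, `forecast.future_grid`) and schema are assumptions for illustration.

```python
# Minimal sketch, under the same assumed schema as the training example.
import pickle
import pandas as pd

# 1) Store: one row per group, model bytes in a binary Delta column
#    (avoids registering thousands of individual models).
models_df.write.format("delta").mode("overwrite").saveAsTable("forecast.local_models")

# 2) Forecast: join each group's future rows with its model bytes and predict in parallel.
def forecast_group(pdf: pd.DataFrame) -> pd.DataFrame:
    model = pickle.loads(pdf["model"].iloc[0])       # deserialize this group's model
    out = pdf.copy()
    out["yhat"] = model.predict(out[["t"]].values)   # same assumed feature as training
    return out[["group_id", "t", "yhat"]]

future = spark.table("forecast.future_grid")         # hypothetical future timestamps per group
predictions = (
    future.join(spark.table("forecast.local_models"), on="group_id")
          .groupBy("group_id")
          .applyInPandas(forecast_group, schema="group_id string, t long, yhat double")
)
```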
We’ll discuss the trade-offs between MLflow Registry, Unity Catalog, and inline model execution, and how this approach powers forecasting for thousands of groups in the energy sector.
Whether you're a data scientist or an ML engineer, you'll leave with actionable ideas for scaling your own forecasting workflows on Databricks.