PyData Seattle 2025

Why Models Break Your Pipelines (and How to Make Them First-Class Citizens)
2025-11-08, Room 301A

Most AI pipelines still treat models like Python UDFs, just another function bolted onto Spark, Pandas, or Ray. But models aren’t functions: they’re expensive, stateful, and difficult to configure. In this talk, we’ll explore why this mental model breaks at scale and share practical patterns for treating models as first-class citizens in your pipelines.


When data scientists move from prototyping to production, they discover that plugging a model into their pipeline is nothing like calling a UDF.

Models bring unique challenges:
• Expensive (GPU hungry, rate-limited APIs).
• Stateful (versions, prompts, seeds).
• Unreliable (timeouts, OOM crashes).

Using real examples from multimodal pipelines, we’ll show why treating models as UDFs leads to brittle jobs, and introduce first-class patterns that solve these problems:
• One-line model loading in dataframes.
• Smart batching & caching without hand-tuning.
• Model-aware scheduling & retries.
• Row-level fault tolerance (99% completion instead of all-or-nothing failures).
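
As a rough illustration of the last two patterns, here is a minimal sketch in plain Python of per-row retries with row-level fault tolerance. It is not Daft's API or the implementation presented in the talk; `RowResult`, `run_model_per_row`, and `flaky_model` are hypothetical names invented for this example, and the "model" is a stand-in function that fails on purpose.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Optional


@dataclass
class RowResult:
    """Outcome for one input row: either a value or an error message."""
    row: Any
    value: Optional[Any] = None
    error: Optional[str] = None


def run_model_per_row(
    rows: Iterable[Any],
    model_fn: Callable[[Any], Any],
    max_retries: int = 2,
    backoff_s: float = 0.0,
) -> List[RowResult]:
    """Call model_fn on each row, retrying failures, without failing the job.

    A row that still fails after max_retries retries is recorded with its
    error instead of raising, so the remaining rows are still processed.
    """
    results: List[RowResult] = []
    for row in rows:
        last_error: Optional[str] = None
        for attempt in range(max_retries + 1):
            try:
                results.append(RowResult(row=row, value=model_fn(row)))
                break
            except Exception as exc:  # hypothetical: treat every error as retryable
                last_error = f"{type(exc).__name__}: {exc}"
                if attempt < max_retries:
                    time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
        else:
            # All attempts failed: keep the row and its error, then move on.
            results.append(RowResult(row=row, error=last_error))
    return results


# Stand-in "model" (an assumption for the demo): one input times out once,
# another can never be processed.
calls: Dict[str, int] = {}


def flaky_model(text: str) -> str:
    calls[text] = calls.get(text, 0) + 1
    if text == "corrupt":
        raise ValueError("cannot decode input")
    if text == "slow" and calls[text] == 1:
        raise TimeoutError("model timed out")
    return text.upper()


results = run_model_per_row(["ok", "slow", "corrupt"], flaky_model)
completed = [r for r in results if r.error is None]
failed = [r for r in results if r.error is not None]
```

Here the transient timeout is absorbed by a retry, the unprocessable row is reported alongside its error, and the job as a whole still finishes. A production engine would typically also distinguish retryable from permanent errors and apply the same idea per batch rather than per row.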


Prior Knowledge Expected: Previous knowledge expected

Everett Kleven is a Solutions Engineer and Public Speaker at Daft, an open-source distributed query engine providing simple and reliable data processing for any modality and scale. Previously a Big Data TPM at Lucid Motors and a Flight Controls Engineer at Boeing, he stewards community engagement by curating technical content and demos on the latest advancements in multimodal AI. Everett holds advanced degrees in Aerospace Engineering, Mechanical Engineering, and Applied Physics from Washington University in St. Louis and Whitworth University in Spokane, WA.