PyData Amsterdam 2025

Large-Scale Video Intelligence
09-25, 10:35–11:10 (Europe/Amsterdam), Voyager

The explosion of video data demands search beyond simple metadata. How do we find specific visual moments, actions, or faces within petabytes of footage? This talk dives into architecting a robust, scalable multi-modal video search system.
We will explore an architecture that combines efficient batch preprocessing for feature extraction (including person detection and face/CLIP-style embeddings) with optimized vector database indexing. Attendees will learn practical strategies for managing massive datasets, optimizing ML inference (e.g., lightweight models, specialized runtimes), and bridging pre-computed indexes with real-time analysis for deeper insights. This session is for data scientists, ML engineers, and architects looking to build sophisticated video understanding capabilities.
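
To make the pattern concrete, here is a minimal sketch of the frame-embedding-plus-vector-index pipeline described above. The specific choices (an open_clip ViT-B-32 model, OpenCV for frame sampling, a flat FAISS inner-product index, and the file name footage.mp4) are illustrative assumptions, not the production stack covered in the talk.

```python
# Minimal sketch: sample frames from a video, embed them with a CLIP-style
# model, and index the embeddings for text-to-video nearest-neighbour search.
import cv2
import faiss
import numpy as np
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

def sample_frames(path, every_n=30):
    """Yield (frame_index, PIL image) for every n-th frame of a video."""
    cap = cv2.VideoCapture(path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % every_n == 0:
            yield idx, Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        idx += 1
    cap.release()

@torch.no_grad()
def embed_frames(frames):
    """Batch-encode frames into L2-normalised CLIP image embeddings."""
    batch = torch.stack([preprocess(img) for _, img in frames])
    feats = model.encode_image(batch)
    return (feats / feats.norm(dim=-1, keepdim=True)).cpu().numpy()

frames = list(sample_frames("footage.mp4"))
embeddings = embed_frames(frames).astype(np.float32)

# Inner product on unit vectors == cosine similarity.
index = faiss.IndexFlatIP(embeddings.shape[1])
index.add(embeddings)

@torch.no_grad()
def search(query, k=5):
    """Embed a text query and return the closest (frame_index, score) pairs."""
    q = model.encode_text(tokenizer([query]))
    q = (q / q.norm(dim=-1, keepdim=True)).cpu().numpy().astype(np.float32)
    scores, ids = index.search(q, k)
    return [(frames[i][0], float(s)) for i, s in zip(ids[0], scores[0]) if i >= 0]

print(search("a person riding a bicycle"))
```

At petabyte scale, the flat index would typically give way to an approximate structure (e.g., HNSW or IVF), and frame-level hits would be aggregated into temporal segments before being returned.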

Audience: Data Scientists, Machine Learning Engineers, Data Engineers, System Architects.

Takeaway: Attendees will learn architectural patterns and practical techniques for building scalable multi-modal video search systems, including feature extraction, vector database utilization, and ML pipeline optimization.

Background Knowledge: Familiarity with Python, core machine learning concepts (e.g., embeddings, classification), and general data processing pipelines is beneficial. Experience with video processing or computer vision is a plus but not strictly required.


Searching vast video archives (petabytes of footage) requires deep visual and temporal understanding, far beyond what metadata alone can offer. This talk details our journey building a system for large-scale, multi-modal video retrieval that enables complex analytical queries. We will focus on generalizable techniques, with particular emphasis on efficient and adaptable model usage to overcome the inherent challenges of video data.
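
As a taste of the "lightweight models, specialized runtimes" angle, the sketch below exports a small vision backbone to ONNX and runs batch inference through ONNX Runtime. The MobileNetV3 model and file name are placeholder assumptions; the same export-once, infer-anywhere pattern applies to detection and embedding models alike.

```python
# Sketch: export a PyTorch model to ONNX (one-off step), then run batched
# inference via a specialized runtime instead of the full framework.
import numpy as np
import torch
import torchvision
import onnxruntime as ort

model = torchvision.models.mobilenet_v3_small(weights="DEFAULT").eval()
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    model, dummy, "frame_model.onnx",
    input_names=["frames"], output_names=["logits"],
    dynamic_axes={"frames": {0: "batch"}},  # allow variable batch size
)

# Swap in CUDA or TensorRT execution providers where available.
session = ort.InferenceSession("frame_model.onnx",
                               providers=["CPUExecutionProvider"])
batch = np.random.rand(32, 3, 224, 224).astype(np.float32)
logits = session.run(["logits"], {"frames": batch})[0]
print(logits.shape)  # (32, 1000)
```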