PyData Amsterdam 2025

Large-Scale Video Intelligence
2025-09-25, Voyager

The explosion of video data demands search beyond simple metadata. How do we find specific visual moments, actions, or faces within petabytes of footage? This talk dives into architecting a robust, scalable multi-modal video search system.
We will explore an architecture combining efficient batch preprocessing for feature extraction (including person detection, face/CLIP-style embeddings) with optimized vector database indexing. Attendees will learn practical strategies for managing massive datasets, optimizing ML inference (e.g., lightweight models, specialized runtimes), and bridging pre-computed indexes with real-time analysis for deeper insights. This session is for data scientists, ML engineers, and architects looking to build sophisticated video understanding capabilities.
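
As a rough illustration of the index-then-query pattern described above, here is a minimal sketch in which a brute-force NumPy cosine search stands in for a real vector database. The `FrameIndex` class, the `(video_id, timestamp)` references, and the random embeddings are all hypothetical placeholders; in practice the vectors would come from a CLIP-style or face-embedding model run in the batch preprocessing stage, and this is not the speakers' actual implementation.

```python
import numpy as np


def normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)


class FrameIndex:
    """Hypothetical in-memory stand-in for a vector database of frame embeddings."""

    def __init__(self, embeddings, frame_refs):
        self.embeddings = normalize(np.asarray(embeddings, dtype=np.float32))
        self.frame_refs = list(frame_refs)
        if len(self.frame_refs) != len(self.embeddings):
            raise ValueError("one frame reference is required per embedding")

    def search(self, query, k=3):
        """Return the k most similar frames as ((video_id, timestamp), score)."""
        q = normalize(np.asarray(query, dtype=np.float32)[None, :])[0]
        scores = self.embeddings @ q          # cosine similarity per frame
        top = np.argsort(-scores)[:k]         # indices of the highest scores
        return [(self.frame_refs[i], float(scores[i])) for i in top]


# Toy data: random vectors play the role of pre-computed frame embeddings.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 64))
frame_refs = [(f"video_{i // 100}", float(i % 100)) for i in range(1000)]

index = FrameIndex(embeddings, frame_refs)

# A query close to frame 421, mimicking an embedding of a similar image.
query = embeddings[421] + 0.01 * rng.normal(size=64)
results = index.search(query, k=3)
```

Each result pairs a video identifier and timestamp with a similarity score, which is the hook for handing a matched moment to a slower real-time analysis step. A brute-force scan like this is only practical for small collections; at the scale discussed in the talk, an approximate nearest-neighbor index would take its place.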

Audience: Data Scientists, Machine Learning Engineers, Data Engineers, System Architects.

Takeaway: Attendees will learn architectural patterns and practical techniques for building scalable multi-modal video search systems, including feature extraction, vector database utilization, and ML pipeline optimization.

Background Knowledge: Familiarity with Python, core machine learning concepts (e.g., embeddings, classification), and general data processing pipelines is beneficial. Experience with video processing or computer vision is a plus but not strictly required.


Searching vast video archives (petabytes) requires deep visual and temporal understanding, far beyond metadata. This talk details our journey building a system for large-scale, multi-modal video retrieval, empowering complex analytical queries. We will focus on generalizable techniques, particularly emphasizing efficient and adaptable model usage to overcome the inherent challenges of video data.

Antonino Ingargiola is currently Lead AI Architect at Agile Lab, where he oversees AI initiatives and projects in large enterprises. Previously, he was co-founder and CTO of smartFAB, a startup offering an advanced analytics solution for the manufacturing industry. Before that, he worked as an associated scientist at UCLA (California, USA), combining machine learning and biophysics. Antonino holds a Ph.D. in Information Technology and an MD in Electronics Engineering, both from Politecnico di Milano (Italy).

Irene Donato is a Data Scientist at Agile Lab with a PhD in Mathematics and a background in Physics. She specializes in AI strategy. With experience across academia and industry, Irene focuses on applying data science to solve complex business problems.