06-08, 11:00–11:45 (Europe/London), Grand Hall
Reproducibility in embedding benchmarks is no small feat. Prompt variability, growing computational demands, and evolving tasks make fair comparisons a challenge. The need for robust benchmarking has never been greater. In this talk, we’ll explore the quirks and complexities of benchmarking embedding models, such as prompt sensitivity, scaling issues, and emergent behaviors.
We’ll hear straight from the Massive Text Embedding Benchmark (MTEB) maintainers and show how MTEB (and its extensions like MMTEB and MIEB) simplifies reproducibility, making it easier for researchers and industry practitioners to measure progress, choose the right models, and push the boundaries of embedding performance.
The Massive Text Embedding Benchmark (MTEB) addresses these challenges with a standardized, open-source framework for evaluating text embedding models. Covering diverse tasks like clustering, retrieval, and classification, MTEB ensures consistent and reproducible results. Extensions like MMTEB (multilingual) and MIEB (image) further expand its capabilities.
Previous knowledge expected
My focus is on making AI systems usable, scalable, and maintainable. I'm currently a Staff Data Scientist at Zendesk QA, working on LLM-powered features that see millions of conversations a day.
Previously at Clarifai, I helped build and maintain multimodal retrieval systems in production. My background is in Aerospace Engineering and Machine Learning, and I hold undergraduate (B.A.Sc in EngSci) and graduate (M.A.Sc) degrees from the University of Toronto.
In my spare time, I help maintain MTEB, like to see the world, and do a bit of swim/bike/run racing.