Oleh Kostromin
I am a Data Scientist primarily focused on Deep Learning and MLOps. In my spare time, I contribute to several open-source Python libraries.
Session
Evaluating large language models (LLMs) in real-world applications goes far beyond standard benchmarks. When LLMs are embedded in complex pipelines, choosing the right models, prompts, and parameters becomes an ongoing challenge.
In this talk, we will present a practical, human-in-the-loop evaluation framework that enables systematic improvement of LLM-powered systems based on expert feedback. Combining domain-expert insight with automated evaluation methods makes it possible to refine these systems iteratively while building transparency and trust.
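To make the idea concrete, the core loop of such a framework might look roughly like the sketch below. This is not the speaker's implementation: every name here (`run_pipeline`, `auto_score`, `ask_expert`, the 0.5 review threshold) is a hypothetical placeholder. The pattern is that automated scores triage all outputs, only the uncertain cases are routed to domain experts, and their feedback then informs the next iteration of prompts and parameters.

```python
from dataclasses import dataclass

@dataclass
class EvalRecord:
    prompt: str
    output: str
    auto_score: float                   # automated metric, e.g. an LLM-as-judge score in [0, 1]
    expert_score: float | None = None   # filled in only when a human reviews the case

def run_pipeline(prompt: str) -> str:
    # Placeholder for the actual LLM-powered system under evaluation.
    return f"answer to: {prompt}"

def auto_score(prompt: str, output: str) -> float:
    # Placeholder for an automated evaluator (heuristic, reference-based, or LLM-as-judge).
    return 0.4 if "edge case" in prompt else 0.9

def ask_expert(record: EvalRecord) -> float:
    # Stand-in for the human-in-the-loop step: in practice this would surface the
    # case in an annotation tool and return the domain expert's judgement.
    return 1.0

def evaluate(prompts: list[str], review_threshold: float = 0.5) -> list[EvalRecord]:
    """Score every case automatically and flag low-scoring ones for expert review."""
    records = []
    for p in prompts:
        out = run_pipeline(p)
        records.append(EvalRecord(prompt=p, output=out, auto_score=auto_score(p, out)))
    # Only the uncertain cases go to domain experts, keeping human effort focused.
    for r in records:
        if r.auto_score < review_threshold:
            r.expert_score = ask_expert(r)
    return records

if __name__ == "__main__":
    results = evaluate(["summarise the contract", "edge case: ambiguous clause"])
    for r in results:
        print(r.prompt, r.auto_score, r.expert_score)
```

The expert scores collected this way can be compared against the automated ones to calibrate the evaluator, and the flagged cases become regression tests for the next round of prompt and parameter changes.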
This talk will be valuable for anyone who wants to ensure their LLM applications can handle real-world complexity, not just perform well on generic benchmarks.