Linoy Cohen PyData Tel Aviv 2025

Linoy Cohen

Linoy Cohen is a Senior Data Scientist on the NLP team at Intuit. In her role, she leads the evaluation track and is responsible for building automatic evaluations for LLMs and agents, which provide an objective way to measure their capabilities against specific custom criteria and needs.

Session

11-05

15:15

30min

Evaluating Your AI Agent: How Do You Properly Measure Performance? (HE)

Linoy Cohen, Shirli Di Castro Shashua

AI agents are becoming the next big thing. But deploying an agent without truly understanding its performance, limits, and potential failure points is a high-stakes gamble. How do you ensure your agent is not just functional, but genuinely reliable, robust, and safe?
This talk explores the practical challenges of evaluating AI agents effectively. We'll cover how to define meaningful success metrics, implement comprehensive testing strategies that reflect real-world complexity, and effectively incorporate human feedback. You'll leave with a practical framework to confidently assess your agent's capabilities and ensure reliable performance when the stakes are high.

Blue
