Linoy Cohen
Linoy Cohen is a Senior Data Scientist on the NLP team at Intuit. In her role, she leads the evaluation track and is responsible for building automatic evaluations for LLMs and agents, which provide an objective way to measure their capabilities against specific, custom criteria and needs.
Session
AI agents are becoming the next big thing. But deploying an agent without truly understanding its performance, limits, and potential failure points is a high-stakes gamble. How do you ensure your agent is not just functional, but genuinely reliable, robust, and safe?
This talk explores the practical challenges of evaluating AI agents effectively. We'll look at how to define success metrics that matter, implement comprehensive testing strategies that reflect real-world complexity, and meaningfully incorporate human feedback. You'll leave with a practical framework for confidently assessing your agent's capabilities and ensuring reliable performance when the stakes are high.