Shirli Di Castro Shashua
Shirli is a senior AI scientist at Intuit, where she brings cutting-edge innovation to life through generative models and agentic AI. Her areas of expertise span reinforcement learning, LLM training and evaluation, NLP, classical machine learning, and the design of intelligent agents.
Shirli holds a Ph.D. and an M.Sc. in Electrical and Computer Engineering from the Technion, specializing in reinforcement learning, and a B.Sc. in Biomedical Engineering from Ben-Gurion University.
Session
AI agents are becoming the next big thing. But deploying an agent without truly understanding its performance, limits, and potential failure points is a high-stakes gamble. How do you ensure your agent is not just functional, but genuinely reliable, robust, and safe?
This talk explores the practical challenges of evaluating AI agents effectively. We'll cover how to define meaningful success metrics, implement comprehensive testing strategies that reflect real-world complexity, and incorporate human feedback effectively. You'll leave with a practical framework to confidently assess your agent's capabilities and ensure reliable performance when the stakes are high.