Sebastian Duerr
Seb is a Lead AI Engineer with a Master’s in Information Systems, originally from Germany and now US‑based. After beginning a PhD, he moved into consulting and served as Chief Product Officer at a major Austrian bank. He later pursued NLP research with MIT, co‑founded and exited a startup, and built AI/NLP systems in production. He has taught 20+ academic courses and published seven peer‑reviewed articles, known for translating complex concepts into practical solutions that bridge technical rigor with stakeholder needs.
Session
LLM apps fail without reliable, reproducible evaluation. This talk maps the open‑source evaluation landscape, compares leading techniques (RAGAS, G-Eval, graders) and frameworks (DeepEval, Phoenix, LangFuse, OpenAI Evals), and shows how to combine unit tests, RAG‑specific evals, and observability to ship higher‑quality systems.
Attendees leave with a decision checklist, code patterns, and a production‑ready playbook.