PyData Seattle 2025

Aziza Mirsaidova

Aziza is an Applied Scientist at Oracle (AI Science) working on Generative AI Evaluations, specifically multi-modal, text, and code generation. Previously, she worked on content moderation and AI safety with Microsoft's Responsible & OpenAI research team. She holds a Master of Science in Artificial Intelligence from Northwestern University. Aziza is interested in developing tools and methods that embed human-like reasoning capabilities into AI systems and in applying these technologies to socially driven tasks. Aziza is based in Seattle; outside of work, she trains for her next marathon or hikes around the PNW.


Session

11-08
15:20
45min
Prompt Variation as a Diagnostic Tool: Exposing Contamination, Memorization, and True Capability in LLMs
Aziza Mirsaidova

Prompt variation isn't just an engineering nuisance; it's a window into fundamental LLM limitations. When a model's accuracy drops from 95% to 75% due to minor rephrasing, we're not just seeing brittleness; we're potentially exposing data contamination, spurious correlations, and shallow pattern matching. This talk explores prompt variation as a powerful diagnostic tool for understanding LLM reliability. We discuss how small changes in format, phrasing, or ordering can cause accuracy to collapse, revealing that models have memorized benchmark patterns or learned superficial correlations rather than robust task representations. Drawing from academic and industry research, you will learn to distinguish an LLM's true capability from memorization, identify when models are pattern-matching rather than reasoning, and build evaluation frameworks that expose these vulnerabilities before deployment.
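The diagnostic the abstract describes can be sketched in a few lines: score a model on the canonical benchmark phrasing, score it again on minor rephrasings of the same questions, and treat a large gap as a memorization signal. The snippet below is a minimal, hypothetical illustration (the `toy_model` stub and prompts are invented for this sketch, not from the talk) using a deliberately brittle "model" that only recognizes the exact memorized string.

```python
# Hypothetical sketch of prompt-variation as a diagnostic.
# `toy_model` is an invented stand-in for a real LLM call: it has
# "memorized" one exact benchmark phrasing and fails on paraphrases.
MEMORIZED = "What is 2 + 2?"

def toy_model(prompt: str) -> str:
    # A robust model would answer any paraphrase of the question;
    # this one pattern-matches on the surface form only.
    return "4" if prompt == MEMORIZED else "unknown"

def accuracy(model, prompts, answer):
    # Fraction of prompts the model answers correctly.
    return sum(model(p) == answer for p in prompts) / len(prompts)

# Canonical benchmark phrasing vs. minor rephrasings of the same question.
canonical = [MEMORIZED]
variants = [
    "What is 2+2?",
    "Compute 2 plus 2.",
    "2 + 2 = ?",
]

base = accuracy(toy_model, canonical, "4")
varied = accuracy(toy_model, variants, "4")

# The gap between the two scores is the diagnostic signal: the model
# learned the surface form of the benchmark, not the underlying task.
print(f"canonical: {base:.0%}, variants: {varied:.0%}")
```

On a genuinely capable model the two scores stay close; a collapse on the variant set is the brittleness the talk frames as evidence of contamination or shallow pattern matching.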

Room 301B