PyData Seattle 2025

Aziza Mirsaidova

Aziza is an Applied Scientist at Oracle (OCI AI) with more than four years of experience in AI/ML/NLP technologies. Previously, she worked on LLM evaluation and content moderation for AI safety on Microsoft's Responsible & Open AI research team. She holds a Master of Science in Artificial Intelligence from Northwestern University. During her time at Northwestern, she worked as an ML Research Associate at the Technological Innovations for Inclusive Learning and Teaching (tiilt) lab, building a multimodal conversation analysis application called Blinc. She was a Data Science for Social Good Fellow at the University of Washington's eScience Institute during the summer of 2022. Aziza is interested in developing tools and methods that embed human-like reasoning capabilities into AI systems (particularly generative AI) and in applying these technologies to socially driven tasks that enhance human well-being. Once she is done coding, she is either training for her next marathon or hiking somewhere in the PNW.


Session

11-08
15:20
45min
Are Your Fine-Tuned Models Reliable? Evaluating Prompt Robustness and Alignment
Aziza Mirsaidova

Fine-tuning improves what an LLM knows, but it does little to guarantee how the model behaves under real-world prompt variation. Small changes in format, phrasing, or ordering can cause accuracy to collapse, exposing brittle decision boundaries. This talk presents practical methods for evaluating and improving robustness, including FORMATSPREAD for estimating performance spread, DivSampling for generating diverse stress tests, mixture-of-formats for structured variation, and alignment-aware techniques such as adversarial contrast sets and multilingual perturbations. We also show how post-training optimization with Direct Preference Optimization (DPO) can integrate robustness feedback into the alignment loop.
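To give a flavor of the spread-estimation idea the abstract mentions, here is a minimal sketch (not the speaker's or the FORMATSPREAD authors' implementation): render one task in several semantically equivalent prompt formats and report the gap between the best- and worst-scoring format. The score_prompt stub below is a hypothetical stand-in for a real model-evaluation call.

import random

QUESTION = "What is the capital of France?"
CHOICES = ["Paris", "Lyon", "Nice"]

# Semantically equivalent prompt formats: separators, field names, and
# punctuation vary while the underlying task stays identical.
FORMATS = [
    "Q: {q}\nOptions: {opts}\nA:",
    "Question: {q}\nChoices: {opts}\nAnswer:",
    "{q}\n{opts}\nThe answer is",
]

def render(fmt: str, q: str, opts: list) -> str:
    # Fill one format template with the question and the joined options.
    return fmt.format(q=q, opts=" | ".join(opts))

def score_prompt(prompt: str) -> float:
    # Hypothetical: return the fine-tuned model's accuracy under this
    # prompt format. A deterministic random stub keeps the sketch
    # self-contained; replace it with a real evaluation harness.
    rng = random.Random(prompt)
    return rng.uniform(0.60, 0.95)

scores = [score_prompt(render(f, QUESTION, CHOICES)) for f in FORMATS]
print("accuracy per format:", [round(s, 3) for s in scores])
print("format spread (max - min): %.3f" % (max(scores) - min(scores)))

A large spread on formats that a human would treat as interchangeable is exactly the brittleness signal the talk's evaluation methods are designed to surface.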

Talk Track 1