2025-11-08 – Talk Track 1
Fine-tuning improves what an LLM knows, but it does little to guarantee how the model behaves under real-world prompt variation. Small changes in format, phrasing, or ordering can cause accuracy to collapse, exposing brittle decision boundaries. This talk presents practical methods for evaluating and improving robustness, including FORMATSPREAD for estimating performance spread, DivSampling for generating diverse stress tests, mixture-of-formats for structured variation, and alignment-aware techniques such as adversarial contrast sets and multilingual perturbations. We also show how post-training optimization with Direct Preference Optimization (DPO) can integrate robustness feedback into the alignment loop.
Fine-tuned LLMs often deliver strong benchmark results, yet their reliability in practice remains fragile. A common but underexplored issue is sensitivity to prompt variation: changing the output format from plain text to JSON, altering the order of few-shot examples, or substituting synonyms can reduce performance by 10–20 accuracy points. These failures are not merely surface-level: they expose a deeper instability in how models represent tasks. Without systematic robustness evaluation and alignment-aware optimization, deploying such models carries serious risks.
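To make this failure mode concrete, the sketch below (illustrative only, not code from the talk) generates the kinds of variants described above: the same toy sentiment task rendered as plain text or JSON, with the few-shot examples reordered. All helper names are hypothetical.

```python
import itertools
import json

# Two few-shot demonstrations for a toy sentiment task.
few_shot = [
    {"text": "The battery died after a week.", "label": "negative"},
    {"text": "Setup took two minutes.", "label": "positive"},
]

def plain_prompt(examples, query):
    # Plain-text rendering of the task.
    demos = "\n".join(f"Review: {e['text']}\nSentiment: {e['label']}" for e in examples)
    return f"{demos}\nReview: {query}\nSentiment:"

def json_prompt(examples, query):
    # JSON rendering of the exact same task; only the surface form changes.
    return json.dumps({"examples": examples, "review": query, "answer_field": "sentiment"}, indent=2)

def prompt_variants(query):
    # Cross every few-shot ordering with every output format.
    for order in itertools.permutations(few_shot):
        for render in (plain_prompt, json_prompt):
            yield render(list(order), query)

for variant in prompt_variants("Great screen, terrible speakers."):
    print(variant, end="\n---\n")
```

Comparing a model's accuracy across these variants surfaces exactly the 10–20 point gaps the talk motivates.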
This talk begins by examining the sources of fragility in fine-tuned LLMs. We demonstrate how prompt formatting, phrasing, and ordering interact with latent embedding separability, leading to non-monotonic decision boundaries. We then turn to methods for quantifying and mitigating these issues. FORMATSPREAD provides a computationally efficient estimate of performance spread across a large prompt space. DivSampling introduces stochastic diversity to generate stress tests that mimic real-world prompt variation. Mixture-of-formats combines structural diversity with task rephrasings to probe whether models generalize beyond the style of their training data. Beyond these, we introduce alignment-aware evaluation frameworks, including adversarial contrast sets, counterfactual robustness testing, and multilingual perturbations.
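As a rough preview of the spread idea (a simplified sketch, not the FORMATSPREAD algorithm itself, which searches the format space far more efficiently), one can sample candidate formats, score each on the same data, and report the gap between the best and worst; `evaluate_accuracy` below is a hypothetical stand-in for a model evaluation call.

```python
import random

def estimate_spread(format_templates, eval_set, evaluate_accuracy, n_samples=20, seed=0):
    # Sample a subset of candidate prompt formats and score each on the same eval set.
    rng = random.Random(seed)
    sampled = rng.sample(format_templates, min(n_samples, len(format_templates)))
    scores = {tpl: evaluate_accuracy(tpl, eval_set) for tpl in sampled}
    # The spread (best minus worst accuracy) is the headline robustness number.
    best, worst = max(scores.values()), min(scores.values())
    return {"best": best, "worst": worst, "spread": best - worst, "per_format": scores}
```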
The session then connects evaluation with post-training optimization. We discuss how Direct Preference Optimization (DPO) can refine prompt robustness by aligning models with preference data collected from diverse prompt variants. Compared to RLHF, DPO is more stable and sample-efficient, making it practical for robustness tuning. We also explore hybrid workflows where robustness evaluation feeds into targeted preference optimization, creating a feedback loop between testing and alignment.
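For orientation, the core DPO objective is compact enough to sketch directly. The snippet below assumes preference pairs built from diverse prompt variants and summed per-sequence log-probabilities under the policy and a frozen reference model; the training setup is an assumption for illustration, not the speaker's exact pipeline.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Log-ratio of policy vs. reference for the preferred and dispreferred completions.
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    # DPO pushes the margin between the two ratios through a logistic loss.
    logits = beta * (chosen_ratio - rejected_ratio)
    return -F.logsigmoid(logits).mean()
```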
The talk also covers evaluation harnesses and computational techniques for testing prompt format shifts, lexical variation, and adversarial rephrasing, along with interpretable metrics that quantify robustness gaps. By the end, attendees will have both a conceptual map and concrete code for diagnosing brittleness, deciding when to refine prompts versus when to re-tune with preference optimization, and integrating robustness checks into their ML lifecycle. The key takeaway is that fine-tuning adapts what a model knows, but without robustness evaluation and post-training optimization, practitioners cannot trust how it behaves under realistic conditions. This talk provides a technical but accessible roadmap for building LLMs that are not only accurate but also reliable, aligned, and production-ready.
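As one example of such interpretable metrics (a hedged sketch; the metric names are illustrative, not a fixed standard), a robustness report can simply aggregate per-variant accuracy on a shared test set:

```python
import statistics

def robustness_report(accuracy_per_variant):
    # accuracy_per_variant maps a prompt-variant name to accuracy on a shared test set.
    scores = list(accuracy_per_variant.values())
    return {
        "mean_accuracy": statistics.mean(scores),
        "worst_case": min(scores),
        "spread": max(scores) - min(scores),   # gap a single prompt change can open
        "stdev": statistics.pstdev(scores),    # variability across variants
    }

# Example: plain-text vs. JSON vs. reordered few-shot prompts.
print(robustness_report({"plain": 0.86, "json": 0.71, "reordered_fewshot": 0.78}))
```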
Previous knowledge expected
Aziza is an Applied Scientist at Oracle (OCI AI) with more than four years of experience in AI/ML/NLP technologies. Previously she worked on LLM evaluation and content moderation for AI safety on Microsoft's Responsible & Open AI Research team. She holds a Master of Science in Artificial Intelligence from Northwestern University. During her time at Northwestern, she worked as an ML Research Associate at the Technological Innovations for Inclusive Learning and Teaching lab (tiilt), building a multimodal conversation analysis application called Blinc. She was a Data Science for Social Good Fellow at the University of Washington's eScience Institute during the summer of 2022. Aziza is interested in developing tools and methods that embed human-like reasoning capabilities into AI systems (particularly generative AI) and applying these technologies to socially driven tasks that enhance human well-being. When she is not coding, she is either training for her next marathon or hiking somewhere around the PNW.