2025-12-10 – Machine Learning & AI
Ever wonder why structured LLM output doesn’t feel as reliable as its natural language responses? At Khan Academy, we asked ourselves the same thing—especially as we leaned heavily on JSON-based structured outputs to power our AI tutor, Khanmigo.
Surprisingly, the root of the problem often lies in one of the most familiar tools in a Python developer’s toolbox: the humble dict. In this talk, we follow the story of how dictionary ordering can shape (and sometimes distort) structured LLM output. We’ll walk through how different frameworks—OpenAI, Claude, LangChain, OpenRouter, vLLM—handle structured responses, and why those differences matter more than you’d expect.
Along the way, we’ll share practical best practices we’ve developed to improve structured output reliability, observe subtle failure cases, and debug weird edge behaviors. If you’re building LLM apps with structured output, you’ll leave with concrete tips—and a deeper appreciation for the details that make or break your system.
Structured output (like JSON) is increasingly used in LLM applications to enforce a predictable schema and simplify downstream parsing. However, developers often assume that structured output is deterministic and robust—until they run into subtle bugs. At Khan Academy, we’ve run Khanmigo on structured JSON output since before it was even a supported feature. Along the way, we’ve learned a lot about where things can go wrong.
Our investigation began when we noticed inconsistent output quality across different LLM frameworks, even with identical prompts and models. The culprit? Python dictionary ordering and how different frameworks serialize JSON schemas.
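To make the mechanism concrete, here is a minimal, self-contained sketch (the schema and field names are illustrative, not Khanmigo's actual schema): since Python 3.7, a schema defined as a dict keeps its insertion order through `json.dumps`, while a serialization path that sorts keys sends the model a differently ordered schema.

```python
import json

# Illustrative schema for a tutoring response, defined as a plain Python dict.
# Since Python 3.7, dicts preserve insertion order, so json.dumps emits the
# keys below in exactly this order by default.
response_schema = {
    "type": "object",
    "properties": {
        "reasoning": {"type": "string"},   # intended to be generated first
        "hint": {"type": "string"},
        "is_correct": {"type": "boolean"},
    },
    "required": ["reasoning", "hint", "is_correct"],
}

# Default serialization keeps the intended property order.
print(json.dumps(response_schema, indent=2))

# A serialization path that canonicalizes with sort_keys=True reorders the
# properties alphabetically ("hint", "is_correct", "reasoning"), which can
# change the order in which the model emits fields.
print(json.dumps(response_schema, sort_keys=True, indent=2))
```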
We'll explore:
- How Python's evolution from unordered (pre-3.7) to insertion-ordered (3.7+) dictionaries affects LLM frameworks, and how ordering quirks still linger in some frameworks even on modern Python (see the sketch after this list)
- Framework-specific serialization behaviors in OpenAI SDK, Anthropic SDK, LangChain, OpenRouter, and vLLM
- Measurable impact on output quality through A/B testing results
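For frameworks that build schemas from Pydantic models, field declaration order typically flows through as well. A minimal check, assuming Pydantic v2 (the model and field names are hypothetical):

```python
from pydantic import BaseModel

# Assuming Pydantic v2: model_json_schema() returns an insertion-ordered dict
# whose "properties" follow field declaration order. A framework that passes
# this dict straight to json.dumps preserves that order; one that rebuilds or
# re-sorts the schema may not.
class TutorReply(BaseModel):
    reasoning: str      # declared first so it serializes (and ideally generates) first
    hint: str
    is_correct: bool

schema = TutorReply.model_json_schema()
print(list(schema["properties"]))   # ['reasoning', 'hint', 'is_correct']
```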
Attendees should have basic familiarity with Python and JSON, but no deep LLM expertise is required. We'll explain technical concepts clearly while providing actionable insights for immediate application.