2025-12-10 – Machine Learning & AI
Ever wonder why structured LLM output doesn’t feel as reliable as its natural language responses? At Khan Academy, we asked ourselves the same thing—especially as we leaned heavily on JSON-based structured outputs to power our AI tutor, Khanmigo.
Surprisingly, the root of the problem often lies in one of the most familiar tools in a Python developer’s toolbox: the humble dict. In this talk, we follow the story of how dictionary ordering can shape (and sometimes distort) structured LLM output. We’ll walk through how different frameworks—OpenAI, Claude, LangChain, OpenRouter, vLLM—handle structured responses, and why those differences matter more than you’d expect.
Along the way, we’ll share practical best practices we’ve developed to improve structured output reliability, observe subtle failure cases, and debug weird edge behaviors. If you’re building LLM apps with structured output, you’ll leave with concrete tips—and a deeper appreciation for the details that make or break your system.
Structured output (like JSON) is increasingly used in LLM applications to enforce a predictable schema and simplify downstream parsing. However, developers often assume that structured output is deterministic and robust—until they run into subtle bugs. At Khan Academy, we’ve run Khanmigo on structured JSON output since before it was even a supported feature. Along the way, we’ve learned a lot about where things can go wrong.
Our investigation began when we noticed inconsistent output quality across different LLM frameworks, even with identical prompts and models. The culprit? Python dictionary ordering and how different frameworks serialize JSON schemas.
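To make the mechanism concrete, here is a minimal, self-contained sketch (the schema and field names are illustrative, not Khanmigo's actual schema): since Python 3.7, a schema defined as a dict keeps its insertion order through `json.dumps`, while a serialization path that sorts keys sends the model a differently ordered schema.

```python
import json

# Illustrative schema for a tutoring response, defined as a plain Python dict.
# Since Python 3.7, dicts preserve insertion order, so json.dumps emits the
# keys below in exactly this order by default.
response_schema = {
    "type": "object",
    "properties": {
        "reasoning": {"type": "string"},   # intended to be generated first
        "hint": {"type": "string"},
        "is_correct": {"type": "boolean"},
    },
    "required": ["reasoning", "hint", "is_correct"],
}

# Default serialization keeps the intended property order.
print(json.dumps(response_schema, indent=2))

# A serialization path that canonicalizes with sort_keys=True reorders the
# properties alphabetically ("hint", "is_correct", "reasoning"), which can
# change the order in which the model emits fields.
print(json.dumps(response_schema, sort_keys=True, indent=2))
```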
We'll explore:
- How Python's evolution from unordered (pre-3.7) to insertion-ordered (3.7+) dictionaries affects LLM frameworks, and how ordering quirks still linger in some frameworks even on modern Python (see the sketch after this list)
- Framework-specific serialization behaviors in OpenAI SDK, Anthropic SDK, LangChain, OpenRouter, and vLLM
- Measurable impact on output quality through A/B testing results
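For frameworks that build schemas from Pydantic models, field declaration order typically flows through as well. A minimal check, assuming Pydantic v2 (the model and field names are hypothetical):

```python
from pydantic import BaseModel

# Assuming Pydantic v2: model_json_schema() returns an insertion-ordered dict
# whose "properties" follow field declaration order. A framework that passes
# this dict straight to json.dumps preserves that order; one that rebuilds or
# re-sorts the schema may not.
class TutorReply(BaseModel):
    reasoning: str      # declared first so it serializes (and ideally generates) first
    hint: str
    is_correct: bool

schema = TutorReply.model_json_schema()
print(list(schema["properties"]))   # ['reasoning', 'hint', 'is_correct']
```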
Attendees should have basic familiarity with Python and JSON, but no deep LLM expertise is required. We'll explain technical concepts clearly while providing actionable insights for immediate application.