Raphael Mitsch
I'm a machine learning engineer, these days mostly working in natural language processing. I have soft spots for data visualization, interpretable ML, and the decentralization of AI.
I build NLP systems for a living. Occasionally they work. I don't believe that attention is all we need. What I do believe in: pragmatism, taking a deep breath before falling for the next hype, and touching grass.
Session
Generative models are in the spotlight lately, and rightly so: their flexibility and zero-shot capabilities make it incredibly fast to prototype NLP applications. However, one-shotting complex NLP problems (solving them with a single end-to-end prompt) often isn't the best long-term strategy. Decomposing problems into modular, pipelined tasks yields better debuggability, greater interpretability, and more reliable performance, as the sketch below illustrates.
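To make the decomposition argument concrete, here is a minimal, self-contained sketch. All names and the rule-based stand-ins are illustrative assumptions, not any particular library's API; in practice each stage would wrap a zero- or few-shot model call.

```python
# Illustrative sketch: one monolithic prompt split into small,
# independently testable pipeline stages.
from dataclasses import dataclass, field


@dataclass
class Doc:
    text: str
    results: dict = field(default_factory=dict)


def classify_topic(doc: Doc) -> Doc:
    # Stand-in for a zero-shot classifier; rule-based so the sketch runs.
    doc.results["topic"] = "sports" if "match" in doc.text.lower() else "other"
    return doc


def extract_entities(doc: Doc) -> Doc:
    # Stand-in for a zero-shot NER step.
    doc.results["entities"] = [tok for tok in doc.text.split() if tok.istitle()]
    return doc


def run_pipeline(doc: Doc, steps) -> Doc:
    # Each stage can be inspected, swapped, or tested in isolation --
    # the debuggability and interpretability argument made above.
    for step in steps:
        doc = step(doc)
    return doc


doc = run_pipeline(Doc("Liverpool won the match yesterday."),
                   [classify_topic, extract_entities])
print(doc.results)
```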
This modular pipeline approach pairs naturally with zero- and few-shot (ZFS) models, enabling rapid yet robust prototyping without large datasets or fine-tuning. Crucially, many real-world applications need structured data outputs, not free-form text. Generative models often struggle to produce structured results consistently, which is why enforcing structured outputs is now a core feature of contemporary NLP tools such as Outlines, DSPy, LangChain, Ollama, and vLLM.
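A brief sketch of why this matters: downstream code wants typed fields, not text to re-parse. Pydantic is used here purely for illustration; the `Invoice` schema and the raw string are made up. Tools like those named above go a step further and constrain decoding so the model can only emit schema-conforming output.

```python
# Validate a model's raw JSON output against a schema instead of
# scraping values out of free-form text.
from pydantic import BaseModel, ValidationError


class Invoice(BaseModel):
    vendor: str
    total: float
    currency: str


raw_output = '{"vendor": "ACME", "total": 129.99, "currency": "EUR"}'

try:
    invoice = Invoice.model_validate_json(raw_output)
    print(invoice.total)  # a typed float, not a string to re-parse
except ValidationError as err:
    # Unconstrained generations often fail here; constrained decoding
    # at generation time avoids this failure mode entirely.
    print(err)
```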
For engineers building NLP pipelines today, the landscape is fragmented: there is no single standard for structured generation yet, and switching between tools can be costly and frustrating. What's missing is a flexible, model-agnostic solution that minimizes setup overhead, supports structured outputs, and accelerates iteration.
Introducing Sieves: a modular toolkit for building robust NLP document-processing pipelines with ZFS models.
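The session itself will introduce the library; to give a flavor of the workflow it targets, here is a purely hypothetical usage sketch. Every name below (Doc, Pipeline, tasks, Classification, and their arguments) is an assumption for illustration, not confirmed Sieves API.

```python
# Hypothetical sketch only -- names are assumptions, not Sieves' real API.
from sieves import Doc, Pipeline, tasks

docs = [Doc(text="I don't believe attention is all we need.")]
pipeline = Pipeline([tasks.Classification(labels=["nlp", "other"])])

for doc in pipeline(docs):
    print(doc.results)  # structured per-task results rather than raw text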