Mingxuan Zhao
Ming Zhao is an open-source developer and Developer Advocate at IBM Research, where he helps IBM leverage open technologies while building impactful tools and growing vibrant open-source communities. He’s passionate about making open tech accessible to all and ensuring developers have the tools they need to succeed in the rapidly developing AI space. Ming now leads community efforts around Docling, IBM’s fastest-growing open source project, recently welcomed into the LF AI & Data Foundation.
Session
If you have worked with AI in any capacity, you'll know that AI is only as valuable as the data it can leverage. Data is the cornerstone of AI, and developers need better ways to transform complex documents into structured data ready for model training and inference.
In this 90-minute, code-along workshop, you’ll learn how to turn common, real-world documents and scans into structured data for search and RAG using Docling, an open-source toolkit for advanced document conversion that helps you bring your data into AI workflows more effectively. We’ll complete three labs (Conversion, Chunking, and RAG), and you’ll leave with runnable notebooks from a public GitHub repo.
Audience: Python practitioners shipping document-centric apps.
Prereqs: basic Python/Jupyter.