PyData Amsterdam 2025

Meet Docling: The “Pandas” for document AI
2025-09-24, Margaret Hamilton @ TNW City

A workshop session that shows you the basics of using Docling to enhance document ingestion in your AI workflow.


With the rapid rise of AI, developers need better ways to transform complex documents into structured data ready for model training and inference. Enter Docling, an open-source Python package that's quickly becoming the go-to tool for document parsing and export. In just a few months, Docling has earned over 25,000 GitHub stars and is already reshaping how developers approach document AI.

In this session, you'll get an in-depth introduction to Docling and how it can streamline your AI workflow, then work through a hands-on exercise to build your first custom document ingestion pipeline with Docling. Key features include:

Broad format support: Easily convert PDFs, DOCX, PPTX, HTML, images, and Markdown into structured Markdown or JSON.

Deep document understanding: Accurately capture page layouts, reading order, and tables, all essential for complex document analysis.

AI integration: Use the DoclingDocument format with frameworks like LlamaIndex, LangChain, and InstructLab to power RAG, QA, and LLM training.

OCR support: Extract data from scanned or image-based documents.

Developer-friendly CLI: Process documents quickly and consistently with a simple command-line interface.

This workshop requires experience with Python programming and LLMs. It will be presented as a Jupyter notebook that is accessible and runnable in Google Colab, so every participant's device will work for the session.

AI Engineer at IBM Research, leading development efforts at the intersection of Artificial Intelligence, Information Retrieval, and Data Management.

Mingxuan Zhao

Open-Source Software Developer and Developer Advocate at IBM

Ming Zhao is an open-source developer and Developer Advocate at IBM Research, where he helps IBM leverage open technologies while building impactful tools and growing vibrant open-source communities. He’s passionate about making open tech accessible to all and ensuring developers have the tools they need to succeed in the rapidly developing AI space. Ming now leads community efforts around Docling, IBM’s fastest-growing open-source project, recently welcomed into the LF AI & Data Foundation.