PyData Amsterdam 2025

Meet Docling: The “Pandas” for document AI
09-24, 09:00–10:30 (Europe/Amsterdam), Margaret Hamilton @ TNW City

A hands-on workshop covering the basics of using Docling to improve document ingestion in your AI workflow.


With the rapid rise of AI, developers need better ways to transform complex documents into structured data ready for model training and inference. Enter Docling, an open-source Python package that's quickly becoming the go-to for document parsing and export. In just a few months, Docling has earned over 25,000 GitHub stars and is already reshaping how developers approach document AI.

In this session, you'll get an in-depth introduction to Docling and how it can streamline your AI workflow, then work through a hands-on exercise to build your first custom document ingestion pipeline with Docling. Key features include:

Broad format support: Easily convert PDFs, DOCX, PPTX, HTML, images, and Markdown into structured Markdown or JSON.

Deep document understanding: Accurately capture page layouts, reading order, and tables, all of which are essential for complex document analysis.

AI integration: Use the DoclingDocument format with frameworks like LlamaIndex, LangChain, and InstructLab to power RAG, QA, and LLM training.

OCR support: Extract data from scanned or image-based documents.

Developer-friendly CLI: Process documents quickly and consistently with a simple command-line interface.
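As a rough illustration of the features above, here is a minimal sketch of converting a single document to Markdown. It assumes Docling is installed (pip install docling) and uses the DocumentConverter, convert, and export_to_markdown calls as shown in Docling's public quickstart. The helper output_path_for is a hypothetical name introduced only for this example and is not part of Docling.

```python
from pathlib import Path


def output_path_for(source: str, fmt: str = "md") -> Path:
    """Hypothetical helper (not part of Docling): derive an output
    filename from the last segment of a local path or URL."""
    last_segment = source.rstrip("/").split("/")[-1]
    stem = Path(last_segment).stem or "document"
    return Path(f"{stem}.{fmt}")


def convert_to_markdown(source: str) -> str:
    """Convert one document (local path or URL) to Markdown with Docling."""
    # Imported lazily so the helper above can be used without Docling installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(source)
    return result.document.export_to_markdown()


# Example usage, assuming a local file named report.pdf exists:
#   markdown = convert_to_markdown("report.pdf")
#   output_path_for("report.pdf").write_text(markdown, encoding="utf-8")
```

The same conversion can usually be run from the command line by passing the source to the docling command, for example docling report.pdf. Exact options and defaults may differ between Docling versions.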

This workshop requires experience with Python programming and LLMs. It will be presented as Jupyter notebooks that can be opened and run in Google Colab, so any participant's device will work for the session.