PyData Amsterdam 2025

Grounding LLMs on Solid Knowledge: Assessing and Improving Knowledge Graph Quality in GraphRAG Applications
09-24, 10:50–12:20 (Europe/Amsterdam), Katherine Johnson @ TNW City

Graph-based Retrieval-Augmented Generation (GraphRAG) enhances large language models (LLMs) by grounding their responses in structured knowledge graphs, offering more accurate, domain-specific, and explainable outputs. However, many of the graphs used in these pipelines are automatically generated or loosely assembled, and often lack the semantic structure, consistency, and clarity required for reliable grounding. The result is misleading retrieval, vague or incomplete answers, and hallucinations that are difficult to trace or fix.

This hands-on tutorial introduces a practical approach to evaluating and improving knowledge graph quality in GraphRAG applications. We’ll explore common failure patterns, walk through real-world examples, and share a reusable checklist of features that make a graph “AI-ready.” Participants will learn methods for identifying gaps, inconsistencies, and modeling issues that prevent knowledge graphs from effectively supporting LLMs, and apply simple fixes to improve grounding and retrieval performance in their own projects.


This hands-on tutorial introduces a practical framework for evaluating and improving knowledge graph quality in GraphRAG systems. We combine lecture-based instruction with a coding case study to explore the structural and semantic pitfalls of weak graphs, and demonstrate how to identify and address them.
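To give a flavour of the coding case study, here is a minimal sketch of the grounding step; the toy triples, the retrieve_facts helper, and the prompt template are illustrative assumptions, not the tutorial's actual code. Facts about entities mentioned in a question are retrieved from the graph and placed in the prompt, so the LLM answers from the graph rather than from memory.

```python
# Minimal GraphRAG grounding sketch (toy data and helper names are assumptions).
# Idea: retrieve facts about entities mentioned in the question and put them in
# the prompt so the LLM answers from the graph rather than from its own memory.

TRIPLES = [
    ("PyData Amsterdam", "takes_place_in", "Amsterdam"),
    ("GraphRAG", "grounds", "LLM responses"),
    ("GraphRAG", "uses", "knowledge graphs"),
]

def retrieve_facts(question: str, triples=TRIPLES):
    """Return triples whose subject appears in the question (naive entity match)."""
    return [t for t in triples if t[0].lower() in question.lower()]

def build_prompt(question: str) -> str:
    """Assemble a prompt that grounds the LLM on the retrieved triples."""
    facts = retrieve_facts(question)
    context = "\n".join(f"- {s} {p} {o}" for s, p, o in facts)
    return (
        "Answer using only the facts below.\n"
        f"Facts:\n{context}\n\n"
        f"Question: {question}"
    )

print(build_prompt("What does GraphRAG use?"))
# The resulting prompt would then be sent to an LLM of your choice.
```

If the graph behind retrieve_facts is noisy or poorly modeled, the retrieved context is noisy too, which is exactly the failure mode this tutorial focuses on.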

By the end of the tutorial, participants will be able to:

  • Understand the importance of high-quality KGs in GraphRAG systems and their role in grounding LLM outputs.

  • Identify common problems in LLM-generated or lightly modeled graphs.

  • Apply methods and heuristics for KG quality assessment, including validation, error detection, and refinement techniques (a small illustrative sketch follows this list).

  • Implement strategies to improve graph structure and content for better AI performance.
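As referenced above, the sketch below shows the kind of lightweight quality heuristics the tutorial has in mind; the example triples and check names are assumptions for illustration, not the tutorial's checklist.

```python
# Toy quality heuristics for a triple-based KG (illustrative assumptions only).
from collections import Counter

TRIPLES = [
    ("acme_corp", "located_in", "amsterdam"),
    ("Acme Corp", "founded_in", "1999"),    # near-duplicate of "acme_corp"
    ("amsterdam", "", "netherlands"),       # missing predicate
]

def check_missing_predicates(triples):
    """Flag triples whose predicate is empty or whitespace."""
    return [t for t in triples if not t[1].strip()]

def check_near_duplicate_entities(triples):
    """Flag entity names that collide after lowercasing and normalizing separators."""
    def norm(name):
        return name.lower().replace("_", " ").strip()
    entities = {t[0] for t in triples} | {t[2] for t in triples}
    counts = Counter(norm(e) for e in entities)
    return sorted(e for e in entities if counts[norm(e)] > 1)

print(check_missing_predicates(TRIPLES))       # [('amsterdam', '', 'netherlands')]
print(check_near_duplicate_entities(TRIPLES))  # ['Acme Corp', 'acme_corp']
```

Real graphs call for richer checks (schema validation, type constraints, SHACL shapes, and so on), but even simple heuristics like these surface many of the issues that degrade retrieval.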

Format & Requirements:
- Lecture + live coding walk-through (Jupyter notebooks in Python)
- Materials will be shared via GitHub
- Prior exposure to RAG or knowledge graphs is helpful, but not required

Tentative Outline (90 mins):
- 0–15 min – Introduction to GraphRAG & role of KGs in grounding
- 15–35 min – Common issues in LLM-generated or low-quality graphs
- 35–60 min – KG quality: assessment and improvement techniques
- 60–85 min – Live coding walkthrough: identify and fix issues in a real-world knowledge graph (see the refinement sketch after this outline)
- 85–90 min – Wrap-up + Q&A
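For a taste of the kind of fix covered in the walkthrough, the sketch below merges near-duplicate entities by rewriting triples onto a canonical name; the normalization rule is an illustrative assumption, not the tutorial's exact procedure.

```python
# Sketch of one refinement step: merge near-duplicate entities (assumed rule).

def canonicalize(name: str) -> str:
    """Normalize an entity name so spelling variants map to a single node."""
    return name.lower().replace("_", " ").strip()

def merge_duplicates(triples):
    """Rewrite subjects and objects onto their canonical forms."""
    return [(canonicalize(s), p, canonicalize(o)) for s, p, o in triples]

triples = [
    ("acme_corp", "located_in", "Amsterdam"),
    ("Acme Corp", "founded_in", "1999"),
]
print(merge_duplicates(triples))
# Both spellings of the company now resolve to the same node ("acme corp"),
# so retrieval returns one consistent entity instead of two partial ones.
```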

This tutorial is ideal for data scientists, ML engineers, and AI developers looking to build more robust, explainable, and effective GraphRAG systems.