Building a knowledge graph for climate policy PyData London 2025

Building a knowledge graph for climate policy

06-08, 15:30–16:15 (Europe/London), Hardwick Hub

At Climate Policy Radar, we're building an open-source knowledge graph for climate policy. In this talk, we'll share how we combine in-house expertise with scalable data infrastructure to identify key concepts in thousands of global climate policy documents. We'll also touch on ontology design, equitable evaluation, and the climate impacts of AI.

We'll take you on a technical deep-dive into how we've built and scaled a knowledge graph which maps the relationships between thousands of climate policy concepts, and identifies where those concepts appear in our corpus of climate policy and other climate-relevant documents.

We'll share the high-level methodology, infrastructure decisions, and evaluation framework which have allowed our small team to process millions of passages of text while maintaining high standards for fairness and accuracy.

After introducing what a knowledge graph is, and why you might want to build one, we'll cover:

Knowledge Graph Architecture & Methodology
- An ontology which can handle the complexity of the climate policy domain
- Interoperability considerations with existing sub-domain taxonomies
- Why we're building in the open with Wikibase
- The value of real human expertise

Classifier Development & Evaluation
- A common model for classifiers, which can encompass a range of architectures from straightforward regexes, to fine-tuned BERT-based models, to optimised calls to third-party LLMs
- Sampling strategies for building representative evaluation datasets
- Quantitative metrics vs qualitative vibe-checks for classifier selection

Production Infrastructure & Scaling
- A modular pipeline design separating model management, inference, and indexing
- Prefect-based orchestration for distributed inference
- Infrastructure as code with Pulumi
- Planned integration with our existing search and RAG systems
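
To make the idea of a common classifier model more concrete, here is a minimal Python sketch. It is an illustration only, not Climate Policy Radar's actual code: the names `Span`, `Classifier`, `RegexClassifier` and `run_all` are hypothetical, and the only assumption is that every classifier, whatever its architecture, exposes the same `predict` method returning the spans of text where a concept appears.

```python
import re
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Span:
    """A stretch of text where a concept was identified (hypothetical type)."""

    concept: str
    start: int
    end: int
    text: str


class Classifier(Protocol):
    """Hypothetical shared interface: any architecture can sit behind predict()."""

    concept: str

    def predict(self, passage: str) -> list[Span]: ...


class RegexClassifier:
    """Simplest implementation: match a fixed list of labels, case-insensitively."""

    def __init__(self, concept: str, labels: list[str]) -> None:
        self.concept = concept
        # Longest labels first, so the longest alternative is preferred
        # when one label is a prefix of another.
        ordered = sorted(labels, key=len, reverse=True)
        pattern = "|".join(re.escape(label) for label in ordered)
        self._pattern = re.compile(rf"\b(?:{pattern})\b", re.IGNORECASE)

    def predict(self, passage: str) -> list[Span]:
        return [
            Span(self.concept, m.start(), m.end(), m.group(0))
            for m in self._pattern.finditer(passage)
        ]


def run_all(classifiers: list[Classifier], passage: str) -> list[Span]:
    """Inference code depends only on the interface, not on the architecture."""
    return [span for clf in classifiers for span in clf.predict(passage)]
```

Under this assumption, a fine-tuned BERT-based model or a wrapper around a third-party LLM would implement the same `predict` method, so the code that runs inference and indexes results would not need to know which kind of classifier it is calling.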

The audience should leave the talk with a clear understanding of:

- Practical considerations when building domain-specific, high-impact knowledge graphs
- Methods for evaluating NLP classifier performance in technical domains
- Approaches to scaling inference pipelines, from local experimentation to routine cloud-based deployments
- How we plan to use our knowledge graph to power a climate policy research platform, including integrations with RAG and other LLM-driven systems
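
As one illustration of the quantitative side of classifier evaluation, the sketch below computes passage-level precision, recall and F1 against human labels, broken down by group so that gaps in performance between groups become visible. This is an assumption about what such an evaluation might look like, not the team's actual framework: `LabelledPassage`, its `group` field (for example, document language or region) and both function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelledPassage:
    group: str       # hypothetical stratum, e.g. document language or region
    gold: bool       # human label: does the concept appear in the passage?
    predicted: bool  # classifier output for the same passage


def precision_recall_f1(items: list[LabelledPassage]) -> tuple[float, float, float]:
    """Return (precision, recall, F1); each is 0.0 when undefined."""
    tp = sum(1 for i in items if i.gold and i.predicted)
    fp = sum(1 for i in items if not i.gold and i.predicted)
    fn = sum(1 for i in items if i.gold and not i.predicted)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


def metrics_by_group(
    items: list[LabelledPassage],
) -> dict[str, tuple[float, float, float]]:
    """Compute the same metrics separately for each group."""
    groups: dict[str, list[LabelledPassage]] = defaultdict(list)
    for item in items:
        groups[item.group].append(item)
    return {group: precision_recall_f1(members) for group, members in groups.items()}
```

Comparing the per-group results with the overall figure is one simple way to check whether a classifier that looks good on average is underperforming on a particular subset of documents.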

This talk should be particularly stimulating for data scientists and engineers working on information retrieval systems, knowledge graphs, or other high-impact natural language processing systems.

Prior Knowledge Expected: No previous knowledge expected

Harrison Pim

I'm a data scientist / machine learning engineer with a background in computational / quantum physics. I write loads of Python and TypeScript, and a little bit of everything else.

I like working on hard R&D problems involving computer vision, natural language processing, graph theory, representation learning, recommendation systems, and information retrieval.

I love turning those research projects into end-to-end pipelines and services which help people in the real world.

Fred O'Loughlin
