PyData Berlin 2025

Sankalp Gilda

As a Staff Machine Learning Engineer (MLE), Sankalp gets fired up by complex technical challenges, diving deep into time series, constrained optimization, and high-performance computing. He's currently exploring the practical frontier of Generative AI, applying LLMs and multimodal techniques to improve how knowledge graphs are built from diverse sources. This talk focuses on a crucial component of that work: efficiently mapping and aligning extracted concepts to standard knowledge bases like Wikidata. Off-duty, his adventures shift from algorithmic to atmospheric (skydiving) and aquatic (scuba diving), often accompanied by his adventure-loving dog.


Session

09-02
15:00
30min
Bridging Custom Schemas and Wikidata with an LLM-Assisted Interactive Python Tool
Sankalp Gilda

Many projects build knowledge graphs with custom schemas but struggle to align them with standard hubs like Wikidata. Manual mapping is tedious and error-prone, while fully automated methods often lack accuracy. This talk introduces wikidata-mapper, a Python tool that uses Large Language Models (LLMs), accessed via DSPy, to suggest semantic mappings between simple YAML ontology schemas and Wikidata identifiers (QIDs for items, PIDs for properties). We demonstrate its interactive workflow, including confidence-based auto-acceptance, batch suggestion and review modes for scalability, and a novel hierarchy suggestion feature. Learn how the tool, built with libraries like inquirer, tenacity, and platformdirs, combines LLM suggestions with human oversight to efficiently ground custom knowledge representations in Wikidata. Ideal for KG practitioners, data engineers, and anyone who needs to integrate custom schemas with public knowledge bases.
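
To make the idea of confidence-based auto-acceptance concrete, here is a minimal sketch in plain Python. It is not wikidata-mapper's actual code or API: the `Suggestion` class, the `triage` function, the 0.9 threshold, and the confidence scores are all hypothetical and chosen only for illustration. The Wikidata identifiers themselves are real (Q5 is "human", P569 is "date of birth", P108 is "employer").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    """One proposed mapping (hypothetical structure, for illustration only)."""
    term: str          # name taken from the custom schema
    wikidata_id: str   # candidate QID (item) or PID (property)
    confidence: float  # score in [0, 1], assumed to come from the LLM step


def triage(suggestions, threshold=0.9):
    """Split suggestions into auto-accepted ones and ones left for human review."""
    accepted, needs_review = [], []
    for s in suggestions:
        if s.confidence >= threshold:
            accepted.append(s)
        else:
            needs_review.append(s)
    return accepted, needs_review


# Made-up scores; in a real run these would come from the model.
suggestions = [
    Suggestion("Person", "Q5", 0.97),
    Suggestion("birth_date", "P569", 0.93),
    Suggestion("employer", "P108", 0.62),
]

accepted, needs_review = triage(suggestions, threshold=0.9)
print("auto-accepted:", [s.wikidata_id for s in accepted])
print("needs review: ", [s.wikidata_id for s in needs_review])
```

In an interactive workflow of the kind described above, only the `needs_review` list would be shown to the user, so a higher threshold trades reviewer time for fewer unverified mappings.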

Natural Language Processing & Audio (incl. Generative AI NLP)
B07-B08