How to make datamap web-apps of embedding vectors via open source tooling PyData Seattle 2025

2025-11-09 11:00–12:30, Room 122

Datamaps are ML-powered visualizations of high-dimensional data, and in this talk the data is collections of embedding vectors. Interactive datamaps run in-browser as web-apps, potentially without any code running on the web server. Datamap tech can be used to visualize, say, the entire collection of chunks in a RAG vector database.

The best-of-breed tools of this new datamap technique are liberally licensed open source. This presentation is an introduction to building with those repos. The maths will be mentioned only in passing; the topic here is simply how-to with specific tools. Talk attendees will be learning about Python tools, which produce high-quality web UIs.

DataMapPlot is the premier tool for rendering a datamap as a web-app. Here is a live demo thereof:
https://connoiter.com/datamap/cff30bc1-0576-44f0-a07c-60456e131b7b

00-25: Intro to datamaps
25-45: Pipeline architecture
45-55: Demos touring such tools as UMAP, HDBSCAN, DataMapPlot, Toponymy, etc.
55-90: Group coding

A Google account is required to log in to Google Colab, where participants can run the workshop notebooks. A Hugging Face API key (token) is needed to download Gemma models.

Datamaps are a new visualization technique for high-dimensional data that is especially useful when working with embedding vectors, which are proliferating wildly with the success of LLMs and RAG systems.

The datamap conceptual model can be framed via an extended metaphor to real world geo maps -- think Google Maps for high-dimensional data. To wit:

- The scene opens on a moonlit 3D landscape viewed as if from a satellite
- The data being mapped are represented as points of light scattered across the surface of the landscape, positioned such that points similar to each other in the original high-dimensional space are grouped near each other on the 3D map
- The points are grouped into a tree of clusters
- The world starts as a landless water-world but the water recedes
- Eventually islands appear, which are the clusters
- Continued draining of the landscape leads to cluster agglomeration

Datamap viewer web-apps allow users to navigate within the 3D space to perform exploratory data analysis on the high-dimensional data (read: embedding vectors). That is the mental model, but in practice datamaps are usually rendered in 2D. 3D rendering is possible and sometimes done, but navigation can get confusing.

The tech behind the above metaphor:
- The elevation is the probability density
- The placement of points is determined via ML-based dimensionality reduction algorithms (UMAP, t-SNE, etc.)
- The hierarchical clustering is sometimes called Topic Modeling. In datamaps, this is usually performed by HDBSCAN and variants (FLASC, etc.)
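
The water-level picture can be made concrete with a toy example. The following is a minimal sketch in plain Python, not how HDBSCAN is actually implemented: it assumes a 1D landscape, uses a Gaussian kernel density estimate as the elevation, and lowers a hand-picked water level step by step. The function names (`elevation`, `islands`), the sample points, the bandwidth, and the water levels are all illustrative assumptions.

```python
import math

# Hypothetical 1D "embedding": two nearby groups and one distant group.
POINTS = [0.0, 0.2, 0.4, 2.0, 2.2, 2.4, 8.0, 8.2, 8.4]


def elevation(points, x, bandwidth=0.5):
    """Gaussian kernel density estimate at x: the height of the landscape."""
    norm = len(points) * bandwidth * math.sqrt(2 * math.pi)
    return sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points) / norm


def islands(points, water_level, bandwidth=0.5, step=0.05):
    """Scan the landscape left to right and return the dry intervals."""
    lo = min(points) - 3 * bandwidth
    hi = max(points) + 3 * bandwidth
    n_steps = int(round((hi - lo) / step))
    found, start = [], None
    for i in range(n_steps + 1):
        x = lo + i * step
        dry = elevation(points, x, bandwidth) > water_level
        if dry and start is None:
            start = x  # a new island begins here
        elif not dry and start is not None:
            found.append((start, x - step))  # the island ended one step back
            start = None
    if start is not None:
        found.append((start, hi))  # island runs to the edge of the scan
    return found


# Drain the water: each lower level exposes more land.
for level in (0.30, 0.15, 0.05, 1e-9):
    print(f"water level {level:g}: {len(islands(POINTS, level))} island(s)")
```

With these sample points, the highest water level leaves no land at all, a lower level exposes one island per group, and further draining merges neighboring islands until a single landmass remains. Recording which islands merge at which level gives the tree of clusters described above.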

The tooling implements various topological data analysis (TDA) algorithms designed to work with high-dimensional data. The maths will not be covered in any depth; this talk is all about how to build datamap pipelines with open source tools.

The following open source tools will be covered along with demo code:
- UMAP
- HDBSCAN
- DataMapPlot
- Toponymy
- Vectorizers
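
As a rough idea of how some of these tools fit together, here is a minimal sketch of a datamap pipeline, not the workshop's actual notebook code. It assumes the `umap-learn`, `hdbscan`, and `datamapplot` packages are installed; the function names `name_clusters` and `build_datamap`, the parameter values, and the output filename are hypothetical. Cluster names are plain placeholders here; producing meaningful topic names is the job of a tool like Toponymy, which this sketch does not attempt to show.

```python
def name_clusters(cluster_ids, noise_name="Unlabelled"):
    """Turn HDBSCAN integer labels into display strings; -1 marks noise.

    "Unlabelled" is assumed here to match DataMapPlot's default noise label.
    """
    return [noise_name if c == -1 else f"Cluster {c}" for c in cluster_ids]


def build_datamap(vectors, hover_text, out_path="datamap.html"):
    """Embedding vectors -> 2D layout -> clusters -> standalone HTML file."""
    # Third-party imports live inside the function so the helper above works
    # without them. Install with: pip install umap-learn hdbscan datamapplot
    import numpy as np
    import umap
    import hdbscan
    import datamapplot

    vectors = np.asarray(vectors)

    # 1. Dimensionality reduction: similar vectors land near each other in 2D.
    reducer = umap.UMAP(n_components=2, metric="cosine", random_state=42)
    coords = reducer.fit_transform(vectors)

    # 2. Density-based clustering of the 2D layout (illustrative parameter).
    cluster_ids = hdbscan.HDBSCAN(min_cluster_size=15).fit_predict(coords)

    # 3. Render an interactive datamap and write it out as a single HTML page.
    labels = np.array(name_clusters(cluster_ids.tolist()))
    plot = datamapplot.create_interactive_plot(coords, labels, hover_text=hover_text)
    plot.save(out_path)
    return out_path
```

Calling `build_datamap(vectors, hover_text)` with an array of embedding vectors and a matching list of strings (for example, the text of each RAG chunk) should write an HTML file that can be served statically, in line with the idea of a web-app that needs no code running on the web server.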

The core FOSS tool for making datamaps is DataMapPlot. Here is a live demo thereof:
https://connoiter.com/datamap/cff30bc1-0576-44f0-a07c-60456e131b7b

A Google account is required to log in to Google Colab, where participants can run the workshop notebooks.

For more information and the code, visit https://connoiter.com/kingtutte/workshop

Prior Knowledge Expected: No previous knowledge expected

John Tigue

Founder/CTO of Connoiter, producing liberally licensed open source DataMap tooling and driving the effort to have a widely useful DataMap data schema in order to promote interoperability and reduce bit rot.
