John Tigue
Founder/CTO of Connoiter, producing liberally licensed open-source DataMap tooling and driving the effort to establish a widely useful DataMap data schema that promotes interoperability and reduces bit rot.
Session
Datamaps are ML-powered visualizations of high-dimensional data; in this talk, the data are collections of embedding vectors. Interactive datamaps run in the browser as web apps, potentially without any code running on the web server. Datamap tech can be used to visualize, say, the entire collection of chunks in a RAG vector database.
The best-of-breed tools for this new datamap technique are liberally licensed open source. This presentation is an introduction to building with those repos. The maths will be mentioned only in passing; the focus is practical how-to with specific tools. Attendees will learn to use Python tools that produce high-quality web UIs.
DataMapPlot is the premier tool for rendering a datamap as a web app. Here is a live demo:
https://connoiter.com/datamap/cff30bc1-0576-44f0-a07c-60456e131b7b
00-25: Intro to datamaps
25-45: Pipeline architecture
45-55: Demos touring such tools as UMAP, HDBSCAN, DataMapPlot, Toponymy, etc.
55-90: Group coding
A Google account is required to log in to Google Colab, where participants can run the workshop notebooks. A Hugging Face API key (token) is needed to download Gemma models.
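For orientation before the workshop, here is a minimal sketch of how the tools named in the agenda might be chained: embedding vectors are projected to 2-D with UMAP, clustered with HDBSCAN, and rendered to a standalone HTML file with DataMapPlot. This is an illustration under assumptions, not the workshop notebooks' actual code: the function names `cluster_names` and `build_datamap`, the parameter values, and the placeholder "Cluster N" labels are all hypothetical. The third-party imports are deferred into the function so that the small labeling helper can be tried without installing anything.

```python
from typing import List, Sequence


def cluster_names(cluster_ids: Sequence[int], noise_label: str = "Unlabelled") -> List[str]:
    """Turn integer cluster ids into placeholder text labels.

    HDBSCAN marks noise points with -1; those receive `noise_label`.
    The "Cluster N" names are stand-ins (hypothetical); a topic-naming
    tool would normally supply meaningful names instead.
    """
    return [noise_label if cid < 0 else f"Cluster {cid}" for cid in cluster_ids]


def build_datamap(embeddings, hover_text, out_path: str = "datamap.html") -> str:
    """Sketch of an embeddings -> 2-D map -> clusters -> web page pipeline.

    `embeddings` is assumed to be an (n_items, n_dims) array and
    `hover_text` a list of n_items strings shown on mouse-over.
    """
    # Third-party packages (umap-learn, hdbscan, datamapplot, numpy) are
    # imported here so the module loads even when they are not installed.
    import numpy as np
    import umap
    import hdbscan
    import datamapplot

    # 1. Reduce the high-dimensional vectors to 2-D map coordinates.
    coords = umap.UMAP(n_components=2, metric="cosine", random_state=42).fit_transform(embeddings)

    # 2. Find dense regions of the map; min_cluster_size is an arbitrary choice.
    cluster_ids = hdbscan.HDBSCAN(min_cluster_size=15).fit_predict(coords)

    # 3. DataMapPlot expects one text label per point.
    labels = np.array(cluster_names(cluster_ids))

    # 4. Render an interactive plot and write it out as a single HTML file.
    plot = datamapplot.create_interactive_plot(coords, labels, hover_text=hover_text)
    plot.save(out_path)
    return out_path
```

In a Colab cell, calling something like `build_datamap(vectors, texts)` should leave a `datamap.html` that opens in any browser with no server-side code. Swapping the placeholder labels for generated topic names is the kind of step a tool such as Toponymy is meant to handle.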