09-26, 13:25–14:00 (Europe/Amsterdam), Voyager
Abstract
As a data visualization practitioner, I frequently draw inspiration from the diverse and rapidly growing data visualization community, particularly through challenges like #TidyTuesday. However, the sheer volume of remarkable visualizations quickly overwhelmed my manual curation methods, from Pinterest boards to Notion pages. This became a significant bottleneck in my workflow: I was spending more time cataloging charts than creating them.
In this talk, I will present a retrieval system for data visualizations based on Retrieval-Augmented Generation (RAG). I will walk through the methodology behind it and show how I addressed my own workflow inefficiencies by turning a scattered collection of charts into a semantically searchable knowledge base. The project is a practical example of applying AI techniques to creative technical work, and of how a specialized retrieval system can make the data visualization creation process more efficient and improve its quality.
Description
My professional work revolves around Python, but my creative passion lies in crafting data visualizations with R and ggplot2, where I regularly participate in community challenges like #TidyTuesday and #30DayChartChallenge.
Like many visualization creators, I draw inspiration from the community's collective work, studying contributors' geometric choices, color palettes, thematic elements, and annotation techniques. I learn especially from reading contributors' code on GitHub for #TidyTuesday challenges. Over time, I've tried various ways to organize this inspiration: Pinterest boards, manual tagging in Notion pages, Twitter bookmarks, and screenshots. However, the sheer volume of remarkable visualizations has overwhelmed these manual curation processes, and I realized I was spending more time cataloging charts than actually creating them.
The limitations of these conventional organization methods became clear:
- They required tedious manual tagging and organization.
- They lacked any semantic search capabilities to find visualizations.
- They existed in disconnected silos, making comprehensive searches impossible.
- They could not scale with the rapidly growing visualization community.
As an AI engineer by day, I realized this challenge was a perfect use case for a RAG-based retrieval system. By creating an intelligent search tool specifically designed for data visualizations, I could transform my scattered collection of inspiration into a searchable knowledge base. Instead of scrolling through endless saved charts or GitHub repositories, I could simply search with natural language queries like, "I'm looking for a geofaceted map of Europe on a dark background," or, "Show me circular charts that use a sans-serif font in their title."
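To make the idea of natural-language search concrete, here is a minimal sketch of the retrieval step, assuming each chart has already been reduced to a short text description. The bag-of-words `embed` function is a toy stand-in for a real embedding model, and the chart IDs and descriptions are hypothetical; this is an illustration of vector-similarity ranking, not the system presented in the talk.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9-]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, index: dict[str, str], k: int = 3) -> list[tuple[str, float]]:
    """Rank every chart description against the query; return the top k."""
    q = embed(query)
    scored = [(chart_id, cosine(q, embed(desc))) for chart_id, desc in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical chart descriptions (hand-written or generated from code/images).
CHARTS = {
    "chart_a": "geofaceted map of europe dark background sans-serif title",
    "chart_b": "circular bar chart sans-serif font title pastel palette",
    "chart_c": "line chart light background serif annotations",
}

query = "I'm looking for a geofaceted map of Europe on a dark background"
print([chart_id for chart_id, _ in search(query, CHARTS, k=1)])  # expected: ['chart_a']
```

A production system would swap `embed` for a learned embedding model and the linear scan for a vector index, but the ranking logic stays the same.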
Implementing RAG for this domain proved particularly challenging because I'm dealing primarily with images. Simply embedding visual features from visualizations didn't yield satisfactory results. I needed to develop a hybrid approach that effectively processes both textual elements (code, descriptions, titles) and visual components. In this talk, I'll present my methodology for distilling data visualizations into searchable embeddings that capture both visual characteristics and conceptual elements.
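One way to handle the textual side of such a hybrid approach is to derive grammar-of-graphics components directly from a chart's ggplot2 source and merge them with its title and a visual description into a single document for embedding. The sketch below rests on assumptions: the regex patterns, the `extract_components` and `build_document` helpers, and the caption (which could, for example, come from a vision-language model) are all hypothetical, and regular expressions will miss code that builds plots indirectly.

```python
import re

# Hypothetical patterns mapping grammar-of-graphics components to ggplot2
# function-name prefixes; they only catch direct calls such as geom_col(...).
PATTERNS = {
    "geoms": r"\bgeom_(\w+)\s*\(",
    "coords": r"\bcoord_(\w+)\s*\(",
    "facets": r"\bfacet_(\w+)\s*\(",
    "themes": r"\btheme_(\w+)\s*\(",
    "fonts": r"family\s*=\s*[\"']([^\"']+)[\"']",
}


def extract_components(r_code: str) -> dict[str, list[str]]:
    """Return the unique matches for each component, in first-seen order."""
    return {
        name: list(dict.fromkeys(re.findall(pattern, r_code)))
        for name, pattern in PATTERNS.items()
    }


def build_document(title: str, caption: str, r_code: str) -> str:
    """Merge title, visual caption and code-derived components into one text."""
    parts = [f"title: {title}", f"visual description: {caption}"]
    for name, values in extract_components(r_code).items():
        if values:  # skip components that do not appear in the code
            parts.append(f"{name}: {', '.join(values)}")
    return "\n".join(parts)


# Hypothetical ggplot2 snippet, stored as a plain string.
R_CODE = """
ggplot(df, aes(x = year, y = value)) +
  geom_col() +
  coord_polar() +
  facet_geo(~ country) +
  theme_void() +
  theme(plot.title = element_text(family = "Roboto"))
"""

print(build_document(
    title="Hypothetical example chart",
    caption="circular bar charts arranged in a geographic grid",
    r_code=R_CODE,
))
```

The resulting document can then be embedded like any other text, so a query mentioning "polar" or "Roboto" can match charts whose images alone would not reveal those details.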
This project brings together AI and data visualization, two areas that rarely meet. Although it uses RAG technology, the main goal is to make it easier to create better visualizations. What began as a fix for my own workflow problem could benefit many others in the visualization community. It's a practical example of how combining different technologies and tools can improve the efficiency and quality of the data visualization creation process.
Time Breakdown
- My Personal Struggle with Viz Inspiration (5 min): Setting the stage by discussing the challenge of managing diverse data visualization inspiration, the overwhelming volume of community-shared data visualizations, and the limitations of manual curation.
- Deconstructing Visualizations for AI (7 min): How we can break down data visualizations into searchable components, drawing parallels with the "Grammar of Graphics" to inform our AI-driven approach.
- Building a Hybrid RAG System: Strategies & Challenges (8 min): Dive into the practicalities of creating a RAG system for visual content, focusing on the hybrid embedding techniques that combine image and text features and the hurdles faced.
- Live Demo: Searching for My Next Chart (5 min): A live demonstration of the visualization search application in action, showcasing natural language queries and retrieved results.
- Key Takeaways & Future Directions (5 min): Summarizing the main learnings, discussing future development ideas, and broader applications of this AI-powered approach.
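When text and image embeddings each produce their own ranking, the two result lists have to be merged. The proposal does not specify how this is done; reciprocal rank fusion (RRF), sketched below, is one common option swapped in here for illustration. It scores each chart by summing 1/(k + rank) across rankings. The rankings and chart IDs are hypothetical, and k = 60 is a commonly used default rather than a tuned value.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of chart IDs into one ranking (RRF)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        # Ranks start at 1, so the top result contributes 1 / (k + 1).
        for rank, chart_id in enumerate(ranking, start=1):
            scores[chart_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a deterministic order.
    return sorted(scores, key=lambda chart_id: (-scores[chart_id], chart_id))


text_ranking = ["chart_b", "chart_a", "chart_c"]   # hypothetical text/code results
image_ranking = ["chart_a", "chart_c", "chart_b"]  # hypothetical image results
print(reciprocal_rank_fusion([text_ranking, image_ranking]))
# expected: ['chart_a', 'chart_b', 'chart_c']
```

RRF only needs rank positions, so it sidesteps the fact that similarity scores from a text model and an image model are not on comparable scales; a weighted sum of raw scores is the usual alternative when those scores are calibrated.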
Target Audience
This talk is ideal for data visualization practitioners and creatives looking for smarter ways to organize their inspiration and improve their workflows, as well as for AI engineers exploring practical applications of RAG systems on visual data.
Specialized Tracks
- Data Science & Analytics
- Machine Learning & Deep Learning