Scaling Large-Scale Interactive Data Visualization with Accelerated Computing PyData Seattle 2025

2025-11-09 09:00–10:30, Room 118

As datasets grow in size and complexity, CPU-based visualization pipelines often become bottlenecks that slow exploratory data analysis and interactive dashboards. In this session, we’ll demonstrate how GPU acceleration can transform Python-based interactive visualization workflows, delivering speedups of up to 50x with minimal code changes. Using hvPlot, Datashader, cuxfilter, and Plotly Dash, we’ll walk through real-world examples of visualizing both tabular and unstructured data and show how RAPIDS, a suite of open-source GPU-accelerated data science libraries from NVIDIA, accelerates these workflows. Attendees will learn best practices for accelerating preprocessing, building scalable dashboards, and profiling pipelines to identify and resolve bottlenecks. Whether you are a data scientist or a developer, you’ll leave with practical techniques for scaling your interactive visualization workflows on GPUs.
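To give a sense of what "minimal code changes" can look like, here is a minimal sketch, not material from the session itself. It assumes the common pattern of preferring cuDF (the RAPIDS DataFrame library) when it is installed and falling back to pandas otherwise. The helper names `pick_backend` and `mean_by_group`, and the column names `cluster` and `value`, are hypothetical and chosen only for illustration.

```python
import importlib
import importlib.util


def pick_backend():
    """Return the name of the first installed DataFrame library, or None."""
    # Order encodes preference: GPU-backed cuDF first, then CPU-based pandas.
    for name in ("cudf", "pandas"):
        if importlib.util.find_spec(name) is not None:
            return name
    return None


def mean_by_group(backend_name, records):
    """Group `records` by 'cluster' and average 'value' with the chosen library."""
    xdf = importlib.import_module(backend_name)
    df = xdf.DataFrame(records)
    # The same call is used for both libraries; this sketch relies on
    # cuDF mirroring the pandas groupby API for this simple case.
    return df.groupby("cluster")["value"].mean()


# Illustrative data only (hypothetical column names).
RECORDS = {"cluster": ["a", "a", "b"], "value": [1.0, 3.0, 5.0]}

BACKEND = pick_backend()
RESULT = None
if BACKEND is not None:
    RESULT = mean_by_group(BACKEND, RECORDS)
    print(f"backend: {BACKEND}")
    print(RESULT)
else:
    print("neither cudf nor pandas is installed")
```

Only the import changes between the CPU and GPU paths in this sketch; the analysis code stays the same. RAPIDS also provides a `cudf.pandas` accelerator mode that avoids even the import change, though whether the session uses it is not stated here.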

Interactive data visualization is essential for interpreting large datasets, helping identify clusters, trends, and anomalies. However, as dataset sizes reach millions or even billions of records, traditional CPU-based pipelines introduce latency that hinders interactive exploration. GPUs provide a practical solution, accelerating both data preprocessing and visualization stages while integrating seamlessly into familiar Python workflows.
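One reason record counts hurt interactivity is that a naive plot hands every record to the renderer. Raster-aggregation tools such as Datashader take a different route: they bin points into a fixed-size grid first, so the amount of data drawn depends on the grid size rather than on the number of records. The pure-Python sketch below illustrates that idea only; it is not Datashader's implementation, and the function name `bin_points` and the grid dimensions are assumptions made for the example.

```python
import random


def bin_points(xs, ys, width, height, x_range, y_range):
    """Count how many (x, y) points fall into each cell of a width x height grid."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    grid = [[0] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # points outside the viewport are not counted
        # Map each coordinate to a column/row index; clamp the upper edge
        # so that x == x_max lands in the last cell instead of overflowing.
        col = min(int((x - x_min) / (x_max - x_min) * width), width - 1)
        row = min(int((y - y_min) / (y_max - y_min) * height), height - 1)
        grid[row][col] += 1
    return grid


# Synthetic data: 100,000 normally distributed points (fixed seed).
rng = random.Random(0)
n_points = 100_000
xs = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
ys = [rng.gauss(0.0, 1.0) for _ in range(n_points)]

grid = bin_points(xs, ys, width=40, height=20,
                  x_range=(-4.0, 4.0), y_range=(-4.0, 4.0))
cells = sum(len(row) for row in grid)
print(f"{n_points} points reduced to {cells} cells")
```

The binning loop is the expensive part and is the kind of data-parallel work that GPU libraries are designed to speed up; the grid handed to the renderer stays at 800 cells regardless of how many points go in.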

The objective of this tutorial is to equip participants with hands-on knowledge of how to:
• integrate GPU-accelerated libraries into existing Python-based interactive visualization pipelines
• process and visualize both structured and unstructured data at scale
• build interactive dashboards capable of rendering large datasets in real time
• profile and optimize pipelines to maximize performance and reduce bottlenecks
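The last objective, profiling, can be approached with very simple tooling before reaching for a dedicated profiler. The sketch below is one possible starting point using only the Python standard library; the `stage` helper and the three stand-in workloads are hypothetical and do not come from the tutorial materials.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def stage(name):
    """Accumulate the wall-clock duration of the enclosed block under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[name] = timings.get(name, 0.0) + elapsed


# Hypothetical three-stage pipeline; the workloads are stand-ins.
with stage("load"):
    data = list(range(200_000))
with stage("preprocess"):
    squared = [v * v for v in data]
with stage("aggregate"):
    total = sum(squared)

# Report stages from slowest to fastest so the bottleneck is listed first.
slowest = max(timings, key=timings.get)
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<12}{seconds * 1000:8.2f} ms")
print("slowest stage:", slowest)
```

Timing per stage shows where acceleration is likely to pay off first. One caveat when adapting this to GPU code: GPU libraries often launch work asynchronously, so a wall-clock timer like this may need an explicit synchronization step before it stops in order to measure the work rather than just the launch.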

Outline
1. Introduction: Challenges of visualizing data at scale
2. hvPlot: Rapid exploratory data analysis
3. Datashader: Massive-scale rendering
4. cuxfilter: Interactive dashboards
5. Plotly Dash: Production-ready applications
6. Comparison and best practices

Knowledge Prerequisite
• basic Python programming skills
• basic knowledge of data operations (e.g. pandas, NumPy)
• familiarity with visualization libraries (Matplotlib, Plotly, etc.) is helpful but not required

Target Audience
Data scientists, analysts, and developers/engineers interested in large-scale interactive data visualization.

Prior Knowledge Expected: No previous knowledge expected

Allison Ding

Allison Ding is a developer advocate for GPU-accelerated AI APIs, libraries, and tools at NVIDIA, with a specialization in large language models (LLMs) and advanced data science techniques. She brings over nine years of hands-on experience as a data scientist, focusing on managing and delivering end-to-end data science solutions. Her academic background includes a strong emphasis on natural language processing (NLP) and generative AI. Allison holds a master’s degree in Applied Statistics from Cornell University and a master’s degree in Computer Science from San Francisco Bay University.
