PyData Virginia 2025

Getting Started with RAPIDS: GPU-Accelerated Data Science for PyData Users
04-19, 13:30–15:00 (US/Eastern), Room 130

In this introductory hands-on tutorial, participants will learn how to accelerate their data workflows with RAPIDS, an open-source suite of libraries that runs end-to-end data pipelines on NVIDIA GPUs. Using libraries that mirror familiar PyData APIs, such as cuDF (GPU-accelerated pandas) and cuML (GPU-accelerated machine learning), attendees will explore how to integrate these tools into their existing workflows with minimal code changes and speed up tasks such as data processing and model training.


NVIDIA GPUs can substantially speed up data processing and model training, reducing the time and cost associated with these tasks. The appeal of GPUs becomes even stronger with zero-code-change libraries and plugins, which let you take advantage of GPU acceleration without rewriting your existing code. With RAPIDS, you can keep using popular PyData libraries like pandas, Polars, and NetworkX while reaping the performance benefits of GPUs.
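As a rough illustration of the zero-code-change idea, the sketch below enables the `cudf.pandas` accelerator before importing pandas and then runs ordinary pandas code. The CPU fallback, the `mean_fare_by_city` helper, and the sample data are assumptions made for illustration, not material from the tutorial itself.

```python
# Try to enable the cudf.pandas accelerator BEFORE pandas is imported.
# The fallback to plain CPU pandas is an assumption added for portability.
try:
    import cudf.pandas

    cudf.pandas.install()
    GPU_ENABLED = True
except Exception:  # RAPIDS not installed, or no usable GPU
    GPU_ENABLED = False

import pandas as pd


def mean_fare_by_city(df: pd.DataFrame) -> pd.Series:
    """Ordinary pandas code; nothing here is GPU-specific."""
    return df.groupby("city")["fare"].mean().sort_index()


# Hypothetical sample data for illustration only.
trips = pd.DataFrame(
    {
        "city": ["DC", "NYC", "DC", "NYC"],
        "fare": [10.0, 20.0, 14.0, 30.0],
    }
)

result = mean_fare_by_city(trips)
print("GPU accelerator enabled:", GPU_ENABLED)
print(result)
```

In a Jupyter notebook the same effect is typically obtained with `%load_ext cudf.pandas`, and for scripts with `python -m cudf.pandas script.py`, so the pandas code itself stays unchanged.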

This tutorial provides an introduction to RAPIDS, an open-source suite of libraries that accelerates data science and machine learning workflows using GPU technology. Aimed at data scientists and machine learning practitioners of all experience levels, the session will focus on how RAPIDS can be seamlessly integrated into existing data pipelines to achieve substantial performance improvements with minimal code changes.

Through hands-on coding exercises, attendees will explore the RAPIDS ecosystem, including cuDF (GPU-accelerated pandas) and cuML (GPU-accelerated machine learning), and learn how to integrate these tools into their workflows to accelerate tasks like data processing and model training. By the end of this tutorial, they'll understand how RAPIDS integrates with the PyData ecosystem and how it can significantly speed up their workflows.
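To give a sense of what "minimal code changes" can look like when cuDF is used directly, here is a small sketch in which only the import line differs from a pandas script. The `xdf` alias, the CPU fallback, and the sensor data are hypothetical choices for illustration; cuML follows a similar pattern for scikit-learn-style estimators.

```python
# Import cuDF if it is available; otherwise fall back to pandas.
# The fallback and the alias name `xdf` are assumptions for illustration.
try:
    import cudf as xdf  # GPU DataFrame library with a pandas-like API
except Exception:
    import pandas as xdf  # CPU fallback so the sketch runs anywhere

# Hypothetical sample data.
df = xdf.DataFrame(
    {
        "sensor": ["a", "b", "a", "b", "a"],
        "reading": [1.0, 2.0, 3.0, 4.0, 5.0],
    }
)

# Filter and aggregate with the same calls one would write in pandas.
high = df[df["reading"] > 1.5]
totals = high.groupby("sensor")["reading"].sum()

# cuDF objects live on the GPU; copy the result back to host memory.
if hasattr(totals, "to_pandas"):
    totals = totals.to_pandas()
totals = totals.sort_index()

print(totals)
```

Keeping the library behind a single alias is one way to run the same analysis on machines with and without a GPU, at the cost of being limited to the API surface the two libraries share.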

The target audience for this tutorial is data scientists and machine learning practitioners. No prior GPU knowledge is required, but participants should have some experience with Python, pandas, and scikit-learn.


Prior Knowledge Expected

No previous knowledge expected

Naty is a Senior Software Engineer at NVIDIA. She is a former academic with a Master's in Physics and a PhD in Mechanical and Aerospace Engineering. She currently contributes to RAPIDS, and has previously contributed to and maintained other open source projects such as Ibis and Dask. She is also an active member of PyLadies and an active volunteer and organizer of Women and Gender Expansive Coders DC meetups.

Mike is a Senior Software Engineering Manager at NVIDIA working on RAPIDS, where he manages teams responsible for RAPIDS Cloud and HPC deployments, build infrastructure and packaging, and PyData projects. He has also contributed to open source software projects in the PyData ecosystem such as Dask and Intake. He holds two bachelor’s degrees, in computer science and physics, and has over 20 years of experience in software engineering and scientific computing across astronomy, computational sciences, data science, machine learning, and enterprise products.