PyData Global 2025

GPU Accelerated Zarr
2025-12-11, General Track

The zarr-python 3.0 release includes native support for device buffers, enabling Zarr workloads to run on compute accelerators like NVIDIA GPUs. This lets you get more work done in less time.

This talk is primarily intended for people who are at least somewhat familiar with Zarr and are curious about accelerating their n-dimensional array workloads with GPUs. That said, we will start with a brief introduction to Zarr and why you might consider it as a storage format for n-dimensional arrays (common in geospatial, microscopy, and genomics domains, among others). We'll see what factors affect performance and how to maximize throughput for your data analysis or deep learning pipeline. Finally, we'll preview future improvements to GPU-accelerated Zarr and to the packages building on top of it, like xarray and cubed.

After attending this talk, you'll have the knowledge needed to determine if using zarr-python's support for device buffers can help accelerate your workload.


This talk is targeted at users who have at least heard of Zarr, but we will give a brief introduction to the basics. The primary purpose is to spread knowledge about zarr-python’s recently added support for device (GPU) buffers and arrays, and how it can be used to speed up your array-based workloads.

An outline:

  • Introduction

  • Brief overview of zarr (cloud-native format for storing chunked, n-dimensional arrays)

  • Brief example of how easy it is to use zarr-python’s native support for device arrays (see the first sketch after this outline)

  • Overview of GPU-accelerated Zarr workloads

  • We’ll walk through some high-level examples of how Zarr fits into larger workloads (e.g. analyzing climate simulations, or as part of a deep learning pipeline)

  • We’ll discuss the key factors to think about when trying to maximize performance

  • Overview of how it works

  • Show zarr’s configuration options for selecting between host and device buffers (also covered in the first sketch below)
  • An overview of the Zarr codec pipeline
  • Show how on-device decompression can accelerate workloads where decompression is a bottleneck (see the codec sketch below)

  • Benchmarks showing the speedup users can expect to see from GPU acceleration

  • Preview of future work

  • Zarr-python currently uses only a single GPU and doesn’t yet use features like CUDA streams. https://github.com/zarr-developers/zarr-python/issues/3271 tracks possible improvements for exposing additional parallelism.
  • We’ll look at a prototype of how CUDA streams enable asynchronous host-to-device memory copies, letting you start computing on one chunk of data while another chunk is being copied to the device (the final sketch after this outline illustrates this pattern)
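
To make these outline items concrete, here are three short sketches. First, a minimal example of reading and writing through device buffers, assuming zarr-python >= 3.0 with CuPy installed. zarr.config.enable_gpu() is the documented opt-in; the commented zarr.config.set call sketches the equivalent explicit configuration, though the registered buffer names may differ between releases.

```python
# Minimal sketch: device buffers with zarr-python 3.x and CuPy.
import cupy as cp
import zarr

# Opt in to device buffers globally, or use it as a context manager
# (with zarr.config.enable_gpu(): ...). Roughly equivalent to:
#   zarr.config.set({"buffer": "zarr.core.buffer.gpu.Buffer",
#                    "ndbuffer": "zarr.core.buffer.gpu.NDBuffer"})
zarr.config.enable_gpu()

z = zarr.create_array(
    store="data.zarr",
    shape=(4_096, 4_096),
    chunks=(1_024, 1_024),
    dtype="float32",
    overwrite=True,
)

# Writes accept device arrays, and reads decode straight into device memory.
z[:] = cp.random.random((4_096, 4_096), dtype=cp.float32)
chunk = z[:1_024, :1_024]
print(type(chunk))  # <class 'cupy.ndarray'>
```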
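
Second, a rough sketch of where on-device decompression plugs into the codec pipeline, based on zarr-python 3's bytes-to-bytes codec ABC (whose exact surface may shift between releases). gpu_lz4_decompress is a hypothetical stand-in for a real device-side decompressor, such as one built on nvCOMP; it is not a real API.

```python
# Hypothetical sketch of a bytes-to-bytes codec that decompresses on the GPU.
from dataclasses import dataclass

from zarr.abc.codec import BytesBytesCodec
from zarr.core.array_spec import ArraySpec
from zarr.core.buffer import Buffer
from zarr.registry import register_codec


@dataclass(frozen=True)
class GpuLZ4Codec(BytesBytesCodec):
    """Decode-only sketch: chunks are decompressed without leaving the device."""

    is_fixed_size = False

    async def _decode_single(self, chunk_bytes: Buffer, chunk_spec: ArraySpec) -> Buffer:
        # With GPU buffers enabled, chunk_bytes already lives in device memory,
        # so the decompressed output never round-trips through the host.
        raw = gpu_lz4_decompress(chunk_bytes.as_array_like())  # hypothetical
        return chunk_spec.prototype.buffer.from_array_like(raw)

    def compute_encoded_size(self, input_byte_length: int, chunk_spec: ArraySpec) -> int:
        raise NotImplementedError  # compressed sizes are data-dependent


register_codec("gpu-lz4", GpuLZ4Codec)
```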
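
Finally, a standalone CuPy sketch of the stream-overlap pattern the prototype explores: copy the next chunk to the device on one stream while computing on the current chunk on another. This illustrates the mechanism, not current zarr-python behavior.

```python
# Sketch: overlap host-to-device copies with compute using CUDA streams.
import cupy as cp
import numpy as np


def pinned_copy(arr: np.ndarray) -> np.ndarray:
    """Stage arr in pinned host memory so H2D copies can run asynchronously."""
    mem = cp.cuda.alloc_pinned_memory(arr.nbytes)
    staged = np.frombuffer(mem, arr.dtype, arr.size).reshape(arr.shape)
    staged[...] = arr
    return staged


# Stand-ins for chunks read from a Zarr store.
host_chunks = [pinned_copy(np.random.rand(1_024, 1_024)) for _ in range(8)]

copy_stream = cp.cuda.Stream(non_blocking=True)
compute_stream = cp.cuda.Stream(non_blocking=True)

# Double buffering: two device buffers, with events ordering copy vs. reuse.
bufs = [cp.empty(host_chunks[0].shape, host_chunks[0].dtype) for _ in range(2)]
copied = [cp.cuda.Event() for _ in range(2)]
consumed = [cp.cuda.Event() for _ in range(2)]

total = cp.zeros((), dtype=cp.float64)
for i, chunk in enumerate(host_chunks):
    j = i % 2
    # Don't overwrite a buffer the compute stream is still reading.
    copy_stream.wait_event(consumed[j])
    bufs[j].set(chunk, stream=copy_stream)  # asynchronous H2D copy
    copied[j].record(copy_stream)

    # Compute on this chunk once its copy finishes; meanwhile the next
    # iteration's copy proceeds concurrently on copy_stream.
    compute_stream.wait_event(copied[j])
    with compute_stream:
        total += bufs[j].sum()
    consumed[j].record(compute_stream)

compute_stream.synchronize()
print(float(total))
```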

Prior Knowledge Expected:

No

I'm a software engineer at NVIDIA working on GPU-accelerated ETL tools as part of the RAPIDS team. I've helped maintain several libraries in the scientific Python and geospatial stacks.