2025-12-11 – General Track
The zarr-python 3.0 release includes native support for device buffers, enabling Zarr workloads to run on compute accelerators like NVIDIA GPUs and get more work done in less time.
This talk is primarily intended for people who are at least somewhat familiar with Zarr and are curious about accelerating their n-dimensional array workloads with GPUs. That said, we will start with a brief introduction to Zarr and why you might want to consider it as a storage format for n-dimensional arrays (common in geospatial, microscopy, and genomics domains, among others). We'll see what factors affect performance and how to maximize throughput for your data analysis or deep learning pipeline. Finally, we'll preview future improvements to GPU-accelerated Zarr and the packages building on top of it, like xarray and cubed.
After attending this talk, you'll have the knowledge needed to determine whether zarr-python's support for device buffers can help accelerate your workload.
This talk is targeted at users who have at least heard of Zarr, but we will give a brief introduction to the basics. The primary purpose is to spread knowledge about zarr-python's recently added support for device (GPU) buffers and arrays, and how it can be used to speed up your array-based workload.
An outline:
- Introduction
  - Brief overview of Zarr (a cloud-native format for storing chunked, n-dimensional arrays)
  - Brief example of how easy it is to use zarr-python's native support for device arrays (a minimal sketch follows this outline)
- Overview of GPU-accelerated Zarr workloads
  - We'll show some high-level examples of how Zarr fits into larger workloads (e.g. analyzing climate simulations, or feeding a deep learning pipeline)
  - We'll discuss the key factors to consider when trying to maximize performance
- Overview of how it works
  - Zarr's configuration options for selecting between host and device buffers
  - An overview of the Zarr codec pipeline
  - How on-device decompression can accelerate workloads where decompression is the bottleneck
- Benchmarks showing the speedup users can expect from GPU acceleration
- Preview of future work
  - zarr-python currently uses only a single GPU and doesn't yet use features like CUDA streams. https://github.com/zarr-developers/zarr-python/issues/3271 tracks possible improvements for exposing additional parallelism.
  - We'll look at a prototype of how CUDA streams allow asynchronous host-to-device memory copies, so you can start computing on one chunk of data while the next chunk is being copied to the device (a rough sketch follows this outline).
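To give a flavor of the device-array example mentioned above, here's a minimal sketch based on the zarr-python 3.x GPU documentation. It assumes a CUDA-capable GPU with CuPy installed; the array shape and chunking are arbitrary, and exact defaults may differ in your version.

```python
# Minimal sketch, assuming zarr-python >= 3.0, CuPy, and a CUDA-capable GPU.
# zarr.config.enable_gpu() switches zarr's default buffers from host to device memory.
import cupy as cp
import zarr

zarr.config.enable_gpu()

store = zarr.storage.MemoryStore()
z = zarr.create_array(
    store=store, shape=(1024, 1024), chunks=(256, 256), dtype="float32"
)
z[:] = cp.ones((1024, 1024), dtype=cp.float32)  # write directly from device memory

chunk = z[:256, :256]
print(type(chunk))  # cupy.ndarray -- the decoded chunk is already on the GPU
```

Because reads land in device memory as CuPy arrays, downstream code that consumes CuPy (or DLPack) arrays can operate on the chunks without an extra host-to-device copy.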
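And to illustrate the CUDA-streams item under future work, here's a rough sketch of the copy/compute overlap pattern in plain CuPy. This is not zarr-python's prototype: the pinned_empty helper, the chunk shapes, and the sum reduction are stand-ins for illustration.

```python
# Rough sketch of overlapping host-to-device copies with compute using CuPy
# streams. Illustrative only; not zarr-python's actual prototype.
import numpy as np
import cupy as cp


def pinned_empty(shape, dtype=np.float32):
    """Allocate a page-locked (pinned) host array so H2D copies can be truly async."""
    n = int(np.prod(shape))
    mem = cp.cuda.alloc_pinned_memory(n * np.dtype(dtype).itemsize)
    return np.frombuffer(mem, dtype=dtype, count=n).reshape(shape)


# Pretend these are decoded Zarr chunks sitting in pinned host memory.
chunks = []
for i in range(4):
    chunk = pinned_empty((1024, 1024))
    chunk[:] = i
    chunks.append(chunk)

copy_stream = cp.cuda.Stream(non_blocking=True)
results = []

with copy_stream:
    current = cp.asarray(chunks[0])  # start copying the first chunk to the device

for i in range(len(chunks)):
    copy_stream.synchronize()  # make sure chunk i has arrived on the device
    nxt = None
    if i + 1 < len(chunks):
        with copy_stream:
            nxt = cp.asarray(chunks[i + 1])  # copy chunk i+1 on the side stream...
    results.append(float(current.sum()))     # ...while computing on chunk i
    current = nxt

print(results)
```

In zarr-python itself this kind of pipelining would live inside the store and codec-pipeline layers; the issue linked above tracks that work.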
I'm a software engineer at NVIDIA working on GPU-accelerated ETL tools as part of the RAPIDS team. I've helped maintain several libraries in the scientific Python and geospatial stacks.