PyData London 2025

Michał Szołucha

During his work at NVIDIA, Michał gained extensive experience in deep learning software development. He tackled training and inference challenges ranging from small-scale to large-scale applications, and from user-facing tasks to highly optimized benchmarks such as MLPerf. Michał also has a deep understanding of data loading problems, having worked as a developer on NVIDIA DALI, the Data Loading Library.


Sessions

06-07
10:20
45min
Parallel PyTorch Inference with Python Free-Threading
Michał Szołucha

This talk examines multi-threaded parallel inference on PyTorch models using the new no-GIL, free-threaded build of Python. Using a simple 124M-parameter GPT-2 model that we train from scratch, we explore the new territory unlocked by free-threaded Python: parallel PyTorch model inference, where multiple threads, unimpeded by the Python GIL, attempt to generate text from a transformer-based model in parallel.
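
As a rough illustration of the threading pattern the abstract describes, here is a minimal sketch that is not taken from the talk itself. It assumes a hypothetical `fake_generate` function standing in for a real model's text generation call (for example, a PyTorch GPT-2 generation step), and uses only the standard library so it runs anywhere. The helper names `gil_enabled` and `parallel_generate` are likewise illustrative.

```python
import sys
from concurrent.futures import ThreadPoolExecutor


def gil_enabled() -> bool:
    """Report whether the GIL is active.

    sys._is_gil_enabled() exists on Python 3.13+; on older versions
    the GIL is always enabled, so we fall back to True.
    """
    check = getattr(sys, "_is_gil_enabled", None)
    return True if check is None else check()


def fake_generate(prompt: str, max_new_tokens: int = 8) -> str:
    """Hypothetical stand-in for a model's generation call.

    A deterministic, CPU-bound toy loop; a real version would run
    the model's forward pass and sample tokens instead.
    """
    state = sum(prompt.encode("utf-8"))
    tokens = []
    for _ in range(max_new_tokens):
        state = (state * 1103515245 + 12345) % 2**31
        tokens.append(f"tok{state % 100}")
    return prompt + " " + " ".join(tokens)


def parallel_generate(prompts, num_threads=4):
    """Run one generation per prompt on a thread pool, keeping input order."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(fake_generate, prompts))


if __name__ == "__main__":
    print("GIL enabled:", gil_enabled())
    for text in parallel_generate(["Hello", "PyData", "London"]):
        print(text)
```

On a standard build the threads in this sketch take turns holding the GIL, so the pure-Python loop gains nothing from extra threads; on a free-threaded build the same code can, in principle, run the calls on multiple cores at once. How well that carries over to a real PyTorch model depends on factors this sketch does not model.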

Hardwick Hub