PyData Global 2025

Why Julia's GPU-Accelerated ODE Solvers are 20x-100x Faster than Jax and PyTorch
2025-12-09, General Track

You may have seen the benchmark results and thought, "How the heck are the Julia ODE solvers on GPUs orders of magnitude faster than the GPU-accelerated Python libraries? That can't be true." In this talk I will go into detail about the architectural differences between the Julia approach to generating GPU-accelerated solvers and the standard ML library approach to GPU usage. By the end of the talk you'll understand the models of GPU acceleration well enough to see why this performance difference exists, and which of the many applications can take advantage of it.


This talk covers the results of the publication "Automated translation and accelerated solving of differential equations on multiple GPU platforms" [1], published in 2024, which demonstrated that the Julia GPU-based ODE solvers, specifically DiffEqGPU.jl, are 20x-100x faster than JAX (diffrax) and PyTorch (torchdiffeq). The publication details the architectural reasons for the performance difference. It goes as far as recreating the ML style of GPU acceleration in Julia to demonstrate that such an approach loses the performance advantage, and it tests against alternative CUDA C++ implementations of a similar form to show exactly how the architectural decisions affect the resulting performance. However, as a highly technical article, it is often not as easy to understand as it should be. In this talk we will give a barebones, "no HPC background required" explanation of how the Julia GPU stack enables a completely different approach from the "standard" ML-library form of GPU acceleration, and how this can be a major benefit for some applications. We will note that the GPU design of the ML libraries is actually optimal for ML applications, but that certain properties of some ODE solver applications require a completely different formulation.
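
To make the contrast concrete, here is a minimal CPU-only sketch in plain Python. It is a toy model under stated assumptions, not the code from the publication or from any of the libraries named above. The function names (`solve_array_style`, `solve_kernel_style`), the fixed-step RK4 method, and the rule of counting one "launch" per batched operation are all illustrative choices made here. Real array libraries may fuse operations, so the launch counts only indicate the shape of the difference.

```python
# Toy model: many independent Lorenz trajectories, fixed-step RK4.
# "launches" counts how often work would be handed to the GPU.
# All names and the counting rule are illustrative assumptions.

def lorenz(u, p):
    x, y, z = u
    sigma, rho, beta = p
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)

def axpy(u, a, k):
    # Componentwise u + a * k.
    return tuple(ui + a * ki for ui, ki in zip(u, k))

def rk4_combine(u, k1, k2, k3, k4, dt):
    return tuple(
        ui + dt / 6 * (a + 2 * b + 2 * c + d)
        for ui, a, b, c, d in zip(u, k1, k2, k3, k4)
    )

def rk4_step(u, p, dt):
    k1 = lorenz(u, p)
    k2 = lorenz(axpy(u, dt / 2, k1), p)
    k3 = lorenz(axpy(u, dt / 2, k2), p)
    k4 = lorenz(axpy(u, dt, k3), p)
    return rk4_combine(u, k1, k2, k3, k4, dt)

def solve_array_style(u0s, ps, dt, n_steps):
    """ML-library style: every stage is one batched operation over all
    trajectories, and the time loop runs on the host."""
    us, launches = list(u0s), 0
    for _ in range(n_steps):
        k1 = [lorenz(u, p) for u, p in zip(us, ps)]
        k2 = [lorenz(axpy(u, dt / 2, k), p) for u, k, p in zip(us, k1, ps)]
        k3 = [lorenz(axpy(u, dt / 2, k), p) for u, k, p in zip(us, k2, ps)]
        k4 = [lorenz(axpy(u, dt, k), p) for u, k, p in zip(us, k3, ps)]
        us = [rk4_combine(u, a, b, c, d, dt)
              for u, a, b, c, d in zip(us, k1, k2, k3, k4)]
        launches += 5  # four stage evaluations plus one update per step
    return us, launches

def solve_kernel_style(u0s, ps, dt, n_steps):
    """Kernel style: one launch, and each 'thread' runs the entire time
    loop for its own trajectory."""
    def kernel(u, p):
        for _ in range(n_steps):
            u = rk4_step(u, p, dt)
        return u
    return [kernel(u, p) for u, p in zip(u0s, ps)], 1
```

Both functions perform the same arithmetic and return the same trajectories. What differs is where the time loop lives: in the array style the host drives every stage of every step, while in the kernel style the whole solve for one trajectory is a single unit of work.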

We will additionally talk about other projects that have seen similar results, such as solving nonlinear systems in Julia (with NonlinearSolve.jl), GPU-accelerated optimization with Optimization.jl, and new global optimizer methods in ParallelParticleSwarms.jl, all of which rely on this technique and on the special aspects of the Julia GPU infrastructure.

[1] https://www.sciencedirect.com/science/article/abs/pii/S0045782523007156


Prior Knowledge Expected:

No

Dr. Chris Rackauckas is the VP of Modeling and Simulation at JuliaHub, the Director of Scientific Research at Pumas-AI, Co-PI of the Julia Lab at MIT, and the lead developer of the SciML Open Source Software Organization. His work in mechanistic machine learning is credited with the 15,000x acceleration of NASA Launch Services simulations and recently demonstrated a 60x-570x acceleration over Modelica tools in HVAC simulation, earning Chris the US Air Force Artificial Intelligence Accelerator Scientific Excellence Award. See more at https://chrisrackauckas.com/. He is the lead developer of the Pumas project and received a top presentation award at every ACoP from 2019 to 2021 for improving methods for uncertainty quantification, automated GPU acceleration of nonlinear mixed effects modeling (NLME), and machine learning assisted construction of NLME models with DeepNLME. For these achievements, Chris received the Emerging Scientist award from ISoP.