2025-11-07, Room 313
This talk covers the Gaussian process (GP) functionality in the open-source Python package PyMC and how to use GPs effectively in real-world models. The goal is to bridge the (wide!) gap between theory and practice, using an example from baseball. By the end of the talk you'll know what's possible in PyMC and how to avoid common pitfalls.
Despite their generality, GPs are rarely used as components in larger models. This talk will explore how the GP functionality in PyMC is designed to change this. I'll also describe some hard-won tips and tricks needed to use GPs effectively in practice.
To reach this goal, we'll first introduce Bayesian methods, PyMC, and hierarchical models. Then, we'll present GPs as a generalization of hierarchical models. PyMC's GP module was designed to make incorporating GPs into existing models straightforward and includes several commonly used approximations. I'll provide a list of situations that can now be handled easily.
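To make the idea of GPs as a generalization of hierarchical models concrete, here is a minimal NumPy sketch. It is not PyMC code and is not taken from the talk. It compares the prior covariance of independent group effects, σ²I, with an exponentiated quadratic GP kernel over a hypothetical ordered input such as player age. The helper names and numbers are illustrative assumptions. As the lengthscale shrinks toward zero, the GP covariance collapses to the hierarchical one. As the lengthscale grows, nearby groups share information.

```python
import numpy as np

def exp_quad_cov(x, sigma, lengthscale):
    """Exponentiated quadratic covariance matrix for 1-D inputs x."""
    d = x[:, None] - x[None, :]  # pairwise differences between inputs
    return sigma**2 * np.exp(-0.5 * (d / lengthscale) ** 2)

def hierarchical_cov(n_groups, sigma):
    """Covariance of exchangeable group effects, each Normal(0, sigma) and independent."""
    return sigma**2 * np.eye(n_groups)

# Hypothetical groups indexed by an ordered input (e.g. age), spaced 1 unit apart.
ages = np.arange(20.0, 36.0)
sigma = 0.5

K_short = exp_quad_cov(ages, sigma, lengthscale=0.01)  # correlations vanish
K_long = exp_quad_cov(ages, sigma, lengthscale=5.0)    # neighbours strongly correlated
K_hier = hierarchical_cov(len(ages), sigma)

print(np.allclose(K_short, K_hier))  # True: tiny lengthscale recovers independent effects
print(K_long[0, 1] / sigma**2)       # exp(-0.02), about 0.98
```

In PyMC itself the corresponding kernel is `pm.gp.cov.ExpQuad`. The sketch avoids a PyMC dependency so that it runs standalone.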
Finally, since it's extremely important to carefully set priors on GP hyperparameters for successful modeling, we'll conclude with a discussion of common pitfalls and strategies to overcome them.
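The sketch below shows one way to set a GP hyperparameter prior. This is an assumed strategy and is not necessarily the one presented in the talk. It chooses a LogNormal prior for a lengthscale so that about 95% of its mass lies between the smallest spacing between inputs and the overall data range, which is where the data can actually inform the lengthscale. The input grid and the helper names are hypothetical.

```python
import math
from statistics import NormalDist

def lognormal_lengthscale_prior(lower, upper, mass=0.95):
    """(mu, sigma) of a LogNormal placing `mass` probability in [lower, upper],
    with the remaining probability split evenly between the two tails."""
    z = NormalDist().inv_cdf(0.5 + mass / 2)  # about 1.96 for mass=0.95
    mu = 0.5 * (math.log(lower) + math.log(upper))
    sigma = (math.log(upper) - math.log(lower)) / (2 * z)
    return mu, sigma

def prior_mass(mu, sigma, lower, upper):
    """Probability that a LogNormal(mu, sigma) draw falls in [lower, upper]."""
    log_dist = NormalDist(mu, sigma)  # the log of the lengthscale is Normal
    return log_dist.cdf(math.log(upper)) - log_dist.cdf(math.log(lower))

# Hypothetical inputs on a 1-unit grid spanning 20 units: discourage lengthscales
# shorter than the grid spacing or longer than the range of the data.
min_spacing, data_range = 1.0, 20.0
mu, sigma = lognormal_lengthscale_prior(min_spacing, data_range)
print(mu, sigma)
print(prior_mass(mu, sigma, min_spacing, data_range))  # about 0.95
```

The resulting `mu` and `sigma` could be passed to `pm.LogNormal`. PyMC also provides `pm.find_constrained_prior`, which fits families such as `pm.InverseGamma` to the same kind of interval constraint.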
The following is an outline of the talk with time estimates.
1. Introduction (3 minutes)
2. Bayesian statistics (3 minutes)
3. Hierarchical models (5 minutes)
4. Introducing GPs as an extension of hierarchical models (7 minutes)
5. Baseball example (5 minutes)
6. Overview of the GP functionality in PyMC (15 minutes)
7. Tips and tricks (5 minutes)
8. Conclusion (2 minutes)
Bill Engels is a Principal Data Scientist with PyMC Labs, with 10 years of experience in industry and an MS in Statistics from Portland State University. He enjoys all phases of data analysis and is particularly interested in Bayesian modeling and Gaussian processes.