PyData Seattle 2025

Generalized Additive Models: Explainability Strikes Back
2025-11-07, Talk Track 3

Generalized Additive Models (GAMs)

Generalized Additive Models (GAMs) strike a rare balance: they combine the flexibility of complex models with the clarity of simple ones.

They often achieve performance comparable to black-box models, yet remain:
- Easy to interpret
- Computationally efficient
- Aligned with the growing demand for transparency in AI

With recent U.S. AI policy guidance, such as the Blueprint for an AI Bill of Rights (White House, 2022), and increasing pressure from decision-makers for explainable models, GAMs are emerging as a natural choice across industries.


Audience

This talk is aimed at attendees with some background in Python and statistics, including:
- Data scientists
- Machine learning engineers
- Researchers


Takeaway

By the end, you’ll understand:
- The intuition behind GAMs
- How to build and apply them in practice
- How to interpret and explain GAM predictions and results in Python


Prerequisites

You should be comfortable with:
- Basic regression concepts
- Model regularization
- The bias–variance trade-off
- Python programming


Why GAMs Matter

In machine learning, practitioners often face a trade-off:

  • Simple models (e.g., linear or logistic regression) are transparent but often too rigid, and they risk underfitting.
  • Black-box models (e.g., deep neural networks, gradient-boosted ensembles such as XGBoost) are powerful but costly, opaque, and often difficult to trust. Their credibility rests mainly on empirical testing, which may not translate into business interpretability or regulatory compliance.

Generalized Additive Models (GAMs) resolve this tension.
They model each feature's effect on the target as a flexible, nonlinear function, preserving interpretability while achieving performance comparable to more complex models, especially on structured/tabular data.
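
To make the additive structure concrete: a GAM predicts through g(E[y]) = β0 + f1(x1) + ... + fp(xp), where g is a link function and each fj is a smooth function (typically built from splines) learned from the data. The snippet below is a minimal sketch using pyGAM, one of the Python libraries covered later in the talk; the synthetic data and the choice of two spline terms are illustrative assumptions, not part of the talk's case studies.

    # Minimal pyGAM sketch (assumes pyGAM and NumPy are installed; the data is synthetic)
    import numpy as np
    from pygam import LinearGAM, s

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(500, 2))                      # two features
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.2, size=500)

    # Additive structure: E[y] = beta_0 + f1(x1) + f2(x2), one spline term per feature
    gam = LinearGAM(s(0) + s(1)).fit(X, y)

    # Interpretability comes from inspecting each fitted smooth f_j on its own
    for i, term in enumerate(gam.terms):
        if term.isintercept:
            continue
        XX = gam.generate_X_grid(term=i)                       # grid over feature i
        pdep = gam.partial_dependence(term=i, X=XX)            # f_i evaluated on that grid
        print(f"term {i}: learned smooth evaluated at {len(pdep)} grid points")

Each partial-dependence curve can then be plotted directly, which is the visualization step in the hands-on part of the talk.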

Key Advantages

  • Performance: Research (Hastie & Tibshirani, 1990; Lou, Caruana, & Gehrke, 2012) and empirical studies show that GAMs often rival tree-based and boosting methods. For example, StitchFix demonstrated that GAMs achieved nearly the same AUROC as random forests in customer acquisition, but with much lower scoring times.
  • Industry adoption: Companies increasingly adopt GAMs, as the marginal accuracy gains of black-box models rarely justify their costs. Transparent models reduce computational overhead and simplify validation pipelines.
  • Regulation & Trust: In regulated domains like healthcare and finance, interpretability is a requirement. The U.S. Blueprint for an AI Bill of Rights (White House, 2022) and standards like the NIST AI Risk Management Framework (AI RMF) emphasize transparency, fairness, and accountability in AI. GAMs offer practitioners a practical path to align with these standards while preserving predictive power.

Supporting Evidence

  • 📊 StitchFix (Customer Acquisition): GAMs matched random forests in predictive performance while requiring a fraction of the scoring time, making them far more deployable.
  • Forecasting (SeasonalNaive vs. LagLlama): A simple SeasonalNaive baseline outperformed LagLlama (a deep foundation model for forecasting) by 42% in accuracy while running roughly 1,000× faster, underscoring how interpretable, computationally efficient models can surpass state-of-the-art approaches.
  • Trustworthy AI Standards (NIST AI RMF, ISO/IEC 23894, ISO/IEC 42001): These frameworks stress explainability, robustness, fairness, and accountability as cornerstones of trustworthy AI. GAMs inherently support these values by being interpretable, auditable, and easier to govern compared to opaque architectures.

Together, these findings reinforce that interpretable ≠ weak. GAMs and similar models demonstrate that simplicity can coexist with power, compliance, and efficiency, making them a responsible choice for modern AI.


Outline & Time Breakdown

  • 0–10 min: Setting the Stage
    - The trade-off: simple vs. complex models
    - Model intuition
    - Real-world examples of why interpretability matters

  • 10–20 min: Understanding GAMs
    - The math (intuitively explained)
    - Tools in Python: pyGAM, statsmodels, pyro
    - Smooth functions, splines, and additive structures

  • 20–30 min: Hands-On Examples
    - Building a GAM in Python
    - Benchmarking against logistic regression & random forests
    - Visualizing terms for interpretability

  • 30–40 min: Applications & Case Studies
    - Healthcare: risk prediction with trust
    - Finance: credit scoring with compliance
    - Business: churn modeling with interpretability

  • 40–45 min: Limitations & What’s Next
    - When not to use GAMs
    - Extensions: Explainable Boosting Machines (EBMs)
    - Open questions in interpretable ML

  • 45–50 min: Q&A


References

  • Hastie, T., & Tibshirani, R. (1990). Generalized Additive Models. Chapman and Hall.
  • Lou, Y., Caruana, R., & Gehrke, J. (2012). Intelligible Models for Classification and Regression. KDD.
  • Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning. Springer.
  • Caruana, R. et al. (2015). Intelligible Models with Pairwise Interactions. KDD.
  • White House (2022). Blueprint for an AI Bill of Rights.
  • Larsen, K. (StitchFix). GAM vs. Random Forest Performance in Direct Mail Customer Acquisition. GitHub.
  • Nixtla AI (2023). SeasonalNaive vs. LagLlama: Large-Scale Forecasting Benchmark. arXiv.
  • Nannapaneni, S. (2025). Trustworthy and Responsible AI Modeling. AppOrchid Inc.
  • Wood, S. N. (2017). Generalized Additive Models: An Introduction with R (2nd ed.). CRC Press.

Prior Knowledge Expected: Previous knowledge expected

Dear Program Committee,

I am currently a Principal Data Scientist at AppOrchid, where I lead projects at the intersection of machine learning, econometrics, and applied research, with a strong focus on interpretable and trustworthy AI. Over the past 15+ years, I have built a career bridging industry and academia, delivering data-driven solutions at organizations such as FleetOps, Convoy, and ServiceNow (ElementAI). My academic contributions include 2,000+ citations and multiple peer-reviewed publications (Google Scholar profile).

As an Associate Professor, I taught in the Mathematics, Computer Science, and Business departments, designing and delivering courses in econometrics, statistical inference, and operational research. I also founded the Laboratory of Machine Learning in Finance and Organizations, mentoring more than 30 students and researchers on projects applying ML to finance, business, and social impact.

Beyond research and teaching, I am an experienced speaker and educator, known for communicating complex ideas in clear and engaging ways. Across conferences, lectures, and industry events, I have consistently emphasized explainability, transparency, and practical impact—principles that directly align with the growing demand for trustworthy AI.

With the rise of policy and risk-management frameworks such as the U.S. Blueprint for an AI Bill of Rights (2022) and the NIST AI Risk Management Framework, the need for interpretable models like Generalized Additive Models (GAMs) has never been greater. My session will demonstrate how GAMs provide a rare balance of performance, interpretability, and compliance, supported by real-world case studies and hands-on examples in Python.

I believe my background uniquely positions me to deliver a session that is both technically rigorous and directly relevant to today’s regulatory, business, and academic landscapes.

Sincerely,
Pedro Henrique Melo Albuquerque