2025-11-07, Room 313
Generalized Additive Models (GAMs)
Generalized Additive Models (GAMs) strike a rare balance: they combine the flexibility of complex models with the clarity of simple ones.
They often achieve performance comparable to black-box models, yet remain:
- Easy to interpret
- Computationally efficient
- Aligned with the growing demand for transparency in AI
With recent U.S. AI policy guidance such as the Blueprint for an AI Bill of Rights (White House, 2022) and increasing pressure from decision-makers for explainable models, GAMs are emerging as a natural choice across industries.
Audience
This guide is for readers with some background in Python and statistics, including:
- Data scientists
- Machine learning engineers
- Researchers
Takeaway
By the end, you’ll understand:
- The intuition behind GAMs
- How to build and apply them in practice
- How to interpret and explain GAM predictions and results in Python
Prerequisites
You should be comfortable with:
- Basic regression concepts
- Model regularization
- The bias–variance trade-off
- Python programming
Why GAMs Matter
In machine learning, practitioners often face a trade-off:
- Simple models (e.g., linear or logistic regression) are transparent but too rigid and risk underfitting.
- Black-box models (e.g., deep neural networks, GANs, XGBoost) are powerful, but costly, opaque, and often difficult to trust. Their credibility primarily stems from empirical testing, which may not translate into business interpretability or regulatory compliance.
Generalized Additive Models (GAMs) resolve this tension.
They let each feature relate to the target in a flexible, nonlinear way without losing interpretability, while achieving performance comparable to more complex models, especially on structured/tabular data. The key idea is additivity: the prediction is a sum of smooth per-feature functions, g(E[y]) = β0 + f1(x1) + f2(x2) + ... + fp(xp), so each component can be plotted and understood on its own.
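To make the additive structure concrete, here is a minimal sketch using pyGAM (one of the Python libraries covered later in the outline). The synthetic data, feature indices, and spline counts are illustrative assumptions rather than material from the talk:

```python
# Minimal GAM sketch with pyGAM (synthetic data; n_splines values are arbitrary).
import numpy as np
from pygam import LinearGAM, s

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(500, 2))                      # two synthetic features
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.3, 500)

# Each s(i) is a smooth spline term for feature i; the model is their sum,
# so every term can be inspected on its own.
gam = LinearGAM(s(0, n_splines=20) + s(1, n_splines=20)).fit(X, y)
gam.summary()                                              # per-term effective DoF and fit statistics
```

The smoothing penalty on each term can also be tuned, for example with pyGAM's grid search over penalty strengths, which plays the same regularization role noted in the prerequisites.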
Key Advantages
- Performance: Research (Hastie & Tibshirani, 1990; Lou, Caruana, & Gehrke, 2012) and empirical studies show that GAMs often rival tree-based and boosting methods. For example, StitchFix demonstrated that GAMs achieved nearly the same AUROC as random forests in customer acquisition, but with much lower scoring times.
- Industry adoption: Companies increasingly adopt GAMs, as the marginal accuracy gains of black-box models rarely justify their costs. Transparent models reduce computational overhead and simplify validation pipelines.
- Regulation & Trust: In regulated domains like healthcare and finance, interpretability is a requirement. The U.S. AI Bill of Rights (2022) and standards like the NIST AI RMF emphasize transparency, fairness, and accountability in AI. GAMs offer practitioners a practical path to align with these standards while preserving predictive power.
Supporting Evidence
- 📊 StitchFix (Customer Acquisition): GAMs matched random forests in predictive performance while requiring a fraction of the scoring time, making them far more deployable.
- ⏱ Forecasting (SeasonalNaive vs. LagLlama): A simple SeasonalNaive baseline outperformed LagLlama (a deep foundation model for forecasting) by 42% in accuracy while running roughly 1,000× faster, underscoring how interpretable, computationally efficient models can surpass state-of-the-art approaches.
- ✅ Trustworthy AI Standards (NIST AI RMF, ISO/IEC 23894, ISO/IEC 42001): These frameworks stress explainability, robustness, fairness, and accountability as cornerstones of trustworthy AI. GAMs inherently support these values by being interpretable, auditable, and easier to govern compared to opaque architectures.
Together, these findings reinforce that interpretable ≠ weak: GAMs and similar models show that simplicity can coexist with power, compliance, and efficiency, making them a responsible choice for modern AI.
Outline & Time Breakdown
- 0–10 min: Setting the Stage
  - The trade-off: simple vs. complex models
  - Model intuition
  - Real-world examples of why interpretability matters
- 10–20 min: Understanding GAMs
  - The math (intuitively explained)
  - Tools in Python: pyGAM, statsmodels, pyro
  - Smooth functions, splines, and additive structures
- 20–30 min: Hands-On Examples
  - Building a GAM in Python
  - Benchmarking against logistic regression & random forests
  - Visualizing terms for interpretability (see the sketch after this outline)
- 30–40 min: Applications & Case Studies
  - Healthcare: risk prediction with trust
  - Finance: credit scoring with compliance
  - Business: churn modeling with interpretability
- 40–45 min: Limitations & What’s Next
  - When not to use GAMs
  - Extensions: Explainable Boosting Machines (EBMs)
  - Open questions in interpretable ML
- 45–50 min: Q&A
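As a preview of the "Visualizing terms for interpretability" item above, the following sketch continues the `gam` object from the earlier example and assumes matplotlib is installed; the grid size and styling are illustrative:

```python
# Plot each smooth term with a pointwise 95% confidence band
# (continues the `gam` fit from the earlier sketch).
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for i, ax in enumerate(axes):
    XX = gam.generate_X_grid(term=i)                        # evaluation grid for term i
    pdep, conf = gam.partial_dependence(term=i, X=XX, width=0.95)
    ax.plot(XX[:, i], pdep)                                  # fitted smooth effect of feature i
    ax.plot(XX[:, i], conf, color="grey", linestyle="--")    # confidence band
    ax.set_title(f"Smooth term for feature {i}")
plt.tight_layout()
plt.show()
```

Each panel shows how a single feature contributes to the prediction, which is the property the benchmarking and case-study segments build on.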
References
- Hastie, T., & Tibshirani, R. (1990). Generalized Additive Models. Chapman and Hall.
- Lou, Y., Caruana, R., & Gehrke, J. (2012). Intelligible Models for Classification and Regression. KDD.
- Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning. Springer.
- Caruana, R. et al. (2015). Intelligible Models with Pairwise Interactions. KDD.
- White House (2022). Blueprint for an AI Bill of Rights.
- Larsen, K. (StitchFix). GAM vs. Random Forest Performance in Direct Mail Customer Acquisition. GitHub
- Nixtla AI (2023). SeasonalNaive vs. LagLlama: Large-Scale Forecasting Benchmark. arXiv
- Nannapaneni, S. (2025). Trustworthy and Responsible AI Modeling. AppOrchid Inc.
- Wood, S. N. (2017). Generalized Additive Models: An Introduction with R, Second Edition. CRC Press. ISBN: 1498728332 | ISBN13: 9781498728331.
Hi everyone — I’m Pedro Albuquerque, Principal Data Scientist at AppOrchid. I work where machine learning, econometrics, and applied research meet, with a big focus on interpretable, trustworthy AI. Over the past 15+ years I’ve built and shipped data products in industry (AppOrchid, FleetOps, Convoy, ServiceNow/ElementAI) and stayed active in academia (2,000+ citations, multiple peer-reviewed papers). I’ve also taught in math, CS, and business departments and founded a lab that mentored 30+ students on ML for finance, business, and social impact.