PyData Global 2025

Optimal Variable Binning in Logistic Regression
2025-12-10, Machine Learning & AI

In many regulated industries—finance, healthcare, insurance—logistic regression remains the model of choice for its interpretability and regulatory acceptability. Yet capturing non-linear effects and interactions often requires variable binning, and naive approaches (equal-width or quantile cuts) can either wash out signal or invite overfitting. In this 30-minute session, data scientists and risk analysts with a working knowledge of logistic regression and Python will learn to:

- Diagnose the weaknesses of basic binning strategies.
- Select and apply optimal-binning algorithms for different use cases.
- Assess bin stability and guard against overfitting.
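To make the two naive strategies concrete, here is a minimal sketch that bins a skewed feature with equal-width and quantile cuts. The data is synthetic and the `income` feature is a hypothetical stand-in, not material from the talk.

```python
import numpy as np
import pandas as pd

# Synthetic, right-skewed feature (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
income = pd.Series(rng.lognormal(mean=10, sigma=1, size=1_000), name="income")

# Equal-width: 5 intervals of identical width over the observed range.
equal_width = pd.cut(income, bins=5)

# Quantile: 5 intervals holding roughly the same number of rows.
quantile = pd.qcut(income, q=5)

# Row counts per bin, ordered from the lowest interval to the highest.
width_counts = equal_width.value_counts().sort_index()
quantile_counts = quantile.value_counts().sort_index()

print(width_counts)
print(quantile_counts)
```

On skewed data like this, equal-width cuts tend to pile most rows into the lowest bin, while quantile cuts balance the counts but place the edges without any reference to the target.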

All code, data samples, and a turnkey notebook will be available on GitHub, so you can start experimenting immediately.


Despite the rise of complex “black-box” models, regulated environments still demand transparency. Properly binned variables not only improve model fit but also yield coefficients that underwriters and auditors can interpret. However, determining cut-points that preserve true signal while avoiding data-snooping bias is non-trivial.

By the end of this session, attendees will be able to:

  • Understand the basic idea behind binning (the what).
  • Recognize the contexts in which variable binning makes sense (the when and why).
  • Choose among popular optimal-binning techniques (e.g., ChiMerge, MDLP, decision-tree-based) based on data size, feature type, and operational constraints (the how).
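As an illustration of the decision-tree-based family mentioned above, the sketch below fits a shallow scikit-learn tree on a single feature and reads its split thresholds as cut-points. The helper name `tree_cut_points`, its parameters, and the synthetic data are assumptions made for this example and are not the speaker's implementation. ChiMerge and MDLP are not shown.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def tree_cut_points(x, y, max_bins=4, min_bin_frac=0.05):
    """Return sorted cut-points learned by a shallow tree on one feature.

    Hypothetical helper: max_bins caps the number of leaves (bins) and
    min_bin_frac is the minimum share of rows allowed in each bin.
    """
    tree = DecisionTreeClassifier(
        max_leaf_nodes=max_bins,
        min_samples_leaf=min_bin_frac,  # a float is read as a fraction of rows
        random_state=0,
    )
    tree.fit(np.asarray(x).reshape(-1, 1), y)
    # Internal (split) nodes have a feature index >= 0; leaves are marked -2.
    is_split = tree.tree_.feature >= 0
    return np.sort(tree.tree_.threshold[is_split])


# Synthetic data (hypothetical): the event rate is non-monotonic in x.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=2_000)
p = np.where((x > 3) & (x < 7), 0.6, 0.1)
y = rng.binomial(1, p)

cuts = tree_cut_points(x, y)
bins = np.digitize(x, cuts)  # bin index for each row, 0 .. len(cuts)
print(cuts)
```

Capping the leaf count and enforcing a minimum bin share are two simple levers for trading fit against stability. In practice the cut-points would be learned on training data only and checked on a holdout sample.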

Who Should Attend?

Data scientists, ML engineers, and risk analysts who use logistic regression in regulated settings and need a reproducible, explainable feature-engineering pipeline.

Detailed 30-Minute Agenda

Time        Topic
0–3 min     Context & Why Binning Matters for Explainability
3–8 min     Pitfalls of Naïve Binning (real-life examples)
8–18 min    Binning as an Optimization Problem: Algorithms & Decision Criteria
18–26 min   Hands-On Python Demo: From Data to Defensible Bins
26–30 min   Q&A, Resources & Next Steps

Prerequisites & Materials

  • Prerequisites: Basic Python (pandas, scikit-learn) and familiarity with logistic regression
  • Materials: GitHub repo with notebook and data samples, to be shared during the talk

You’ll leave equipped to choose the right optimal‐binning algorithm for your data and apply industry-proven best practices.


Prior Knowledge Expected:

No

Quantitative Finance and Econometrics graduate from Sorbonne University. Currently working as a Data Scientist at BNP Paribas and as a lecturer at Sorbonne University.