Optimal Variable Binning in Logistic Regression PyData Global 2025

2025-12-10 13:30–14:00, Machine Learning & AI

In many regulated industries—finance, healthcare, insurance—logistic regression remains the model of choice for its interpretability and regulatory acceptability. Yet capturing non-linear effects and interactions often requires variable binning, and naive approaches (equal-width or quantile cuts) can either wash out signal or invite overfitting. In this 30-minute session, data scientists and risk analysts with a working knowledge of logistic regression and Python will learn to:

- Diagnose the weaknesses of basic binning strategies.
- Select and apply optimal-binning algorithms for different use cases.
- Assess bin stability and guard against model overfit.
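
To make the first point concrete, here is a minimal sketch (not taken from the talk materials) comparing equal-width and quantile cuts on a simulated right-skewed feature. The data, the variable names, and the choice of five bins are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical right-skewed feature (e.g. an income-like variable)
x = pd.Series(rng.lognormal(mean=0.0, sigma=1.0, size=5_000), name="x")

n_bins = 5
equal_width = pd.cut(x, bins=n_bins)   # every bin spans the same interval length
quantile = pd.qcut(x, q=n_bins)        # every bin holds roughly the same number of rows

# Row counts per bin, in bin order
width_counts = equal_width.value_counts(sort=False)
quantile_counts = quantile.value_counts(sort=False)

print("Equal-width bin counts:")
print(width_counts)
print("Quantile bin counts:")
print(quantile_counts)
```

On data like this, equal-width cuts put almost every row in the first bin and leave the upper bins nearly empty, so their estimated event rates would be very noisy. Quantile cuts balance the counts, but they place cut-points without looking at the target, which is one way signal can get washed out.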

All code, data samples, and a notebook will be available on GitHub.

Despite the rise of complex “black-box” models, regulated environments still demand transparency. Properly binned variables not only improve model fit but also yield coefficients that the business and auditors can interpret. However, determining cut-points that preserve true signal while avoiding data-snooping bias is non-trivial.
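
As a rough illustration of how binned variables lead to readable coefficients, the sketch below one-hot encodes a binned feature and fits a logistic regression, so each coefficient is the log-odds shift of a bin relative to a reference bin. The simulated `age` feature, the hand-picked cut-points, and the bin labels are all hypothetical; the cut-points are deliberately not optimized.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5_000
age = rng.uniform(18, 80, size=n)
# Hypothetical non-linear effect: event risk is highest at both ends of the range
p = 0.05 + 0.25 * ((age - 49) / 31) ** 2
y = rng.binomial(1, p)

# Hand-picked, illustrative cut-points (not the output of an optimal-binning algorithm)
edges = [18, 30, 45, 60, 80]
labels = ["18-30", "30-45", "45-60", "60-80"]
age_bin = pd.cut(age, bins=edges, labels=labels, include_lowest=True)

# One dummy column per bin; the first bin ("18-30") is dropped and acts as the reference
X = pd.get_dummies(age_bin, drop_first=True, dtype=float)

# A large C makes the default L2 penalty negligible, approximating a plain logistic fit
model = LogisticRegression(C=1e6, max_iter=1000).fit(X.to_numpy(), y)

# Each coefficient is the change in log-odds versus the reference bin
coefs = pd.Series(model.coef_[0], index=list(X.columns))
print(coefs)
```

A single linear term in `age` could not represent this U-shaped pattern, whereas the per-bin coefficients can be read directly as "this age band is more or less risky than the reference band".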

By the end of this session, attendees will be able to:

- Understand the basic idea behind binning (the what).
- Recognize the contexts in which variable binning makes sense (the when and why).
- Choose among popular optimal-binning techniques (e.g., ChiMerge, MDLP, decision-tree-based) based on data size, feature type, and operational constraints (the how).
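
Of the techniques listed, the decision-tree-based family is the simplest to sketch with scikit-learn alone: fit a shallow tree on a single feature and read its split thresholds as cut-points. The example below is only an illustrative sketch on simulated data, not the speaker's implementation. The values of `max_leaf_nodes` and `min_samples_leaf` are assumed settings, and ChiMerge and MDLP are not shown.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
n = 4_000
x = rng.uniform(0, 100, size=n)
# Hypothetical U-shaped event probability, so the relationship is non-monotonic
p = 0.05 + 0.30 * ((x - 50) / 50) ** 2
y = rng.binomial(1, p)

# A shallow tree on one feature: at most 5 leaves (bins), each with >= 200 training rows
tree = DecisionTreeClassifier(max_leaf_nodes=5, min_samples_leaf=200, random_state=0)
tree.fit(x.reshape(-1, 1), y)

# Internal nodes have feature index >= 0; leaves are marked with a negative index
thresholds = np.sort(tree.tree_.threshold[tree.tree_.feature >= 0])
edges = np.concatenate(([-np.inf], thresholds, [np.inf]))

# Apply the learned cut-points and summarize each bin's size and event rate
bins = pd.cut(x, bins=edges)
summary = (
    pd.DataFrame({"bin": bins, "y": y})
    .groupby("bin", observed=True)["y"]
    .agg(["count", "mean"])
)
print(summary)
```

Here `max_leaf_nodes` caps the number of bins and `min_samples_leaf` sets a floor on bin size, which are the kind of operational constraints mentioned above. Because the cut-points are learned from the target, the resulting event rates should be checked on held-out data before they are trusted.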

Who Should Attend?

Data scientists and risk analysts who use logistic regression in regulated settings and need a reproducible, explainable feature-engineering pipeline.

Detailed 30-Minute Agenda

Time	Topic
0–3 min	Context & Why Binning Matters for Explainability
3–8 min	Pitfalls of Naïve Binning (real-life examples)
8–18 min	Binning as an Optimization Problem: Algorithms & Decision Criteria
18–26 min	Hands-On Python Demo: From Data to Defensible Bins
26–30 min	Q&A, Resources & Next Steps

Prerequisites & Materials

Prerequisites: Basic Python (pandas, scikit-learn) and familiarity with logistic regression
Materials: A GitHub repo with the notebook and data samples will be shared during the talk

You’ll leave equipped to choose the right optimal-binning algorithm for your data.

Prior Knowledge Expected: No

Charaf ZGUIOUAR

Quantitative Finance and Econometrics graduate from Sorbonne University. Currently working as a Data Scientist at BNP Paribas and as a lecturer at Sorbonne University.
