2025-12-10, Machine Learning & AI
In many regulated industries—finance, healthcare, insurance—logistic regression remains the model of choice for its interpretability and regulatory acceptability. Yet capturing non-linear effects and interactions often requires variable binning, and naive approaches (equal-width or quantile cuts) can either wash out signal or invite overfitting. In this 30-minute session, data scientists and risk analysts with a working knowledge of logistic regression and Python will learn to:
- Diagnose the weaknesses of basic binning strategies.
- Select and apply optimal-binning algorithms for different use cases.
- Assess bin stability and guard against model overfit.
All code, data samples, and a notebook will be available on GitHub.
Despite the rise of complex “black-box” models, regulated environments still demand transparency. Properly binned variables not only improve model fit but also yield coefficients that the business and auditors can interpret. However, determining cut-points that preserve true signal while avoiding data-snooping bias is non-trivial.
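To make the weakness of naive cuts concrete, here is a minimal sketch using pandas on a synthetic, right-skewed feature. The data, the variable names, and the choice of five bins are illustrative assumptions, not material from the session notebook.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical right-skewed feature (an income-like variable), synthetic data.
x = pd.Series(rng.lognormal(mean=0.0, sigma=1.0, size=1000), name="x")

# Equal-width: 5 intervals of identical length between min(x) and max(x).
equal_width = pd.cut(x, bins=5)
# Quantile: 5 intervals that each hold roughly the same number of rows.
quantile = pd.qcut(x, q=5)

width_counts = equal_width.value_counts(sort=False)
quantile_counts = quantile.value_counts(sort=False)

print(width_counts)     # most rows land in a single bin
print(quantile_counts)  # balanced counts, but edges ignore the target
```

On skewed data, equal-width cuts tend to pile most observations into one bin and leave the others nearly empty, while quantile cuts balance the counts but place edges without any reference to the target variable.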
By the end of this session, attendees will be able to:
- Understand the basic idea behind binning (the what).
- Know in which contexts variable binning makes sense (the when and why).
- Choose among popular optimal-binning techniques (e.g., ChiMerge, MDLP, decision-tree-based) based on data size, feature type, and operational constraints (the how).
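As an illustration of the decision-tree-based family mentioned above, the following is a minimal sketch with scikit-learn, not necessarily the approach shown in the talk. The synthetic data, the limit of four leaves, and the 5% minimum bin size are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
n = 2000
x = rng.uniform(0, 100, size=n)
# Synthetic, non-monotonic event probability (an assumption for the demo).
p = np.where(x < 30, 0.30, np.where(x < 70, 0.05, 0.20))
y = rng.binomial(1, p)

# A shallow tree on a single feature: each leaf becomes one bin.
# min_samples_leaf=0.05 asks for at least 5% of the rows per bin.
tree = DecisionTreeClassifier(
    max_leaf_nodes=4, min_samples_leaf=0.05, random_state=0
)
tree.fit(x.reshape(-1, 1), y)

# Internal nodes carry a feature index >= 0; leaves are marked with -2.
is_split = tree.tree_.feature >= 0
cut_points = np.sort(tree.tree_.threshold[is_split])

# Turn the thresholds into bin edges and summarize the event rate per bin.
edges = np.concatenate(([-np.inf], cut_points, [np.inf]))
bins = pd.cut(x, bins=edges)
summary = (
    pd.DataFrame({"bin": bins, "y": y})
    .groupby("bin", observed=True)["y"]
    .agg(["count", "mean"])
)

print(cut_points)
print(summary)
```

Because the cut-points are learned from the target, they should be checked on held-out data before being used in a logistic regression; the minimum leaf size is one simple guard against bins that are too small to be stable.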
Who Should Attend?
Data scientists and risk analysts who use logistic regression in regulated settings and need a reproducible, explainable feature-engineering pipeline.
Detailed 30-Minute Agenda
| Time | Topic |
|---|---|
| 0–3 min | Context & Why Binning Matters for Explainability |
| 3–8 min | Pitfalls of Naïve Binning (real-life examples) |
| 8–18 min | Binning as an Optimization Problem: Algorithms & Decision Criteria |
| 18–26 min | Hands-On Python Demo: From Data to Defensible Bins |
| 26–30 min | Q&A, Resources & Next Steps |
Prerequisites & Materials
- Prerequisites: Basic Python (pandas, scikit-learn) and logistic-regression familiarity
- Materials: GitHub repo with notebook and data samples, to be shared during the talk
You’ll leave equipped to choose the right optimal‐binning algorithm for your data.
Quantitative Finance and Econometrics graduate from Sorbonne University. Currently working as a Data Scientist at BNP Paribas and as a lecturer at Sorbonne University.