2025-12-10, Machine Learning & AI
In many regulated industries—finance, healthcare, insurance—logistic regression remains the model of choice for its interpretability and regulatory acceptability. Yet capturing non-linear effects and interactions often requires variable binning, and naive approaches (equal-width or quantile cuts) can either wash out signal or invite overfitting. In this 30-minute session, data scientists and risk analysts with a working knowledge of logistic regression and Python will learn to:
- Diagnose the weaknesses of basic binning strategies.
- Select and apply optimal-binning algorithms for different use cases.
- Assess bin stability and guard against overfitting.
All code, data samples, and a turnkey notebook will be available on GitHub, so you can start experimenting immediately.
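As a rough illustration of the naive strategies mentioned above (not taken from the talk's notebook), the sketch below applies equal-width and quantile cuts to a synthetic right-skewed feature. The data, the variable names, and the choice of five bins are assumptions made for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Hypothetical right-skewed feature (income-like), purely synthetic
x = pd.Series(rng.lognormal(mean=0.0, sigma=1.0, size=5_000), name="x")

# Equal-width: every bin spans the same range, so the long tail
# leaves most bins nearly empty
equal_width = pd.cut(x, bins=5)

# Quantile: every bin holds roughly the same number of rows,
# but the cut-points are chosen without looking at the target
quantile = pd.qcut(x, q=5)

width_counts = equal_width.value_counts().sort_index()
quantile_counts = quantile.value_counts().sort_index()
print(width_counts)
print(quantile_counts)
```

On skewed data like this, the equal-width cut tends to put almost every row in the first bin, while the quantile cut balances counts but may split or merge regions where the event rate actually changes.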
Despite the rise of complex “black-box” models, regulated environments still demand transparency. Properly binned variables not only improve model fit but also yield coefficients that underwriters and auditors can interpret. However, determining cut-points that preserve true signal while avoiding data-snooping bias is non-trivial.
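One simple way to look for data-snooping effects is to learn cut-points on a training split only and compare per-bin event rates on a holdout split. The sketch below does this on synthetic data; the quantile cut-points, the 50/50 split, and the helper `event_rates` are illustrative assumptions, not the method presented in the session.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 4_000
x = rng.normal(size=n)
# Synthetic binary target whose log-odds rise with x (demo assumption)
y = rng.binomial(1, 1 / (1 + np.exp(-x)))

train = pd.DataFrame({"x": x[: n // 2], "y": y[: n // 2]})
hold = pd.DataFrame({"x": x[n // 2 :], "y": y[n // 2 :]})

# Learn cut-points on the training half only, then open the outer
# edges so unseen holdout values still fall into a bin
_, edges = pd.qcut(train["x"], q=5, retbins=True)
edges[0], edges[-1] = -np.inf, np.inf

def event_rates(df):
    """Mean of y per bin, using the cut-points learned on train."""
    bins = pd.cut(df["x"], bins=edges)
    return df.groupby(bins, observed=False)["y"].mean()

report = pd.DataFrame({"train": event_rates(train), "holdout": event_rates(hold)})
report["abs_gap"] = (report["train"] - report["holdout"]).abs()
print(report)
```

Large gaps between the two columns, or an ordering of bins that flips on the holdout, would suggest the cut-points are fitting noise rather than signal.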
By the end of this session, attendees will be able to:
- Explain the basic idea behind binning (the what).
- Identify the contexts in which variable binning makes sense (the when and why).
- Choose among popular optimal-binning techniques (e.g., ChiMerge, MDLP, decision-tree-based) based on data size, feature type, and operational constraints (the how).
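To make the decision-tree-based option concrete, here is a minimal sketch that uses the split thresholds of a shallow scikit-learn tree as cut-points. It covers only the tree-based technique (ChiMerge and MDLP are not shown), and the synthetic data, `max_leaf_nodes=4`, and `min_samples_leaf=0.05` are assumptions chosen for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
n = 5_000
x = rng.uniform(0, 100, size=n)
# Synthetic target: event probability jumps at x = 30 and x = 70 (demo assumption)
p = np.where(x < 30, 0.05, np.where(x < 70, 0.20, 0.60))
y = rng.binomial(1, p)

tree = DecisionTreeClassifier(
    max_leaf_nodes=4,       # at most 4 bins
    min_samples_leaf=0.05,  # each bin holds at least 5% of the rows
    random_state=0,
)
tree.fit(x.reshape(-1, 1), y)

# Internal nodes carry a split threshold; leaves have a negative feature index
is_split = tree.tree_.feature >= 0
cut_points = np.sort(tree.tree_.threshold[is_split])
print(cut_points)

# Assign each row to a bin: 0 for values below the first cut-point, and so on
bin_index = np.digitize(x, cut_points)
```

The leaf-size and leaf-count limits act as the operational constraints here: they cap the number of bins and prevent bins too small to yield stable event rates.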
Who Should Attend?
Data scientists, ML engineers, and risk analysts who use logistic regression in regulated settings and need a reproducible, explainable feature-engineering pipeline.
Detailed 30-Minute Agenda
Time | Topic |
---|---|
0–3 min | Context & Why Binning Matters for Explainability |
3–8 min | Pitfalls of Naïve Binning (real-life examples) |
8–18 min | Binning as an Optimization Problem: Algorithms & Decision Criteria |
18–26 min | Hands-On Python Demo: From Data to Defensible Bins |
26–30 min | Q&A, Resources & Next Steps |
Prerequisites & Materials
- Prerequisites: Basic Python (pandas, scikit-learn) and logistic-regression familiarity
- Materials: a GitHub repo with the notebook and data samples will be shared during the talk
You’ll leave equipped to choose the right optimal-binning algorithm for your data and apply industry-proven best practices.
Quantitative Finance and Econometrics graduate from Sorbonne's University. Currently working as a Data Scientist at BNP Paribas and as a lecturer at Sorbonne's University.