PyData Amsterdam 2025

Uncertainty Unleashed: Wrapping Your Predictions in Honesty
09-25, 10:35–11:10 (Europe/Amsterdam), Nebula

Countless models are running in production as you read this. Many of them emit uncalibrated outputs without being explicit about how much one can trust the result, especially on imbalanced datasets.

What's more, relying on biased estimates can lead to overly aggressive decisions. In this hands-on talk, we'll demystify conformal methods using MNIST, the world's favorite handwritten-digit playground (to make the talk more fun and interactive), with two goals in mind: to explain and prove what an unbiased guarantee is and how it can be computed, and to show why you should care. Attendees will leave equipped to: understand uncertainty guarantees in classification, identify common pitfalls that lead to biased uncertainty estimates, and apply the method even in difficult contexts like imbalanced datasets (an example will be given).


This talk centers on estimating uncertainty with unbiased guarantees, especially in imbalanced-dataset scenarios. The method proposed for this problem is conformal prediction. Conventional classifiers give point predictions but rarely communicate their confidence, which often leads to overconfident decisions, especially in high-stakes domains like medicine. Conformal prediction fills this gap by producing prediction sets that contain the true label with a user-specified probability (e.g., 90%) without assuming any particular data distribution. Imbalanced datasets turn this into a serious hurdle, and SMOTE is of no help.
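To make the guarantee concrete, here is a minimal sketch of split conformal prediction for classification. It is not code from the talk itself: `model`, `X_cal`, `y_cal`, and `X_test` are hypothetical placeholders for a fitted scikit-learn-style classifier (with `predict_proba` and integer labels 0..K-1) plus held-out calibration and test data.

```python
import numpy as np

def conformal_prediction_sets(model, X_cal, y_cal, X_test, alpha=0.1):
    """Prediction sets that cover the true label with probability >= 1 - alpha."""
    # Nonconformity score: one minus the probability assigned to the true label.
    cal_probs = model.predict_proba(X_cal)
    scores = 1.0 - cal_probs[np.arange(len(y_cal)), y_cal]

    # Finite-sample corrected quantile of the calibration scores.
    n = len(y_cal)
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    qhat = np.quantile(scores, q_level, method="higher")

    # A label enters a test point's set when its score is within the threshold.
    test_probs = model.predict_proba(X_test)
    return [np.where(1.0 - p <= qhat)[0] for p in test_probs]
```

With `alpha=0.1` and exchangeable data, the returned sets contain the true digit at least 90% of the time on average, no matter how poorly calibrated the underlying softmax probabilities are.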

The proposed talk structure is as follows (but not limited to):

  • What do we mean by guarantees, and why are they important? (MNIST example)
  • How the hell can we get out of this?
  • Imbalanced datasets are even harder to deal with. What now? (a sketch of one remedy follows this list)
  • As a user, how can you estimate unbiased uncertainty guarantees?
  • Remember, conformal prediction is a recipe, and there are always different ways or ingredients that may fit a use case better (quick overview of classification-focused CP alternatives).
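As a taste of the imbalanced-data part, the sketch below shows one standard remedy, class-conditional (Mondrian) conformal prediction, which calibrates a separate threshold per class so that coverage holds within each class rather than only on average. This is an illustrative assumption about one possible ingredient, not necessarily the exact recipe from the talk; it reuses the placeholder names from the sketch above.

```python
import numpy as np

def class_conditional_sets(model, X_cal, y_cal, X_test, alpha=0.1):
    """Per-class calibrated prediction sets (class-conditional coverage)."""
    cal_probs = model.predict_proba(X_cal)
    scores = 1.0 - cal_probs[np.arange(len(y_cal)), y_cal]

    # One threshold per class, computed only from that class's calibration
    # points; assumes every class appears in the calibration set.
    n_classes = cal_probs.shape[1]
    qhat = np.empty(n_classes)
    for c in range(n_classes):
        s = scores[y_cal == c]
        q_level = min(np.ceil((len(s) + 1) * (1 - alpha)) / len(s), 1.0)
        qhat[c] = np.quantile(s, q_level, method="higher")

    # A label enters the set when its score passes its own class threshold.
    test_probs = model.predict_proba(X_test)
    return [np.where(1.0 - p <= qhat)[0] for p in test_probs]
```

The price of per-class calibration is larger sets for rare classes, since their thresholds are estimated from fewer points; that trade-off is worth keeping in mind when working with imbalanced data.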

Target audience: curious people. They may call themselves practicing data scientists, ML engineers, or researchers, but a basic understanding of ML is enough to keep up with the talk. No PhD in Stats required <- guaranteed.

This talk is unique among prior PyData conformal prediction sessions: unlike the time-series focus at PyData Seattle 2023, the large-scale forecasting angle at PyData London 2024, the energy-grid case study at PyData Eindhoven 2023, the MAPIE library deep-dive at PyData Global 2024, the gentle intro at PyData Amsterdam 2024, the sktime/skpro probabilistic workshop at PyData Amsterdam 2023, and the regression-only focus at PyData London 2019, it is the first to deliver provably unbiased uncertainty guarantees for general classifiers on a multi-class playground like MNIST and, more importantly, on imbalanced datasets.