PyData Global 2025

Reviving Survival Analysis: Timeless, Yet Overlooked?
2025-12-10 , Analytics, Visualization & Decision Science

Survival analysis tackles one of the oldest and most universal questions in data science: Can we learn from the past when something will happen in the future? I will introduce you to the core concepts of survival analysis, visualize time-to-event datasets with Python and R, and introduce the relevant probability distributions. Classical methods for fitting such datasets - some developed long before the age of modern computing - will be compared with machine-learning approaches. Along the way, surprising paradoxes and counterintuitive results will reveal why survival analysis is not merely a blend of regression and classification, but an important prediction problem of its own.


Since at least 1693, when the first actuarial tables were used for calculating insurance premiums, survival (or "time-to-event") analysis has been relevant for many disciplines. Whether predicting when a mechanical component will fail, when a patient will recover, or when a customer will return a product, survival analysis has applications in nearly every domain - from engineering and medicine to finance and e-commerce. Despite its broad applicability and deep statistical foundations, survival analysis remains underappreciated in modern data science.

I therefore want to give the audience, who need not have heard of survival analysis before, an impression of what it is about, which pitfalls to watch out for, and which analytical and computational tools lead to reliable predictions. In a step-by-step, constructive approach, I will guide the audience from the simplest flavor, the fully observed time-to-event problem, to the more intricate versions that include censoring and truncation, in which managing one's own ignorance becomes the most important and challenging aspect. Numerous code examples in Python and R will make the talk hands-on and allow listeners to replicate the numerical experiments and visualizations. Throughout, I will keep returning to vivid everyday examples (How old should the house you buy be to avoid problems? How long can you use the winter tires on your car? Why is milk often still good after the best-before date?) and thereby hopefully convince the audience that survival analysis is almost everywhere.

Outline:

  • Motivation: The oldest problem in data science? [1 min]
  • Introduction: Prediction problems that are in fact survival problems? [3 min]
  • The simple case: Fully observed datasets. Visualization of the cumulative failure distribution. [3 min]
  • The Weibull distribution as the workhorse of survival analysis: How to model early failures, constant risks and wear-out. [4 min]
  • Why reporting another case of illness can be good news. [2 min]
  • Censoring: What can we learn from not having observed anything yet? [2 min]
  • The Kaplan-Meier estimator and the maximum-likelihood principle. [5 min]
  • Machine Learning approaches to the survival problem. [3 min]
  • Outlook: Which degree of individualized survival forecasts can we expect in the future? [2 min]
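
To give a flavor of two of the outline items, here is a minimal Python sketch of the Weibull hazard and of the Kaplan-Meier estimator on right-censored data. It is an illustration only and not taken from the talk material: the function names `weibull_hazard` and `kaplan_meier` are hypothetical, and the toy durations are invented for this example. A Weibull shape parameter below 1 gives a decreasing hazard (early failures), a shape of exactly 1 a constant hazard, and a shape above 1 an increasing hazard (wear-out).

```python
import numpy as np


def weibull_hazard(t, shape, scale=1.0):
    """Weibull hazard h(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    t = np.asarray(t, dtype=float)
    return (shape / scale) * (t / scale) ** (shape - 1.0)


def kaplan_meier(durations, observed):
    """Kaplan-Meier survival estimate at each distinct observed event time.

    durations: time until the event or until censoring
    observed:  True/1 if the event was seen, False/0 if right-censored
    """
    durations = np.asarray(durations, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    event_times = np.unique(durations[observed])
    survival = []
    s = 1.0
    for t in event_times:
        at_risk = np.sum(durations >= t)                # still under observation at t
        events = np.sum((durations == t) & observed)    # events happening exactly at t
        s *= 1.0 - events / at_risk
        survival.append(s)
    return event_times, np.array(survival)


# Toy data, invented for illustration: six units, two of them censored.
durations = [2, 3, 3, 5, 7, 8]
observed = [1, 1, 0, 1, 0, 1]
times, surv = kaplan_meier(durations, observed)

for t, s in zip(times, surv):
    print(f"t = {t:.0f}: estimated survival = {s:.3f}")

# Hazard at three time points for the three regimes (scale fixed to 1).
for k in (0.5, 1.0, 2.0):
    print(f"shape = {k}: hazard = {weibull_hazard([0.5, 1.0, 2.0], shape=k)}")
```

The censored units still count in the at-risk set up to their censoring time, which is how the estimator uses the information that nothing has been observed yet. In practice, established libraries such as `lifelines` in Python or `survival` in R would likely be preferable to a hand-written estimator.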

After the talk, the audience will be able to recognize time-to-event problems in their own domain and use the appropriate tools in Python and R to analyze and model them.


Prior Knowledge Expected:

No

Malte Tichy has a research background in theoretical quantum physics, with a PhD from the University of Freiburg. He learned the nuts and bolts of applied data science and forecasting within various hands-on and leadership roles at the supply chain software company Blue Yonder. As a Discipline Expert in Data Analytics & AI, he works on forecasts for wind-turbine component reliability and maintenance expenditures at Siemens Gamesa Renewable Energy.