Gotta catch ‘em all - Hunting Fraudsters with Minimal Labels and Maximum ML :: PyData Amsterdam 2025

Gotta catch ‘em all - Hunting Fraudsters with Minimal Labels and Maximum ML

09-25, 14:40–15:15 (Europe/Amsterdam), Apollo

Card testing is one of the fastest-growing fraud problems in the payments landscape, with fraudsters launching millions of attempts globally each month. These attacks can cost companies thousands of euros in lost revenue and lead to the distribution of private card details. Detecting this type of fraud is extremely difficult without confirmed labels to train standard supervised ML classifiers. In this talk, we’ll describe how we built a production-ready ML model that now processes hundreds of transactions per second and share the key take-aways from our journey.

Card testing is one of the fastest-growing fraud problems in e-commerce payments: criminals try out slightly modified card numbers, tweaking a single digit at a time, to discover valid credentials. This leads to millions of fraudulent transactions per month. Once a card is validated, it can be used to purchase expensive goods or be distributed for further use. Detecting this fraud is extremely difficult, as there is no direct feedback that can be used to create labels for an ML model.

But how can we train a performant ML model without a way to distinguish attacks from legitimate transactions? In this talk, we’ll describe how we tackled the labeling issue, which resulted in a production-ready ML model that now processes hundreds of transactions per second. It's been a classic cat-and-mouse game: as fraudsters evolve their tactics, we’ve had to stay one step ahead with clever data and modeling strategies.

We will share the following lessons and key take-aways to get other ML scientists started with similar unsupervised problems:
- Why we chose to apply supervised machine learning to what is fundamentally an unsupervised task
- Guidelines for constructing a proxy-labeled dataset, with a use-case example of using time-series and graph-based methods
- The key insights we gained from building this model - highlighting both our successes and the challenges we faced
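
To make the idea of a proxy-labeled dataset concrete, here is a minimal sketch of one possible time-series heuristic. It is not the method presented in the talk: the field layout, the `source_id` grouping key, the 60-second window and the threshold of three distinct cards are all assumptions chosen for illustration. A transaction gets a positive proxy label when the same source has tried several different card numbers within a short window.

```python
from collections import defaultdict, deque


def proxy_label(transactions, window_seconds=60, min_distinct_cards=3):
    """Return one 0/1 proxy label per transaction.

    `transactions` is a list of (timestamp, source_id, card_number) tuples
    sorted by timestamp. All names and thresholds are hypothetical.
    """
    recent = defaultdict(deque)  # source_id -> attempts still inside the window
    labels = []
    for ts, source_id, card in transactions:
        window = recent[source_id]
        window.append((ts, card))
        # Drop attempts that are older than the time window.
        while window and ts - window[0][0] > window_seconds:
            window.popleft()
        distinct_cards = {c for _, c in window}
        labels.append(1 if len(distinct_cards) >= min_distinct_cards else 0)
    return labels


# Made-up example data: one returning shopper and one burst of card variations.
transactions = [
    (0, "shopper-a", "4111111111111111"),
    (5, "bot-x", "4000000000000002"),
    (6, "bot-x", "4000000000000010"),
    (7, "bot-x", "4000000000000028"),
    (300, "shopper-a", "4111111111111111"),
]
labels = proxy_label(transactions)
print(labels)
```

Labels produced this way are noisy: the first attempts of a burst stay unlabeled until the threshold is reached, and a busy legitimate source could be flagged by mistake. A supervised classifier trained on such labels inherits those errors, which is the kind of trade-off any proxy-labeling scheme has to weigh.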

Outline
- 0-5 min: Introduction and problem statement
- 5-8 min: Why we want to use supervised ML for an unsupervised problem
- 8-12 min: Proxy-label trade-offs
- 12-25 min: Our solution to the problem
- 25-30 min: Key take-aways, guidelines and lessons learned
