Enhancing Fraud Detection with LLM-Generated Profiles: From Analyst Efficiency to Model Performance PyData London 2025

2025-06-07 11:50–12:35, Doddington Forum

This talk explores how leveraging Large Language Models (LLMs) to generate structured customer profile summaries improved both compliance analyst workflows and fraud scoring models at a financial institution. Attendees will learn how embeddings derived from LLM-generated narratives outperformed traditional manual feature engineering and raw text embeddings, offering insights into practical applications of NLP in fraud detection.

Objective:

Fraud detection systems often rely on manually crafted features or embeddings of raw text, both of which can miss nuanced patterns in unstructured data. This talk presents a case study in which LLM-generated customer profiles (summarising transaction history, documents, interaction history and related profiles) were used to (1) accelerate compliance reviews and (2) extract embeddings that boosted fraud model performance and sped up model development.
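
As a rough illustration of the profile-generation step, the sketch below gathers the four data sources into a single prompt and passes it to an LLM. Everything in it is an assumption made for illustration: the `CustomerRecord` fields, the prompt wording, and the `llm` callable (a stand-in for whatever model client is actually used) are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CustomerRecord:
    """Hypothetical container for the four sources named in the abstract."""
    customer_id: str
    transactions: List[str] = field(default_factory=list)
    documents: List[str] = field(default_factory=list)
    interactions: List[str] = field(default_factory=list)
    related_profiles: List[str] = field(default_factory=list)


def build_profile_prompt(record: CustomerRecord) -> str:
    """Render all sources into one prompt with a fixed section order."""
    sections = [
        ("Transaction history", record.transactions),
        ("Documents", record.documents),
        ("Interaction history", record.interactions),
        ("Related profiles", record.related_profiles),
    ]
    lines = [
        "Summarise this customer for a compliance analyst.",
        "Point out inconsistencies between the sections.",
        "",
    ]
    for title, items in sections:
        lines.append(f"## {title}")
        # Make empty sources explicit so the model does not have to guess.
        shown = items if items else ["(none on file)"]
        lines.extend(f"- {item}" for item in shown)
        lines.append("")
    return "\n".join(lines).rstrip()


def generate_profile(record: CustomerRecord, llm: Callable[[str], str]) -> str:
    """Call the (assumed) LLM client with the assembled prompt."""
    return llm(build_profile_prompt(record))


# Example with made-up data and a fake LLM that only echoes the prompt size.
record = CustomerRecord(
    customer_id="c-001",
    transactions=["2025-01-03 card payment 120.00 GBP"],
    documents=["Invoice lists a different trading address"],
)
prompt = build_profile_prompt(record)
summary = generate_profile(record, lambda p: f"[summary of {len(p)} chars]")
```

Keeping the section order fixed is a design choice in this sketch: it makes the generated profiles easier for analysts to scan and keeps the text fed to a downstream embedding model structurally consistent.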

Outline:
* 0-10 mins: Introduction to challenges in fraud detection: manual inefficiencies and limitations of traditional feature engineering.
* 10-20 mins: Methodology: Designing LLM-generated profiles to unify structured/unstructured data, and embedding extraction.
* 20-30 mins: Results: How embeddings of the LLM-generated summaries captured contextual relationships (e.g., subtle transaction-document inconsistencies) better than raw text embeddings or manual features, followed by lessons learned and scalability considerations.
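
The embedding-extraction step in the outline above could look roughly like the following sketch, which turns a profile summary into a fixed-length vector and appends it to existing manual features to form one model input row. The abstract does not name the embedding model, so `toy_embed` is a deliberately simple hashed bag-of-words stand-in that runs without external dependencies; a real pipeline would call a sentence-embedding model at that point. `build_feature_row`, the example features, and the example summary are likewise hypothetical.

```python
import hashlib
import math
from typing import List, Sequence


def toy_embed(text: str, dim: int = 16) -> List[float]:
    """Stand-in embedding: hashed bag of words, L2-normalised.

    NOT the model from the talk; replace with a real embedding model.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        # Map each token to one of `dim` buckets and count occurrences.
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def build_feature_row(
    manual_features: Sequence[float], profile_summary: str, dim: int = 16
) -> List[float]:
    """Concatenate manual features with the summary embedding."""
    return list(manual_features) + toy_embed(profile_summary, dim)


# Made-up example: two manual features plus a 16-dimensional embedding.
summary_text = (
    "Director address differs from invoice address; "
    "card spend rises sharply after onboarding."
)
row = build_feature_row([3.0, 0.25], summary_text)
```

Rows built this way can be passed to any supervised classifier, which lets the summary embeddings be compared against manual features alone under the same training setup.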

Key Takeaways:
* LLMs can transform unstructured data into actionable insights for both human analysts and ML models.
* Embeddings from LLM-generated summaries may outperform naive text embeddings by capturing synthesized context and reducing noise.
* Practical strategies to integrate LLMs into existing fraud detection pipelines without disrupting workflows.

Why It Matters:
This approach bridges the gap between using unstructured data and achieving interpretable model improvements, offering a scalable path for institutions implementing LLM-based solutions.

Background Knowledge:
Basic understanding of NLP (e.g., embeddings) and supervised learning. No advanced LLM expertise is required.

Audience:
Data scientists, ML engineers, and fraud analysts familiar with basic NLP/ML concepts. Ideal for those exploring NLP applications in finance or seeking alternatives to manual feature engineering.

Prior Knowledge Expected: Previous knowledge expected

Radion Bikmukhamedov

Radion Bikmukhamedov is a Machine Learning Engineer in ANNA Money's Financial Crime Prevention unit, specializing in operationalizing fraud detection systems that safeguard millions of monthly transactions and save thousands of hours of manual labour by automating fraud analysts' tasks. Over 6 years, he has architected NLP and ensemble model pipelines using Python's ML stack paired with MLOps tools (MLflow, DVC, KServe, Feast) to automate financial crime detection at scale.
