PyData London 2025

Not Another LLM Talk… Practical Lessons from Building a Real-World Adverse Media Pipeline
06-07, 15:30–16:15 (Europe/London), Grand Hall

LLMs are magical—until they aren’t. Extracting adverse media entities might sound straightforward, but throw in hallucinations, inconsistent outputs, and skyrocketing API costs, and suddenly, that sleek prototype turns into a production nightmare.

Our adverse media pipeline monitors over 1 million articles a day, sifting through vast amounts of news to identify reports of crimes linked to financial bad actors, money laundering, and other risks. Thanks to GenAI and LLMs, we can tackle this problem in new ways—but deploying these models at scale comes with its own set of challenges: ensuring accuracy, controlling costs, and staying compliant in highly regulated industries.

In this talk, we’ll take you inside our journey to production, exploring the real-world challenges we faced through the lens of key personas: Cautious Claire, the compliance officer who doesn’t trust black-box AI; Magic Mike, the sales lead who thinks LLMs can do anything; Just-Fine-Tune Jenny, the PM convinced fine-tuning will solve everything; Reinventing Ryan, the engineer reinventing the wheel; and Paranoid Pete, the security lead fearing data leaks.

Expect practical insights, cautionary tales, and real-world lessons on making LLMs reliable, scalable, and production-ready. If you've ever wondered why your pipeline works perfectly in a Jupyter notebook but falls apart in production, this talk is for you.


We’ve all seen the hype—LLMs are transforming workflows, revolutionising automation, and changing how we extract insights from text. But when it comes to real-world production systems, things get messy fast.

Our adverse media pipeline processes over 1 million news articles a day, scanning for reports of crimes linked to financial bad actors, money laundering, and other regulatory risks. With GenAI and LLMs, we have powerful new tools to automate entity extraction and risk detection. However, deploying these models at scale brings a whole new set of challenges:

🛠️ Breaking Down the Problem: Why structuring tasks into modular prompts and chaining responses is key to accuracy (see the first sketch after this list).
💰 Cost vs. Performance Trade-offs: How different prompting strategies and model choices (API-based vs. fine-tuned local models) impact cost and scalability (see the second sketch after this list).
🧐 Validation & Governance: From handling hallucinations to dealing with sensitive data while staying within regulatory frameworks.
🧰 Open Source & Practical Tooling: How to build reliable, cost-efficient LLM pipelines using tools in the Python ecosystem.
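
To make the first of these concrete, here is a minimal sketch of the kind of modular "extract, validate, retry" step the talk walks through. It is illustrative only: `call_llm` is a hypothetical placeholder for whichever client or local model you use, and the entity schema is invented for the example rather than taken from our pipeline.

```python
# Minimal sketch of one modular pipeline step: prompt, validate, retry.
# `call_llm` is a hypothetical placeholder, not a real client.
from pydantic import BaseModel, ValidationError


class AdverseMediaEntity(BaseModel):
    name: str
    entity_type: str       # e.g. "person" or "organisation"
    alleged_crime: str     # e.g. "money laundering"


class ExtractionResult(BaseModel):
    entities: list[AdverseMediaEntity]


def call_llm(prompt: str) -> str:
    """Hypothetical helper: send `prompt` to your model of choice and return raw text."""
    raise NotImplementedError


def build_prompt(article: str) -> str:
    schema = ExtractionResult.model_json_schema()
    return (
        "Extract every person or organisation accused of a financial crime in the "
        f"article below. Respond only with JSON matching this schema: {schema}\n\n"
        f"Article:\n{article}"
    )


def extract_entities(article: str, max_retries: int = 2) -> ExtractionResult:
    """One small, single-purpose step in a chain: extract, then validate.

    Schema validation catches a large class of hallucinated or malformed
    outputs before they reach downstream risk-scoring steps.
    """
    prompt = build_prompt(article)
    for attempt in range(max_retries + 1):
        raw = call_llm(prompt)
        try:
            return ExtractionResult.model_validate_json(raw)
        except ValidationError:
            if attempt == max_retries:
                raise
            # Chain a follow-up prompt asking the model to repair its own output.
            prompt = f"Your previous answer did not match the schema. Return corrected JSON only:\n{raw}"
    raise RuntimeError("unreachable")
```

Keeping each step this small is what makes accuracy measurable: you can test, retry, and swap out one prompt without touching the rest of the chain.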

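To illustrate the second bullet, a back-of-envelope cost model shows why model choice matters at this volume. The token counts and per-token prices below are made-up placeholders; substitute your provider's actual pricing and your measured averages.

```python
# Back-of-envelope daily cost at ~1M articles/day.
# All numbers are illustrative assumptions, not real quotes or measurements.

ARTICLES_PER_DAY = 1_000_000
TOKENS_PER_ARTICLE = 1_500              # prompt + completion, assumed average

PRICE_PER_1K_TOKENS_USD = {             # hypothetical prices
    "hosted API, large model": 0.010,
    "hosted API, small model": 0.001,
    "self-hosted fine-tuned model": 0.0002,   # amortised GPU cost, assumed
}

daily_tokens = ARTICLES_PER_DAY * TOKENS_PER_ARTICLE

for model, price in PRICE_PER_1K_TOKENS_USD.items():
    daily_cost = daily_tokens / 1_000 * price
    print(f"{model:<30} ${daily_cost:>9,.0f}/day   ${daily_cost * 365:>12,.0f}/year")
```

Even with toy numbers, the size of the gap between options is what drives the cost-versus-accuracy decisions the talk covers.
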
To illustrate the real-world challenges of getting an LLM pipeline into production, we’ll introduce a cast of personas that will feel all too familiar:

  • Cautious Claire – the compliance officer who doesn’t trust AI black boxes.
  • Magic Mike – the sales lead who thinks LLMs can do anything.
  • Just-Fine-Tune Jenny – the product manager convinced fine-tuning will fix everything.
  • Reinventing Ryan – the engineer determined to build everything from scratch.
  • Paranoid Pete – the security lead who fears LLMs will leak all the secrets.

Through their perspectives, we’ll explore the tensions, trade-offs, and hard-won lessons of taking an LLM-powered pipeline from a Jupyter notebook to a production-grade system. Expect practical insights through a real-world case study, and cautionary tales to help you navigate your own deployment challenges.

Who Should Attend?
This talk is for ML engineers, data scientists, software engineers, and product managers working with LLMs in production or planning to do so. Whether you're evaluating architectures, struggling with cost control, or navigating compliance concerns, you'll walk away with battle-tested strategies for building scalable, reliable, and regulation-friendly LLM pipelines.


Prior Knowledge Expected

No previous knowledge expected

Adam is the Interim Director of Data Science at ComplyAdvantage, where he leads a brilliant team tackling financial crime with advanced analytics, large-scale systems, and the latest in generative and agentic AI.

Before that, he spent eight years in the smart cities space at HAL24K, helping governments and infrastructure providers make better decisions with their data. Along the way, he built and led a team of ten data scientists, and helped launch four spin-out ventures—proving that good data science can move the dial in the real world.

A recovering astrophysicist, Adam spent a decade analysing data from space telescopes in search of new cosmic phenomena. He’s since redirected that curiosity toward Earth-based problems.

Adam is an active member of the PyData community, the founder of PyData Southampton, and a long-time volunteer with DataKind UK, supporting charities and NGOs with pro-bono data science.