PyData Tel Aviv 2025

Integrating LLMs with Traditional Data Analysis
2025-11-05, ML+analytics

The rise of Large Language Models has transformed data analysis, but these powerful tools aren't replacements for traditional statistical methods - they're complements. This talk explores practical strategies for building hybrid systems that leverage the strengths of both approaches. I'll demonstrate architectural patterns for effective integration, share performance comparisons, and present a real-world case study of a production sentiment analysis system that combines traditional ML for data preprocessing with LLMs for nuanced text understanding. Attendees will gain practical knowledge for enhancing their data pipelines with LLM capabilities while maintaining statistical rigor and computational efficiency.


Large Language Models have revolutionized how we approach unstructured data, but they come with significant limitations: hallucinations, lack of statistical rigor, and high computational costs. Meanwhile, traditional data analysis techniques offer statistical soundness and efficiency but struggle with unstructured data and complex feature engineering.
This talk explores how to build hybrid systems that combine the best of both worlds. I'll begin by examining the complementary strengths and weaknesses of each approach, then dive into architectural patterns for effective integration:
1 - Using LLMs as sophisticated feature extractors for traditional models (a minimal sketch of this pattern follows the list)
2 - Employing statistical methods to validate and enhance LLM outputs
3 - Implementing ensemble approaches that create systems greater than the sum of their parts
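To make pattern 1 concrete, here is a minimal illustrative sketch (assumptions of mine, not code from the talk): LLM-derived text features feed a conventional scikit-learn classifier. The embed_with_llm function is a hypothetical placeholder standing in for a real LLM or embedding API call.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    def embed_with_llm(texts, dim=64):
        """Placeholder: in practice this would call an LLM or embedding endpoint."""
        rng = np.random.default_rng(0)
        # Stand-in vectors so the example runs offline.
        return np.stack([rng.normal(size=dim) + (hash(t) % 7) for t in texts])

    texts = ["great product", "terrible support", "okay experience", "love it"] * 25
    labels = [1, 0, 0, 1] * 25

    X = embed_with_llm(texts)                          # the LLM acts as the feature extractor
    X_train, X_test, y_train, y_test = train_test_split(X, labels, random_state=0)

    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # classic, interpretable model
    print("held-out accuracy:", clf.score(X_test, y_test))

The point of the split is that the downstream model stays cheap to retrain, easy to calibrate, and fully inspectable, while the LLM only supplies the representation.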
The core of the presentation features a detailed case study of a hybrid sentiment analysis system implemented in production. This system processes social media content using traditional ML techniques for data preprocessing and validation, while leveraging LLMs for nuanced understanding of context, emotion, and intent. I'll share performance metrics comparing traditional-only, LLM-only, and hybrid approaches across dimensions of accuracy, computational efficiency, and interpretability.
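As a hedged illustration of the routing idea behind such a hybrid pipeline (a sketch under my own assumptions, not the production system described above): a fast traditional model scores every item, and only low-confidence items are escalated to an LLM. The ask_llm_sentiment function is a hypothetical stand-in for an actual LLM call.

    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    def ask_llm_sentiment(text):
        """Placeholder: in practice, prompt an LLM and parse its answer into a label."""
        return 1 if "love" in text.lower() else 0

    train_texts = ["love this app", "hate this app", "love the update", "hate the bug"] * 20
    train_labels = [1, 0, 1, 0] * 20

    # A cheap, fast traditional model handles the bulk of the traffic.
    cheap_model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    cheap_model.fit(train_texts, train_labels)

    def hybrid_predict(text, threshold=0.8):
        proba = cheap_model.predict_proba([text])[0]
        if proba.max() >= threshold:        # traditional model is confident: keep it cheap
            return int(np.argmax(proba))
        return ask_llm_sentiment(text)      # ambiguous case: escalate to the LLM

    print(hybrid_predict("love the new release"))
    print(hybrid_predict("mixed feelings, hard to say"))

Raising or lowering the confidence threshold trades LLM cost against accuracy on the ambiguous inputs.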

Attendees will leave with practical knowledge of:
1 - Decision frameworks for choosing between traditional ML, LLMs, or hybrid approaches
2 - Design patterns for integrating these technologies effectively
3 - Implementation strategies that balance performance and resource constraints
4 - Techniques for maintaining interpretability in hybrid systems
This talk is ideal for data scientists and ML engineers who want to enhance their existing data pipelines with LLM capabilities without abandoning the statistical rigor and efficiency of traditional approaches.


Prior Knowledge Expected:

No previous knowledge expected

Maria is a CTO and co-founder of Rokka, bringing over six years of data science expertise from Intel, EY, and Playtika.
A natural entrepreneur, she previously founded and scaled a successful chain of beer bars to ten locations in just one year.
As a CTO with extensive experience in both corporate and freelance environments, ranked 74th out of 57,000 on one of the largest freelance platforms, she can share insights from successfully managing over 150 small projects as well as leading a product development team.
Today, she combines her technical leadership with a passion for education, running a practical training school for aspiring QA specialists and Data Analysts.
Her proven track record in both technology and business leadership, along with her hands-on teaching approach using real-world cases, makes her a compelling speaker and mentor in the tech community.