PyData Tel Aviv 2025

Do You Want to Build a Snowman? Leveraging DBT, SQL and Python to Build Production Data Science Pipelines in Snowflake
2025-11-05, Green
Language: English

As data scientists and analysts working in a data-led product team, we often found ourselves struggling to move our carefully crafted heuristics, insightful features, and promising machine learning models from initial experimentation into reliable, production-ready systems.

In this talk, I’ll share how we tackled this by building a scalable solution leveraging DBT and Snowflake.
This talk is meant for all data-oriented professions, and while a background in building data pipelines is helpful, it is not required to understand the talk.


The journey from a data science prototype to a productionized system can be fraught with challenges, including version control, reliable deployment, and integrating complex logic. This talk addresses these critical pain points by demonstrating how DBT, Python and Snowflake provide a cohesive framework for building reliable and maintainable data pipelines.

This solution gives us the best of both worlds:

  1. It lets us easily define complex machine learning pipelines by harnessing Python's power directly within Snowflake
  2. It empowers our less technical analysts to contribute significantly by defining SQL-based heuristics

Both come with a reliable way to apply programming best practices to our data assets.

In this session, I’ll present a real-world case study showing how we combined these technologies with commonly used tools (like GitHub CI/CD and Airflow) to overcome the struggles we faced along the way and convert our legacy scripts and notebooks into a fully functional ML system.
This talk is meant for all data professionals (data scientists, data engineers, and analytics managers) who are looking for practical solutions to operationalize data science outputs.

Attendees will leave with a clear understanding of the architectural patterns and practical steps required to implement a similar solution within their organizations, transforming their data science outputs into reliable, production-grade assets.

This talk will explore how this solution enables:

  1. Robust Programming Best Practices for Data: Learn how DBT brings software engineering principles like version control (Git), CI/CD, modularity, and comprehensive testing to data transformations, ensuring data quality and reproducibility.

  2. Unleashing Python in Snowflake & Smart Orchestration: Discover how Snowflake's support for Docker and Python allows for the direct execution of sophisticated data science logic—from custom feature engineering to model inference—within the data warehouse, all orchestrated by popular tools like Airflow.

  3. Empowering SQL-Savvy Analysts: Understand how this setup lets analysts with less programming experience contribute significantly by writing SQL-based heuristics, fostering a more inclusive and efficient data team where diverse skill sets can thrive.
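
To make the second and third points concrete, here is a minimal sketch of how Python logic can sit on top of an analyst-owned SQL model. It is not taken from the talk: it assumes the dbt Python model convention (a `model(dbt, session)` entry point), and the model name `wallet_features`, the columns `TX_COUNT` and `FLAGGED_RATIO`, and the `risk_score` heuristic are all hypothetical.

```python
# Hypothetical dbt Python model, e.g. models/risk_scores.py.
# All model, column, and function names here are illustrative assumptions.


def risk_score(tx_count: int, flagged_ratio: float) -> float:
    """Toy heuristic standing in for real feature logic or model inference."""
    activity = min(tx_count, 100) / 100  # cap activity at 100 transactions
    return round(0.3 * activity + 0.7 * flagged_ratio, 4)


def model(dbt, session):
    # dbt calls this entry point; the result is materialized as a table.
    dbt.config(materialized="table")

    # Assumed upstream SQL model maintained by analysts. On Snowflake,
    # dbt.ref() returns a Snowpark DataFrame, converted here to pandas.
    features = dbt.ref("wallet_features").to_pandas()

    # Score each row with the Python heuristic.
    features["RISK_SCORE"] = [
        risk_score(n, r)
        for n, r in zip(features["TX_COUNT"], features["FLAGGED_RATIO"])
    ]
    return features
```

Keeping the scoring logic in a plain function such as `risk_score` means it can be unit-tested in CI without a warehouse connection, while the upstream SQL model stays editable by analysts.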


Prior Knowledge Expected:

No previous knowledge expected

I am a data scientist with a strong background in software engineering and system development, currently helping protect Web3 as a DS at Blockaid.

I'm passionate about solving real-world problems through machine learning, big data technologies, and algorithm development, and have a proven track record of taking data-oriented projects from ideation to production in various fields.