PyData Global 2025

The Lifecycle of a Jupyter Environment: From Exploration to Production-Grade Pipelines
2025-12-09, Live from PyData Boston

Most data science projects start with a simple notebook—a spark of curiosity, some exploration, and a handful of promising results. But what happens when that experiment needs to grow up and go into production?

This talk follows the story of a single machine learning exploration that matures into a full-fledged ETL pipeline. We’ll walk through the practical steps and real-world challenges that come up when moving from a Jupyter notebook to something robust enough for daily use.

We’ll cover how to:

  • Set clear objectives and document the process from the beginning
  • Break messy notebook logic into modular, reusable components
  • Choose the right tools (Papermill, nbconvert, shell scripts) based on your workflow—not just the hype
  • Track environments and dependencies to make sure your project runs tomorrow the way it did today
  • Handle data integrity, schema changes, and even evolving labels as your datasets shift over time

And as a bonus: bring your results to life with interactive visualizations using tools like PyScript, Voila, and Panel + HoloViz.


  • (3 mins) Intro
    • I've been supporting developer experience for various teams since 2020, after working as a freelance Python consultant. I've worked on many dozens of projects, unblocking users and helping them pick the right tools for the task at hand.
    • The classic trap: "it works on my machine"
    • What we're building today: an ML pipeline from RAPIDS to Snowflake
    • We're going to watch a real project grow up
  • (3 mins) Exploration - starting as a single messy notebook on a sample data set
    • Why RAPIDS? GPU acceleration
      • large data sets
      • GPU availability - remote machines and local GPUs
      • workflows that map well to the GPU
    • Load data with cuDF / pandas
    • Quick EDA and data visualization
    • Train a cuML / scikit-learn model (see the cuML sketch after this outline)
    • the zero-code-change philosophy: accelerate existing pandas / scikit-learn code without rewriting it (see the cudf.pandas sketch after this outline)
  • (7 mins) Make it repeatable - start with simple, tried-and-true tools, then explore where tools like Papermill help with flexibility and reproducibility
    • common pain points: operating cadence, specialized scenarios, error-prone manual execution
    • shell scripts versus Papermill (see the Papermill sketch after this outline)
    • reproducible environments
    • generate HTML reports (see the nbconvert sketch after this outline)
    • pass parameters through to your notebook
  • (8 mins) Make it reliable - Modular code & testing
    • common pain points: data schema changes, debugging issues, testing & modularity
    • nbconvert + Python: turn your notebook into a script (see the nbconvert sketch after this outline)
    • turn a function into a module (see the module sketch after this outline)
    • dashboard with HoloViz / Panel (see the Panel sketch after this outline); discuss choosing tools like Voila and PyScript
  • (5 mins) Snowflake integration
    • common pain points: data volume, coordinating with other data systems, audits
    • picking the right tools: the cost / complexity tradeoff
    • RAPIDS preprocessing into Snowflake storage (see the Snowflake sketch after this outline)
    • self-service access for stakeholders
  • (3 mins) Conclusion
    • Start simple
    • Add complexity when you feel specific pain
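
Illustrative sketches for the outline above. All file names, column names, credentials, and parameters below are hypothetical placeholders; each snippet is a minimal sketch of the technique under stated assumptions, not a definitive implementation.

The cudf.pandas sketch shows the zero-code-change philosophy from the Exploration section: cudf.pandas patches pandas so existing code runs GPU-accelerated where supported, here against a hypothetical transactions.csv sample data set.

```python
# cudf.pandas intercepts pandas and runs supported operations on the GPU,
# falling back to CPU pandas otherwise. In a notebook, the equivalent is:
#   %load_ext cudf.pandas
import cudf.pandas
cudf.pandas.install()  # must run before pandas is imported

import pandas as pd

df = pd.read_csv("transactions.csv")  # hypothetical sample data set
summary = df.groupby("category")["amount"].agg(["mean", "sum"])
print(summary)
```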
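
The cuML sketch: training with cuML's scikit-learn-compatible API, where GPU training is often just an import swap. The features.csv file and "label" column are stand-ins.

```python
# cuML mirrors the scikit-learn estimator API, so the training code reads
# the same as its CPU counterpart.
import cudf
from cuml.ensemble import RandomForestClassifier
from cuml.model_selection import train_test_split

df = cudf.read_csv("features.csv")  # hypothetical feature table
X, y = df.drop(columns=["label"]), df["label"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)
print("accuracy:", model.score(X_test, y_test))
```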
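
The Papermill sketch: parameterized, repeatable notebook runs. The notebook names and parameters are invented, and the target notebook is assumed to have a cell tagged "parameters".

```python
# Papermill executes a notebook top to bottom, injects the given parameters
# into the cell tagged "parameters", and saves the executed copy separately.
# CLI equivalent:
#   papermill explore.ipynb explore_2025-12-09.ipynb -p run_date 2025-12-09
import papermill as pm

pm.execute_notebook(
    "explore.ipynb",             # hypothetical input notebook
    "explore_2025-12-09.ipynb",  # executed copy, kept as a run record
    parameters={"run_date": "2025-12-09", "sample_frac": 1.0},
)
```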
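
The nbconvert sketch: generating an HTML report (and a plain script) from the same notebook, driven from Python here; explore.ipynb is a placeholder.

```python
# Execute a notebook and render it to a shareable HTML report.
# CLI equivalents:
#   jupyter nbconvert --to html --execute explore.ipynb
#   jupyter nbconvert --to script explore.ipynb    (notebook -> .py file)
import nbformat
from nbconvert import HTMLExporter
from nbconvert.preprocessors import ExecutePreprocessor

nb = nbformat.read("explore.ipynb", as_version=4)
ExecutePreprocessor(timeout=600).preprocess(nb)

body, _resources = HTMLExporter().from_notebook_node(nb)
with open("report.html", "w", encoding="utf-8") as f:
    f.write(body)
```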
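
The module sketch: turning a notebook function into an importable, testable module. The module layout and cleaning logic are invented for illustration.

```python
# pipeline/cleaning.py -- logic extracted from the notebook into a module
import pandas as pd

def drop_invalid_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Remove rows with missing amounts or non-positive quantities."""
    return df[df["amount"].notna() & (df["quantity"] > 0)]

# tests/test_cleaning.py -- the extracted function is now trivially testable
def test_drop_invalid_rows():
    df = pd.DataFrame({"amount": [1.0, None], "quantity": [2, 3]})
    assert len(drop_invalid_rows(df)) == 1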
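
The Panel sketch: a minimal dashboard over the same hypothetical data set, served with `panel serve dashboard.py`.

```python
# dashboard.py -- a small self-updating dashboard built with Panel
import pandas as pd
import panel as pn

pn.extension()

df = pd.read_csv("transactions.csv")  # hypothetical sample data set
category = pn.widgets.Select(name="Category",
                             options=sorted(df["category"].unique()))

def summary(cat):
    # Recomputed automatically whenever the widget value changes
    return df[df["category"] == cat].describe()

pn.Column(category, pn.bind(summary, category)).servable()
```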
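
The Snowflake sketch: loading preprocessed output into Snowflake with the official connector's write_pandas helper. Every connection parameter, file name, and table name is a placeholder.

```python
# Push a preprocessed DataFrame into Snowflake for downstream consumers.
import pandas as pd
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

# Hypothetical RAPIDS preprocessing output; cuDF frames convert via .to_pandas()
features = pd.read_parquet("features.parquet")

conn = snowflake.connector.connect(
    account="my_account",    # placeholder credentials -- use a secrets
    user="my_user",          # manager rather than literals in practice
    password="...",
    warehouse="ANALYTICS_WH",
    database="ML_DB",
    schema="PUBLIC",
)

success, _chunks, nrows, _ = write_pandas(
    conn, features, table_name="FEATURES", auto_create_table=True
)
print(f"loaded {nrows} rows, success={success}")
```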

Prior Knowledge Expected: No