2025-12-09 – Live from PyData Boston
Most data science projects start with a simple notebook—a spark of curiosity, some exploration, and a handful of promising results. But what happens when that experiment needs to grow up and go into production?
This talk follows the story of a single machine learning exploration that matures into a full-fledged ETL pipeline. We’ll walk through the practical steps and real-world challenges that come up when moving from a Jupyter notebook to something robust enough for daily use.
We’ll cover how to:
- Set clear objectives and document the process from the beginning
- Break messy notebook logic into modular, reusable components
- Choose the right tools (Papermill, nbconvert, shell scripts) based on your workflow—not just the hype
- Track environments and dependencies to make sure your project runs tomorrow the way it did today
- Handle data integrity, schema changes, and even evolving labels as your datasets shift over time
And as a bonus: bring your results to life with interactive visualizations using tools like PyScript, Voila, and Panel + HoloViz
- (3 mins) Intro
- I've been supporting teams with their developer experience since 2020, after working as a freelance Python consultant. I've worked on dozens of projects, unblocking users and helping them pick the right tools for the task at hand.
- It works on my machine
- What we're building today: ML pipeline with RAPIDS -> Snowflake
- We're going to watch a real project grow up
- (3 mins) Exploration - starting as a single messy notebook on a sample data set
- Why RAPIDS? GPU acceleration
- Large data sets
- GPU availability - remote machine, local GPU
- workflows that work well on GPUs
- Load data with cuDF / pandas
- Quick EDA and data visualization
- Train cuML / scikit-learn model
- no-code-change philosophy (see the sketch after this section)
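To make the no-code-change philosophy concrete, here is a minimal sketch of what the first exploration cells might look like; the file name and column names (`events.parquet`, `churned`, and friends) are placeholders:

```python
# Minimal sketch of the first exploration cells; file and column names are placeholders.

# cuDF's pandas accelerator: the same pandas code runs on the GPU when one is
# available and falls back to CPU pandas otherwise (the "no-code-change" idea).
try:
    import cudf.pandas
    cudf.pandas.install()
except ImportError:
    pass  # no RAPIDS / no GPU: plain pandas below still works

import pandas as pd

df = pd.read_parquet("events.parquet")  # sample data set
print(df.describe())                    # quick EDA
df["signup_month"] = pd.to_datetime(df["signup_date"]).dt.month

# Same idea for the model: prefer cuML's GPU estimator, fall back to scikit-learn.
try:
    from cuml.ensemble import RandomForestClassifier
except ImportError:
    from sklearn.ensemble import RandomForestClassifier

X = df[["signup_month", "sessions", "spend"]]
y = df["churned"]
model = RandomForestClassifier(n_estimators=100)
model.fit(X, y)
```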
- (7 mins) Make it repeatable - start with simple, tried-and-true tools; explore where tools like Papermill help with flexibility and reproducibility
- common pain points: operating cadence, specialized scenarios, error-prone manual execution
- shell scripts versus Papermill
- reproducible environments
- generate HTML reports
- pass parameters into your notebook (see the Papermill sketch below)
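A minimal sketch of a scheduled driver, assuming the notebook has a cell tagged `parameters`; the notebook name and parameter keys are placeholders:

```python
# Minimal sketch of a scheduled driver script: Papermill for parameterised
# execution, nbconvert for a shareable HTML report. Names are placeholders.
import datetime
import subprocess

import papermill as pm

run_date = datetime.date.today().isoformat()
executed = f"reports/pipeline_{run_date}.ipynb"

# Execute the notebook, injecting values into its cell tagged "parameters".
pm.execute_notebook(
    "pipeline.ipynb",
    executed,
    parameters={"run_date": run_date, "sample_frac": 1.0},
)

# Render the executed notebook to HTML for stakeholders.
subprocess.run(["jupyter", "nbconvert", "--to", "html", executed], check=True)
```

The same two steps could live in a cron-driven shell script; Papermill mainly adds parameterisation and a saved, executed copy of each run.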
- (8 mins) Make it reliable - Modular code & testing
- common pain points: data schema changes, debugging issues, testing & modularity
- nbconvert + Python: turn your notebook into a script
- turn a function into a tested, importable module (see the sketch after this section)
- dashboard with HoloViz / Panel (sketched below); discuss choosing between tools like Voila and PyScript
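A minimal sketch of the modularisation step: after something like `jupyter nbconvert --to script pipeline.ipynb`, the feature logic moves into an importable module with a small pytest check. File, function, and column names are placeholders:

```python
# features.py -- notebook logic extracted into an importable module.
# Function and column names are placeholders.
import pandas as pd


def add_signup_month(df: pd.DataFrame) -> pd.DataFrame:
    """Derive the signup-month feature used by the model."""
    out = df.copy()
    out["signup_month"] = pd.to_datetime(out["signup_date"]).dt.month
    return out


# test_features.py -- a small behaviour check that runs under pytest.
def test_add_signup_month():
    df = pd.DataFrame({"signup_date": ["2025-01-15", "2025-06-30"]})
    result = add_signup_month(df)
    assert list(result["signup_month"]) == [1, 6]
```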
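And a minimal Panel sketch for the dashboard bullet; the data source and column names are placeholders:

```python
# Minimal sketch of a Panel dashboard over the model's output.
# Data source and column names are placeholders.
import pandas as pd
import panel as pn
import hvplot.pandas  # noqa: F401 -- registers the .hvplot accessor

pn.extension()

scores = pd.read_parquet("reports/latest_scores.parquet")

segment = pn.widgets.Select(name="Segment", options=sorted(scores["segment"].unique()))


@pn.depends(segment)
def score_hist(segment):
    subset = scores[scores["segment"] == segment]
    return subset.hvplot.hist("churn_probability", bins=20)


dashboard = pn.Column("# Daily churn scores", segment, score_hist)
dashboard.servable()  # run with: panel serve dashboard.py
```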
- (5 mins) Snowflake integration
- common pain points: data volume, coordinating with other data systems, audits
- picking the right tools: the cost/complexity trade-off
- RAPIDS preprocessing to Snowflake storage (see the sketch after this section)
- self-service access for stakeholders
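A minimal sketch of the hand-off, assuming the `write_pandas` helper from snowflake-connector-python; connection details and table/column names are placeholders:

```python
# Minimal sketch: GPU preprocessing with cuDF, bulk load into Snowflake.
# Connection details and table/column names are placeholders.
import cudf
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

# Heavy preprocessing stays on the GPU with cuDF...
gdf = cudf.read_parquet("events.parquet")
features = (
    gdf.groupby("customer_id")
       .agg({"spend": "sum", "sessions": "count"})
       .reset_index()
)

# ...then convert to pandas and hand off to Snowflake for shared storage.
conn = snowflake.connector.connect(
    account="my_account",   # placeholder credentials -- use your secrets manager
    user="etl_user",
    password="***",
    warehouse="ANALYTICS_WH",
    database="ML",
    schema="FEATURES",
)
write_pandas(conn, features.to_pandas(), table_name="CUSTOMER_FEATURES",
             auto_create_table=True)
conn.close()
```

Once the table lands in Snowflake, stakeholders can query it with their existing tools, which is the self-service access point above.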
- (3 mins) Conclusion
- Start simple
- Add complexity when you feel specific pain