09-26, 14:10–14:45 (Europe/Amsterdam), Apollo
Ever been burned by a mysterious slowdown in your data pipeline? In this session, we'll reveal how a stealthy performance regression in the Polars DataFrame library was hunted down and squashed. Using git bisect, Bash scripting, and uv, we automated commit compilation and benchmarking across two repos to pinpoint the commit that degraded multi-file Parquet loading. The hunt challenged our assumptions and prompted a rethink of how performance is monitored in Polars, the Python data science library.
Performance regressions can sneak in and sabotage your workflow, and by the time you notice, it's often too late. In this talk, we'll take you on a step-by-step journey through the rigorous debugging process that uncovered a hidden performance bug in Polars. By combining git bisect with Bash scripts and uv for automated benchmarking, we systematically isolated the offending commit that was dragging down multi-file Parquet performance. This unfiltered, pragmatic approach didn't just fix the issue: it sparked a shift in how the Polars team monitors performance, paving the way for continuous evaluation with prebuilt binaries.
Key takeaways include:
- Systematic Debugging: Learn how to use git bisect to narrow down performance issues with surgical precision.
- Automation in Action: See how Bash scripting and uv can automate the compilation and benchmarking process, saving you time and head-scratching.
- Data-Driven Decisions: Discover practical methods to analyze benchmark results and inform your performance optimization strategies.
- Continuous Monitoring: Understand why integrating ongoing performance checks into your development workflow is not just a nice-to-have, but essential.
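The automation described above can be sketched as a small driver script for `git bisect run`. Everything here is an illustrative assumption rather than the actual harness from the talk: the helper names (`time_ms`, `verdict`), the 500 ms threshold, and the placeholder benchmark command are all made up for the sketch.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a benchmark driver for `git bisect run`.
# Threshold, helper names, and commands are illustrative assumptions.
set -u

# Time a command in whole milliseconds (uses GNU date's %N).
time_ms() {
    local start end
    start=$(date +%s%N)
    "$@" >/dev/null 2>&1
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# Translate a measurement into the exit codes `git bisect run` expects:
# 0 marks the checked-out commit "good", 1 marks it "bad".
verdict() {  # usage: verdict ELAPSED_MS THRESHOLD_MS
    if [ "$1" -le "$2" ]; then
        echo "good"
        return 0
    else
        echo "bad"
        return 1
    fi
}

# A real driver would first rebuild the checked-out commit (e.g. in a
# fresh environment created with `uv venv`) and `exit 125` so that
# git bisect skips commits that fail to build. Here a `sleep` stands
# in for the multi-file Parquet loading benchmark.
elapsed=$(time_ms sleep 0.05)
echo "benchmark took ${elapsed} ms"
verdict "$elapsed" 500
```

With a script like this saved as an executable `bench.sh`, the bisect itself reduces to `git bisect start <bad-commit> <good-commit>` followed by `git bisect run ./bench.sh`; git then checks out and tests commits automatically until it names the first bad one.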
If you’re a Python developer or data scientist tired of vague performance monitoring and ready for a no-nonsense, forward-thinking approach, this session is for you. Expect honest insights, a healthy dose of skepticism, and actionable techniques to ensure your projects run as efficiently as they should.