2025-12-09, Analytics, Visualization & Decision Science
Ensuring and communicating data quality (DQ) is one of the most persistent challenges in data-driven organizations. Data scientists, engineers, and analysts often struggle not just with detecting DQ issues, but with presenting those issues in actionable ways for diverse stakeholders across an organization (e.g., pipeline owners, fellow developers, and less-technical colleagues). On top of this, DQ work has an image problem: it can be seen as tedious, opaque, or even adversarial.
This talk introduces Pointblank, a Python package designed to make data quality validation and communication both robust and approachable. The library provides a comprehensive set of tools for profiling, validating, and reporting on data quality, with a strong focus on beautiful and actionable outputs. It can help you generate tabular validation reports, data summaries, and granular error reporting that make it easy for anyone (technical or not) to understand what’s wrong and why.
Attendees will learn how Pointblank can help their teams not only catch data issues early, but also communicate them effectively, fostering a culture of shared responsibility for data quality. The talk will include live demos of common DQ workflows, showing how Pointblank turns a traditionally painful process into something transparent, productive, and even a little bit fun.
The overall goal of this talk is to get people excited about DQ and show how the Pointblank library makes DQ validation and communication easier, clearer, and more collaborative. I’ll demonstrate some practical workflows that will hopefully inspire attendees to treat DQ as a shared (yet approachable) responsibility.
Here’s an outline for this talk:
The Data Quality Communication Problem
- why DQ is hard: technical, social, and organizational barriers
- the “last mile” problem: not just finding issues, but making them clear and actionable
- the validation plan, execution, and report lifecycle
Introducing Pointblank
- overview of the package and its philosophy: affordances for humans, not just machines
- key features: validation, profiling, reporting, and workflow support
Making Data Quality Actionable
- live demo: Python API for data profiling, validation, and missing value reports
- nice-looking outputs: tabular report, step-by-step summaries, and crystal-clear DQ messaging
- how these outputs can help people get to the root of DQ problems faster
Flexible Workflows
- using LLMs to draft a validation plan
- creating a validation plan from YAML
- integrating with CI/CD and data pipelines
Designing this Library for Collaboration and Fun
- small design choices can make a big difference: easy-to-understand summaries, actionable extracts, and a user-friendly CLI
- my personal goal: make DQ work less annoying and more rewarding
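To make the plan, execution, and report lifecycle from the outline a bit more concrete, here is a minimal sketch in plain Python. It is not Pointblank's API: every name in it (`Step`, `StepResult`, `interrogate`, `render_report`) and the sample rows are hypothetical, chosen only to illustrate the idea of declaring checks up front, running them against data, and turning the results into a summary that a non-technical reader can follow.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]


@dataclass
class Step:
    """One planned check: a human-readable description plus a per-row test."""
    description: str
    check: Callable[[Row], bool]


@dataclass
class StepResult:
    """Outcome of one step, including an extract of the failing rows."""
    description: str
    n_rows: int
    n_failed: int
    failing_rows: list[Row]

    @property
    def passed(self) -> bool:
        return self.n_failed == 0


def interrogate(rows: list[Row], plan: list[Step]) -> list[StepResult]:
    """Execute every step of the plan against every row."""
    results = []
    for step in plan:
        failing = [row for row in rows if not step.check(row)]
        results.append(
            StepResult(step.description, len(rows), len(failing), failing)
        )
    return results


def render_report(results: list[StepResult]) -> str:
    """Render a step-by-step plain-text summary of the results."""
    lines = []
    for i, r in enumerate(results, start=1):
        status = "PASS" if r.passed else "FAIL"
        lines.append(
            f"{i}. [{status}] {r.description} "
            f"({r.n_failed}/{r.n_rows} rows failed)"
        )
    return "\n".join(lines)


# Hypothetical sample data: one missing amount, one negative amount.
rows = [
    {"id": 1, "amount": 120.0},
    {"id": 2, "amount": None},
    {"id": 3, "amount": -5.0},
]

# 1. Plan: declare the checks before touching the data.
plan = [
    Step("amount is not missing", lambda row: row["amount"] is not None),
    Step(
        "amount is non-negative",
        # Missing values are left to the previous step.
        lambda row: row["amount"] is None or row["amount"] >= 0,
    ),
]

# 2. Execute, then 3. report.
results = interrogate(rows, plan)
report = render_report(results)
print(report)

# A CI job could use a nonzero exit code to fail the build.
exit_code = 0 if all(r.passed for r in results) else 1
```

Keeping the failing rows alongside each result is what makes the report actionable: a pipeline owner can see which records broke which rule, not only that something failed.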
I imagine the intended audience as data engineers, data scientists, analysts, and anyone else responsible for data quality. This talk might also interest team leads and managers looking to improve the DQ culture in their organization. As for skill level, this talk is suitable for Python users at any level.
Rich is a software engineer who enjoys working with Python. He likes to create Python libraries that help people accomplish things. While Rich very clearly digs programming, he enjoys other things as well! Examples include playing and listening to music, reading books, watching films, meeting up with friends, and wandering through the many valleys and ravines of the Greater Toronto Area.