2025-09-26 –, Apollo
Good-quality data is the basis for high-quality models and valuable data insights. But isn't it annoying how often your data is riddled with those pesky humans? Human involvement in data creation often introduces errors, misunderstandings, and biases that can compromise data integrity. This talk will explore how human factors influence the data creation process and what we as data professionals can do to account for this in our data interpretation and usage.
A website visitor who accidentally clicks on an ad while scrolling through a website, a call center agent whose summaries grow shorter and more vague as the workday drags on, a data annotator desperately in need of a coffee, or an officer faced with crime categories that just don’t seem to fit, so they just click "other".
They all create noise in our data that we just need to get rid of.
But is it really that simple?
This talk argues that data creation is inherently human work and that, as data professionals, we need to be aware of the human factors involved in order to interpret our data properly.
We will first explore this through a case study of the datafication of street-level policing within the Netherlands Police, where we'll see how human factors influence structured and unstructured reporting.
Next, we will examine how we, as data professionals, can effectively account for these human factors, both technically in data processing and modelling and in the overall approach to data collection.
This talk is particularly well suited to data professionals who work with 'ready-to-go' datasets that they have not been involved in creating, as well as those who have a role in designing the processes for data collection within their company.
Marysia is a Member of Technical Staff at Cohere, where she focusses on data quality for post-train data. She is also a former organising member of PyData Amsterdam, and frequently attends and speaks at conferences.
Isabelle Donatz-Fest is an advisor in responsible data practices & AI at The Green Land. In 2025, she obtained her PhD at Utrecht University. In her research, she studied the responsible design & use of algorithmic systems by the Netherlands Police, spending more than 350 hours with the police to observe systems in their natural habitat. Isabelle gets excited about trans- and interdisciplinary research, dinosaurs and discodip, amongst other things.