Thibault Dody
Thibault Dody is a Senior Data Scientist at Faraday, specializing in scalable machine learning architectures for consumer behavior prediction. His previous work focused on developing methods to detect harmful online and social media content. He earned his Master’s degree in Computational Science from MIT.
Session
Messy, inconsistent data is the curse of any analytics or modeling workflow. Using address data as its example, this talk demonstrates how natural-language-based approaches can be applied to clean and normalize addresses at scale. The presentation will showcase the results of several methods, ranging from naive regular-expression rules to third-party APIs, open-source address parsing, scalable LLM embeddings with vector search, and custom text embeddings.
Attendees will leave knowing when to choose each method and how to balance cost, speed, and precision.