PyData Seattle 2025

The Problem of Address Matching: a Journey through NLP and AI
2025-11-09, Tutorial Track 3

The problem of address matching arises when the address of one physical place is written in two or more different ways. This situation is very common in companies that receive customer records from different sources. The differences can be classified as syntactic or semantic. In a syntactic difference, the meaning is the same but the spelling differs, for example "Street" vs "St". In a semantic difference, the meaning is not exactly the same, for example "Road" instead of "Street". There are two main approaches to matching addresses. The first and simplest uses similarity metrics. The second uses natural language processing and transformers. This is a hands-on talk intended for data process analysts. We will go through these solutions, implemented in Python in a Jupyter notebook.
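
To make the distinction between syntactic and semantic differences concrete, here is a minimal sketch rather than the notebook used in the talk. The `SUFFIXES` table and the `normalize` function are hypothetical names introduced for illustration, and the table holds only three suffix abbreviations instead of a complete list.

```python
import re

# Hypothetical, tiny lookup table for illustration only; a real one would
# hold the complete list of standard suffix abbreviations.
SUFFIXES = {"STREET": "ST", "ROAD": "RD", "AVENUE": "AVE"}


def normalize(address: str) -> str:
    """Uppercase, replace punctuation with spaces, abbreviate known suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", address.upper()).split()
    return " ".join(SUFFIXES.get(token, token) for token in tokens)


# Syntactic difference: both spellings normalize to the same string.
print(normalize("12 Main Street") == normalize("12 Main St."))  # True

# Semantic difference: the strings still differ after normalization.
print(normalize("12 Main Road") == normalize("12 Main Street"))  # False
```

Normalization alone resolves the first pair but not the second, which is where similarity metrics and embeddings come in.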


  • The objective: to present different approaches, including AI-based ones, to the problem of address matching. This will be done in a Jupyter notebook using Python.

  • The outline of the talk:

  1. Introduction
    Address matching is a problem that arises when two different sources have different addresses for one physical place. This is a common problem in companies that deal with many customers. We classify the differences into two types: syntactic and semantic.

  2. Issues
    Several issues arise, such as writing "Street" as "St", "St.", or other variations. Another issue appears when one has "I89" (for Interstate 89) vs "US 89". Other issues are also presented.

  3. Address Normalization
    This technique helps us homogenize different addresses. We download the list of standard US abbreviations from the US post office webpage and use it to standardize the addresses. In the previous example, we would convert the text to uppercase and transform "STREET" into "ST" so that it matches "St", which becomes "ST" as well.

  4. Metrics of Similarity
    4.1 Definition of a metric
    A metric quantifies the distance from one object to another. In our case, it tells us how far one word is from another. The following metrics are presented:
    4.2 Levenshtein Distance
    4.3 Jaro-Winkler Similarity
    4.4 Cosine Similarity

  5. Natural Language and Transformers
    Concepts of NLP and Transformers are presented to provide a modern approach to the problem.
    5.1 Natural Language
    5.2 Transformers
    5.3 Word Embedding
    5.4 Contextual Embedding

  6. Conclusion
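
As an illustration of item 4 of the outline, the following sketch implements two of the listed metrics with the standard library only. It is not the code presented in the talk. The function names are hypothetical, and computing cosine similarity over character-bigram counts is an assumption made here for simplicity; other vectorizations are possible. Jaro-Winkler is left out to keep the example short.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def bigrams(text: str) -> Counter:
    """Counts of overlapping two-character substrings."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the bigram-count vectors of a and b."""
    va, vb = bigrams(a), bigrams(b)
    dot = sum(va[g] * vb[g] for g in va)
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


print(levenshtein("STREET", "ST"))  # 4
print(cosine_similarity("12 MAIN STREET", "12 MAIN ST"))
```

A small Levenshtein distance or a cosine similarity close to 1 suggests that two strings may refer to the same address. Any cut-off used to decide on a match is a choice that depends on the data.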

  • Central Thesis: The purpose is to show different approaches to the problem of address matching.
  • Key takeaways: There are tools in Natural Language Processing and Artificial Intelligence to tackle the problem of address matching.
  • Background knowledge expected: basic math and a basic notion of deep learning architecture.

Prior Knowledge Expected:

No previous knowledge expected

Ivan Perez Avellaneda is a researcher specializing in nonlinear systems, currently serving as a business analyst at Monaghan Medical Corporation. During his doctoral studies, he focused on data-driven reachability of nonlinear systems, a field with wide-ranging applications across various scientific and engineering domains, including economics. He brings extensive experience co-teaching numerous mathematics-related courses in higher education.

He holds a Ph.D. in electrical engineering (2023) from the University of Vermont in the US, and an M.Sc. in economics (2018) and a B.Sc. in mathematics (2016), both from the Pontifical Catholic University of Peru. Alongside his academic pursuits, he has work experience in the education, financial, and business sectors, where he has applied his data science skills.

In academia, his interests are broad, but his current focus is on specific branches of optimization, such as nonconvex and constrained optimization, the calculus of variations, optimal control, and their applications.