PyData Seattle 2025

The Problem of Address Matching: a Journey through NLP and AI
2025-11-09, Room 118

The problem of address matching arises when the address of one physical place is written in two or more different ways. This situation is very common in companies that receive customer records from different sources. The differences can be classified as syntactic or semantic. In the first type, the meaning is the same but the spelling differs; for example, "Street" vs "St". In the second type, the meaning is not exactly the same; for example, "Road" instead of "Street". We present two approaches to matching addresses. The first, and simpler, uses similarity metrics. The second uses natural language processing (NLP) and transformers. This is a hands-on talk intended for data process analysts. We will walk through these solutions, implemented in Python in a Jupyter notebook.


  • The objective: to present different approaches, including AI-based ones, to the problem of address matching. This will be done in a Jupyter notebook using Python.

  • The outline of the talk:

  1. Introduction
    Address matching is a problem that arises when two sources record different addresses for the same physical place. This is a common problem in companies that deal with many customers. We classify the differences into two types: syntactic and semantic.

  2. Issues
    Common issues include writing "Street", "St", "St.", and other variations. Another issue appears with "I89" (for Interstate 89) vs "US 89". Other issues are presented as well.

  3. Address Normalization
    This is a technique that helps us homogenize different addresses. We download the list of standard US abbreviations from the US Postal Service (USPS) website and use it to standardize the addresses. In the previous example, we would uppercase the addresses and transform "Street" into "ST" so that it matches "St" (which becomes "ST" after uppercasing).

  4. Metrics of Similarity
    4.1 Definition of a metric
    A metric quantifies the distance between two objects. In our case, it tells us how far one word is from another. The following metrics are presented:
    4.2 Levenshtein Distance
    4.3 Jaro-Winkler Similarity
    4.4 Cosine Similarity

  5. Natural Language Processing and Transformers
    Concepts from NLP and transformers are presented to provide a modern approach to the problem.
    5.1 Natural Language
    5.2 Transformers
    5.3 Word Embedding
    5.4 Contextual Embedding

  6. Conclusion

  • Central Thesis: The purpose is to show different approaches to the problem of address matching.
  • Key takeaways: There are tools in Natural Language Processing and Artificial Intelligence to tackle the problem of address matching.
  • Background knowledge expected: basic math and a basic notion of deep learning architectures.
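
The normalization step described in item 3 of the outline can be sketched in Python as below. This is a minimal sketch, not the notebook used in the talk: the dictionary `ABBREVIATIONS` is a small hand-picked subset of USPS street-suffix abbreviations standing in for the full downloaded list, and the function name `normalize_address` is hypothetical.

```python
import re

# Hypothetical, illustrative subset of USPS standard street-suffix
# abbreviations; the approach in the talk downloads the full list instead.
ABBREVIATIONS = {
    "STREET": "ST",
    "ROAD": "RD",
    "AVENUE": "AVE",
    "BOULEVARD": "BLVD",
    "DRIVE": "DR",
    "LANE": "LN",
}


def normalize_address(address: str) -> str:
    """Uppercase, drop periods and commas, and abbreviate known words."""
    # Uppercase so that "Street", "street", and "STREET" compare equal.
    text = address.upper()
    # Turn periods and commas into spaces so "St." becomes "ST".
    text = re.sub(r"[.,]", " ", text)
    # Map each whitespace-separated token through the abbreviation table;
    # tokens not in the table are kept unchanged.
    tokens = [ABBREVIATIONS.get(token, token) for token in text.split()]
    return " ".join(tokens)
```

With this sketch, "123 Main Street" and "123 Main St." both normalize to "123 MAIN ST". Normalization handles syntactic differences like these, but it does not make "Road" match "Street", which is a semantic difference.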
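
The Levenshtein distance and Jaro-Winkler similarity from item 4 can be sketched in plain Python as below. This is a minimal illustration, not the implementation used in the talk; libraries such as RapidFuzz or jellyfish provide optimized versions. Levenshtein counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another, so 0 means identical. Jaro-Winkler is a similarity in [0, 1], where 1 means identical. This sketch assumes the commonly used scaling factor p = 0.1 and a common prefix capped at 4 characters, and it omits the optional threshold that some implementations apply before adding the prefix bonus.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming, keeping one row at a time."""
    # previous[j] is the distance between the current prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: matching characters within a window, minus transpositions."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Two equal characters count as a match only if they are this close.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == c:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order in s2.
    transpositions = 0
    k = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                transpositions += 1
            k += 1
    t = transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Boost the Jaro similarity for strings sharing a common prefix."""
    sim = jaro(s1, s2)
    prefix = 0
    for c1, c2 in zip(s1[:max_prefix], s2[:max_prefix]):
        if c1 != c2:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)
```

For example, `levenshtein("kitten", "sitting")` returns 3, and `jaro_winkler("MARTHA", "MARHTA")` is about 0.961. Because Jaro-Winkler rewards a shared prefix, it tends to favor strings that agree at the start, such as a house number followed by a street name.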
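
Cosine similarity, also listed in item 4, requires each string to be represented as a vector first: it is the dot product of the two vectors divided by the product of their lengths. The outline does not say which representation the talk uses; the sketch below assumes vectors of character bigram counts, which is one common choice. The helper names `char_ngrams` and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Count overlapping character n-grams (bigrams by default)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: str, b: str, n: int = 2) -> float:
    """Cosine of the angle between the n-gram count vectors of a and b."""
    va, vb = char_ngrams(a, n), char_ngrams(b, n)
    # Dot product: only n-grams present in both strings contribute.
    dot = sum(va[g] * vb[g] for g in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    # Strings shorter than n have no n-grams; treat them as dissimilar.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

For instance, "123 MAIN ST" and "123 MAIN RD" share 8 of their 10 distinct bigrams, giving a similarity of 0.8. The same formula applies unchanged when the vectors are word or contextual embeddings produced by a transformer (item 5).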

Prior Knowledge Expected: No previous knowledge expected

I’m a data-driven problem solver with a Ph.D. in Electrical Engineering and a strong foundation in mathematics, economics, and nonlinear systems. Currently working as an Analytics Engineer at Monaghan Medical Corporation, I apply advanced analytics and modeling techniques to improve operational efficiency and strategic decision-making.
My doctoral research focused on data-driven reachability analysis of nonlinear systems, bridging control theory, optimization, and AI safety. I’m passionate about translating complex mathematical frameworks into scalable, intelligent solutions—whether in business, finance, or engineering domains.
With experience across academia, healthcare, and financial sectors, I’ve applied tools from machine learning, operations research, and statistical modeling to solve real-world challenges. I’ve also co-taught university-level courses in mathematics and control systems, reinforcing my commitment to clear communication and technical leadership.
Main Interests:
- Nonconvex and constrained optimization
- Optimal control and calculus of variations
- Machine learning, AI interpretability, natural language processing (NLP)
- Time-series analysis and predictive modeling
- Symbolic computation and formal methods
Education:
- Ph.D. in Electrical Engineering – University of Vermont (2023)
- M.Sc. in Economics – Pontifical Catholic University of Peru (2018)
- B.Sc. in Mathematics – Pontifical Catholic University of Peru (2016)