Deterministic Address Matching Using Fuzzy Regular Expressions
Address matching is the most critical step in the geocoding process, as it provides the link between tabular address data and the corresponding geographic information. However, address matching can be difficult, because address datasets often contain errors that existing geocoding applications do not accommodate. Such errors can undermine the validity of spatial analysis: an excessive number of false negative matches can result in a geographic selection bias, while an excessive number of false positive matches can result in a geographic placement bias. Given the frequency of input address errors, and the relatively low tolerance for error that most existing geocoding software allows, this project introduced a new address matching system (AMS) that obtained a higher number of correct match assessments than existing CASS-certified vendors and the ArcGIS Address Locator. The process described incorporates rule-based standardization and deterministic matching logic using Python's fuzzy regular expressions module.
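To make the two ingredients named above concrete, the following is a minimal sketch of rule-based standardization followed by a deterministic match rule with a fuzzy street-name comparison. It is an illustration under stated assumptions, not the AMS itself: the suffix table, the function names (`standardize`, `fuzzy_equal`, `is_match`), the one-edit tolerance, and the rule that house number and suffix must agree exactly are all hypothetical choices. The fuzzy pattern syntax `(?:...){e<=1}` belongs to the third-party `regex` package; a plain edit-distance fallback is included only so the sketch runs where that package is not installed.

```python
import re

try:
    import regex  # third-party "regex" package, which supports fuzzy patterns
except ImportError:  # fallback keeps the sketch runnable without it
    regex = None

# Hypothetical abbreviation table; a real system would need a much fuller list.
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "DRIVE": "DR"}


def standardize(address):
    """Uppercase, replace punctuation with spaces, abbreviate known suffixes."""
    tokens = re.sub(r"[^A-Z0-9 ]", " ", address.upper()).split()
    return [SUFFIXES.get(tok, tok) for tok in tokens]


def edit_distance(a, b):
    """Plain Levenshtein distance, used only when `regex` is unavailable."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def fuzzy_equal(candidate, reference, max_errors=1):
    """True if candidate equals reference within max_errors character edits."""
    if regex is not None:
        # {e<=N} permits at most N insertions, deletions, or substitutions.
        pattern = r"(?:%s){e<=%d}" % (regex.escape(reference), max_errors)
        return regex.fullmatch(pattern, candidate) is not None
    return edit_distance(candidate, reference) <= max_errors


def is_match(input_address, reference_address, max_errors=1):
    """Assumed rule: exact house number and suffix, fuzzy street name."""
    a, b = standardize(input_address), standardize(reference_address)
    if len(a) < 3 or len(b) < 3:
        return False
    if a[0] != b[0] or a[-1] != b[-1]:
        return False
    return fuzzy_equal(" ".join(a[1:-1]), " ".join(b[1:-1]), max_errors)
```

Under these assumptions, `is_match("123 Maiin Street", "123 Main St.")` accepts the pair because the street name differs by a single inserted character, while a different house number or a dissimilar street name is rejected. Keeping the house number exact and confining the tolerance to the street name is one way to limit false positives; the tolerance itself is a tunable assumption.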