
A Novel Approach to Address Validation using String Distance Metrics


International Research Journal of Engineering and Technology (IRJET)

e-ISSN: 2395-0056

Volume: 10 Issue: 05 | May 2023

p-ISSN: 2395-0072

www.irjet.net

Sahil Kale
Computer Engineering, Pune Institute of Computer Technology, Pune, Maharashtra, India

Abstract - Address validation holds fundamental value in confirming the accuracy and geographical precision of addresses used by location-dependent and delivery-based organizations. Addresses often suffer from problems such as missing components and geographical inconsistencies, which can cause grave logistical issues if not validated beforehand. Identifying missing or invalid address components during validation can save businesses time and cost while reducing the chance of errors in service. Statistical measures such as correlation coefficients and measures of central tendency show significant potential for the task of address validation. The system proposed in this paper uses a combination of different string-matching metrics to generate a normalized score based on statistical similarities. This score can then be used to filter out validated addresses according to the threshold of similarity required. Experiments have been conducted on a real-world healthcare dataset to demonstrate the effectiveness of the proposed approach in terms of accuracy and precision.

Key Words: Address Matching, Address Validation, Geocoding, Natural Language Processing, String Matching

1. INTRODUCTION

Addresses are fundamental in pinpointing a geographical location on Earth. By improving the accuracy of addresses, organizations that rely on the precision of auto-generated addresses to maximize customer satisfaction can achieve considerable savings in time and money [1]. Automatic address generation is often done using reverse geocoding, which converts geographical coordinates into textual addresses [2]. Validating these addresses, i.e., matching them with correct and verified addresses, seems to be a straightforward task, but it involves many complications in practice.

Address validation, the process of verifying the accuracy and precision of an auto-generated address by matching it with a true counterpart, suffers from several problems when faced with unstructured addresses that may have missing attributes or geographic inconsistencies [3]. The problem caused by geographical inconsistencies can be seen in Fig. 1, which shows how a generated address with missing elements can be difficult to verify by direct comparison.

The main contribution of this paper is a system that addresses the problem of address validation through a novel algorithm that requires no pre-processing of the input addresses. It uses statistical correlation measures as weights to combine different string-matching metrics and generate a normalized matching score. This matching score is then used to filter out the validated addresses and store them for further use.

Fig -1: Problems in Address Validation

2. LITERATURE REVIEW

Address validation has historically been treated as a pure NLP (Natural Language Processing) task involving sequence labeling. Hidden Markov Models (HMMs) [4] and Conditional Random Field models [5] have been used as statistical sequence-labeling approaches to address validation, but their performance suffers when they are given non-standardized addresses as input. In [6], the authors proposed a method using the BERT language model, which can help in contextual modeling of text data. However, none of these existing systems can perform address validation without some preprocessing of the input, which may compromise precision; this is the problem that the proposed system aims to solve.

In order to build an efficient architecture for address validation, the system proposed in this paper relies on commonly used string-matching metrics. String-matching metrics can be broadly classified into three major types: (i) edit distance-based metrics, (ii) token-based metrics, and (iii) hybrid metrics, a detailed explanation of which can be found in [7]. On the basis of the comparisons carried out in [7] and [8], a set of the six best-performing metrics was chosen to design the methodology followed in this paper.
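The scoring idea described in this paper, combining several string-matching metrics into one normalized score and keeping the addresses that clear a similarity threshold, can be illustrated with a minimal Python sketch. It is not the paper's algorithm. The two metrics used here (a Levenshtein-based similarity as the edit distance-based example and a Jaccard similarity over whitespace tokens as the token-based example) are stand-ins for the six metrics that this section does not name, no hybrid metric is included, and the weights are plain placeholder parameters rather than the correlation-derived weights the paper proposes. All function names are hypothetical.

```python
from typing import Dict, List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance-based metric, scaled to [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def jaccard_similarity(a: str, b: str) -> float:
    """Token-based metric: overlap of whitespace-separated tokens, in [0, 1]."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


# Assumption: two illustrative metrics, not the six chosen in the paper.
METRICS = {"edit": edit_similarity, "token": jaccard_similarity}


def matching_score(generated: str, reference: str,
                   weights: Dict[str, float]) -> float:
    """Weighted average of the metric scores; stays in [0, 1] for
    non-negative weights. The weights are placeholders (assumption)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(w * METRICS[name](generated, reference)
                   for name, w in weights.items())
    return weighted / total


def filter_validated(pairs: List[Tuple[str, str]],
                     weights: Dict[str, float],
                     threshold: float) -> List[Tuple[str, str]]:
    """Keep (generated, reference) pairs whose score reaches the threshold."""
    return [(g, r) for g, r in pairs
            if matching_score(g, r, weights) >= threshold]
```

A caller would pass pairs of generated and verified addresses together with a weight for each metric name, for example `filter_validated(pairs, {"edit": 0.5, "token": 0.5}, threshold=0.8)`. Dividing by the sum of the weights is one simple way to keep the combined score normalized; the threshold value is likewise illustrative and would be tuned to the similarity required.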

© 2023, IRJET | Impact Factor value: 8.226 | ISO 9001:2008 Certified Journal | Page 258

