Searching by approximate personal-name matching
We discuss the design, implementation, and evaluation of a method for
retrieving a person's information using the person's name as the search
key, even when the name contains deformations such as misspellings. We
present a similarity function, DEA, based on the probabilities of the
edit operations according to the letters involved and their positions,
combined with a variable threshold. The efficacy of DEA is evaluated
quantitatively, without human relevance judgments, and is found to be
far superior to that of known methods. A very efficient approximate
search technique for the DEA function, based on a compacted trie
structure, is also presented.
Postprint (published version)
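The abstract does not describe the search technique in detail. As a rough illustration of the general idea of approximate search over a trie, the following minimal sketch walks a plain (not compacted) trie and prunes any branch whose best possible distance already exceeds the threshold. It assumes unit-cost Levenshtein distance as a stand-in for DEA; the names `TrieNode`, `build_trie`, and `search` are hypothetical and do not come from the paper.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.word = None  # set when a stored name ends at this node


def build_trie(names):
    """Build a plain (uncompacted) trie from an iterable of names."""
    root = TrieNode()
    for name in names:
        node = root
        for ch in name:
            node = node.children.setdefault(ch, TrieNode())
        node.word = name
    return root


def search(root, query, max_dist):
    """Return sorted (distance, name) pairs with distance <= max_dist.

    Assumption: unit-cost Levenshtein distance stands in for DEA.
    """
    results = []
    first_row = list(range(len(query) + 1))

    def walk(node, ch, prev_row):
        # Compute one row of the edit-distance table for this trie edge.
        row = [prev_row[0] + 1]
        for j in range(1, len(query) + 1):
            cost = 0 if query[j - 1] == ch else 1
            row.append(min(row[j - 1] + 1,          # insertion
                           prev_row[j] + 1,         # deletion
                           prev_row[j - 1] + cost))  # match or substitution
        if node.word is not None and row[-1] <= max_dist:
            results.append((row[-1], node.word))
        # Prune: if every cell exceeds the threshold, no descendant can match.
        if min(row) <= max_dist:
            for next_ch, child in node.children.items():
                walk(child, next_ch, row)

    for ch, child in root.children.items():
        walk(child, ch, first_row)
    return sorted(results)
```

Because names that share a prefix share table rows, the cost of the prefix is paid once per trie path rather than once per stored name, which is where the efficiency of trie-based search comes from.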
Character comparison functions for APNM: the DEA distance (original title: Funciones de comparación de carácteres para APNM: la distancia DEA)
A typical application of ASM (Approximate String Matching) is the
matching of personal names, for example to search for people in the
database of an information system. Over the years, several similarity
functions have been proposed: phonetic codes, simple edit distance,
n-gram distances, etc. This report presents a function, DEA, that has
substantially better efficacy than existing ones and is oriented mainly
to Spanish surnames. The DEA distance is an edit distance whose costs
are based on the probabilities of the operations, characters, and
positions. The distance threshold is defined as a function of the length
of the string. The efficacy of DEA is evaluated objectively, without
human relevance judgments.
Postprint (published version)
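The abstract gives no actual cost tables or threshold formula. To make the shape of such a function concrete, here is a minimal sketch of an edit distance whose costs depend on the operation, the characters, and the position, paired with a length-dependent threshold. Every numeric value, the confusable-letter pairs, and the names `sub_cost`, `indel_cost`, `weighted_distance`, `threshold`, and `matches` are assumptions chosen for illustration, not the DEA parameters.

```python
# Hypothetical letter pairs that are often confused in Spanish surnames.
CHEAP_PAIRS = {frozenset("bv"), frozenset("sz"), frozenset("yi")}


def sub_cost(a, b, pos):
    """Substitution cost; all values are illustrative assumptions."""
    if a == b:
        return 0.0
    base = 0.4 if frozenset((a, b)) in CHEAP_PAIRS else 1.0
    # Assumption: an error in the first letter is less likely, so it costs more.
    return base * (1.5 if pos == 0 else 1.0)


def indel_cost(ch, pos):
    """Insertion/deletion cost; assumes a silent 'h' is cheap to drop or add."""
    return 0.3 if ch == "h" else 1.0


def weighted_distance(s, t):
    """Dynamic-programming edit distance using the cost functions above."""
    m, n = len(s), len(t)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + indel_cost(s[i - 1], i - 1)
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + indel_cost(t[j - 1], j - 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + indel_cost(s[i - 1], i - 1),      # delete from s
                d[i][j - 1] + indel_cost(t[j - 1], j - 1),      # insert from t
                d[i - 1][j - 1] + sub_cost(s[i - 1], t[j - 1], i - 1),
            )
    return d[m][n]


def threshold(length):
    """Hypothetical threshold that grows linearly with the query length."""
    return 0.25 * length


def matches(query, candidate):
    return weighted_distance(query, candidate) <= threshold(len(query))
```

Under these assumed costs, `matches("vazquez", "vasquez")` holds because the s/z substitution is cheap, while `matches("vazquez", "lopez")` does not. A longer name tolerates a larger total cost than a short one, which is the point of making the threshold depend on length.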