    Searching by approximate personal-name matching

    We discuss the design, construction and evaluation of a method for retrieving a person's information using his or her name as the search key, even when the name is misspelled or otherwise distorted. We present a similarity function, the DEA function, based on the probabilities of the edit operations according to the letters involved and their position, and using a variable threshold. The efficacy of DEA is evaluated quantitatively, without human relevance judgments, and is found to be far superior to that of known methods. We also present a very efficient approximate search technique for the DEA function, based on a compacted trie structure.
    Postprint (published version)
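The approximate search over a trie described in this abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses a plain (non-compacted) trie and unit-cost Levenshtein distance in place of the DEA costs, and the names `TrieNode`, `build_trie` and `trie_search` are hypothetical. Each trie edge adds one row to the dynamic-programming table, so names sharing a prefix share that work, and a branch is abandoned as soon as every entry in its row exceeds the threshold.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """A node of a plain (non-compacted) character trie."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.word: Optional[str] = None  # set if a stored name ends here


def build_trie(names: List[str]) -> TrieNode:
    root = TrieNode()
    for name in names:
        node = root
        for ch in name:
            node = node.children.setdefault(ch, TrieNode())
        node.word = name
    return root


def trie_search(root: TrieNode, query: str, max_dist: int) -> List[Tuple[str, int]]:
    """Return (name, distance) for each stored name within max_dist of query,
    using unit-cost Levenshtein distance as a stand-in for DEA costs."""
    results: List[Tuple[str, int]] = []

    def walk(node: TrieNode, ch: str, prev_row: List[int]) -> None:
        # Distances from the trie prefix ending in ch to every prefix of query.
        row = [prev_row[0] + 1]
        for j in range(1, len(query) + 1):
            row.append(min(
                row[j - 1] + 1,                          # skip a query letter
                prev_row[j] + 1,                         # skip a trie letter
                prev_row[j - 1] + (query[j - 1] != ch),  # match or substitute
            ))
        if node.word is not None and row[-1] <= max_dist:
            results.append((node.word, row[-1]))
        # Costs are non-negative, so no name below this node can score better
        # than min(row); prune the whole subtree once that exceeds max_dist.
        if min(row) <= max_dist:
            for next_ch, child in node.children.items():
                walk(child, next_ch, row)

    first_row = list(range(len(query) + 1))
    for ch, child in root.children.items():
        walk(child, ch, first_row)
    return results
```

For example, searching the names `garcia`, `garcía`, `gracia`, `marcia` and `lopez` for `garcia` with a threshold of 1 returns `garcia` (distance 0), `garcía` (1) and `marcia` (1); `gracia` needs two edits under this unit-cost model and is excluded.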

    Character comparison functions for APNM: the DEA distance

    A typical application of ASM (Approximate String Matching) is the matching of personal names, for example when searching for people in the database of an information system. Over the years, several similarity functions have been proposed: phonetic codes, simple edit distance, n-gram distances, etc. This report presents DEA, a function with substantially better efficacy than existing ones, aimed mainly at Spanish surnames. DEA is an edit distance whose costs are based on the probabilities of the operations, the characters and the positions involved. The distance threshold is defined as a function of the length of the string. The efficacy of DEA is evaluated objectively, without human relevance judgements.
    Postprint (published version)
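The idea of an edit distance whose costs come from operation probabilities, combined with a length-dependent threshold, can be illustrated with a small weighted Levenshtein distance. This is a hypothetical sketch, not the published DEA function: the confusion probabilities in `SUB_PROB`, the indel probability, the extra weight on the first letter and the formula in `threshold` are all invented for illustration. Each probability p becomes a cost of -log p, so likely errors (such as b/v, which sound alike in Spanish) are cheap and unlikely ones are expensive.

```python
import math
from typing import Dict, Tuple

# Hypothetical letter-confusion probabilities. DEA derives its costs from
# measured probabilities; these values are invented for illustration only.
SUB_PROB: Dict[Tuple[str, str], float] = {
    ("b", "v"): 0.30, ("v", "b"): 0.30,
    ("s", "z"): 0.20, ("z", "s"): 0.20,
    ("i", "y"): 0.15, ("y", "i"): 0.15,
}
DEFAULT_SUB_PROB = 0.01  # any other substitution (hypothetical)
INDEL_PROB = 0.02        # insertion or deletion of one letter (hypothetical)


def cost(prob: float, pos: int) -> float:
    """Convert a probability into a cost (-log p). As a hypothetical positional
    rule, an error on the first letter costs 1.5 times as much."""
    weight = 1.5 if pos == 0 else 1.0
    return -math.log(prob) * weight


def weighted_distance(a: str, b: str) -> float:
    """Edit distance between a and b using the probability-based costs above."""
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):  # delete the first i letters of a
        d[i][0] = d[i - 1][0] + cost(INDEL_PROB, i - 1)
    for j in range(1, m + 1):  # insert the first j letters of b
        d[0][j] = d[0][j - 1] + cost(INDEL_PROB, j - 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            x, y = a[i - 1], b[j - 1]
            sub = 0.0 if x == y else cost(SUB_PROB.get((x, y), DEFAULT_SUB_PROB), i - 1)
            d[i][j] = min(
                d[i - 1][j - 1] + sub,                  # match or substitute
                d[i - 1][j] + cost(INDEL_PROB, i - 1),  # delete a[i-1]
                d[i][j - 1] + cost(INDEL_PROB, j - 1),  # insert b[j-1]
            )
    return d[n][m]


def threshold(query: str) -> float:
    """Hypothetical length-dependent threshold: longer names tolerate more cost."""
    return 1.5 + 0.6 * len(query)


def matches(query: str, candidate: str) -> bool:
    return weighted_distance(query, candidate) <= threshold(query)
```

With these made-up values, `weighted_distance("vazquez", "basquez")` is about 3.42 (a v/b substitution on the first letter plus a z/s substitution), which is below `threshold("vazquez")` of 5.7, so the two surnames match; `lopez` does not match `vazquez`, since it needs at least two costly deletions.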