    Cost Optimal Record/Entity Matching

    A generalized cost optimal decision model for record matching

    Record (or entity) matching, or linkage, is the process of identifying records in one or more data sources that refer to the same real-world entity or object. In record linkage, the ultimate goal of a decision model is to give the decision maker a tool for deciding the actual matching status of a pair of records (e.g., documents, events, persons, cases). Existing models of record linkage rely on decision rules that minimize the probability of subjecting a case to clerical review, conditional on the probabilities of erroneous matches and erroneous non-matches. In practice, though, (a) the value of an erroneous match is, in many applications, quite different from the value of an erroneous non-match, and (b) such rules ignore the cost and the probability of a misclassification associated with clerical review. In this paper, we present a decision model that is optimal with respect to the cost of the record linkage operation and general enough to accommodate multi-class or multi-decision case studies. We also present an example, along with the results of applying the proposed model to large comparison spaces. © 2004 ACM
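
The cost-based idea in this abstract can be illustrated with a generic expected-cost rule. This is only a minimal sketch, not the model from the paper: the cost table `COSTS`, its numbers, and the function names are hypothetical, and the match probability `p_match` is assumed to come from some upstream comparison step.

```python
# Hypothetical cost table (illustrative numbers, not from the paper):
# COSTS[decision] = (cost if the pair is a true match, cost if it is a true non-match)
COSTS = {
    "link": (0.0, 10.0),      # linking a true non-match is an erroneous match
    "non_link": (4.0, 0.0),   # rejecting a true match is an erroneous non-match
    "review": (1.0, 1.0),     # clerical review has a flat cost in this sketch
}


def expected_costs(p_match, costs=COSTS):
    """Expected cost of each decision, given the probability that the pair matches."""
    return {
        decision: p_match * cost_match + (1.0 - p_match) * cost_non_match
        for decision, (cost_match, cost_non_match) in costs.items()
    }


def cheapest_decision(p_match, costs=COSTS):
    """Pick the decision with the lowest expected cost."""
    ec = expected_costs(p_match, costs)
    return min(ec, key=ec.get)


if __name__ == "__main__":
    for p in (0.05, 0.8, 0.95):
        print(p, cheapest_decision(p))
```

Because the two error costs differ, the review region is not symmetric around 0.5: with these assumed numbers a pair at probability 0.8 is still sent to review, since a wrong link is costlier than a wrong non-link.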

    Record Matching: Past, Present and Future

    A max-min approach for hiding frequent itemsets

    In this paper we propose a new algorithmic approach for sanitizing raw data of sensitive knowledge in the context of association rule mining. The new approach (a) relies on the max-min criterion, a method in decision theory for maximizing the minimum gain, and (b) builds upon the border theory of frequent itemsets. © 2006 IEEE
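
The max-min criterion mentioned here can be shown on a small payoff table. This is a sketch of the decision-theory criterion only, not of the hiding algorithm in the paper; the action names and gain values are made up for illustration.

```python
def maximin(gains):
    """Return the action whose worst-case (minimum) gain is largest.

    `gains` maps each action to its gains under the possible outcomes.
    """
    return max(gains, key=lambda action: min(gains[action]))


if __name__ == "__main__":
    # Hypothetical payoff table: three actions, three possible outcomes each.
    table = {
        "a": [3, 7, 1],   # worst case 1
        "b": [4, 4, 2],   # worst case 2
        "c": [9, 0, 5],   # worst case 0
    }
    print(maximin(table))  # prints "b"
```

Action "c" has the highest single gain, but the criterion ignores best cases and selects "b", whose minimum gain is the largest.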

    Reference table based k-anonymous private blocking

    Privacy Preserving Record Linkage is an emerging field of research that approaches the classical linkage problem from a privacy-preserving point of view. In this paper we propose a novel approach for performing Privacy Preserving Blocking in order to minimize the computational cost of Privacy Preserving Record Linkage. We achieve this without compromising privacy by combining Nearest Neighbors clustering, a well-known clustering algorithm, with a reference table. A reference table is a publicly known table whose contents are used as intermediate references. The combination of Nearest Neighbors and a reference table gives our approach k-anonymity characteristics. © 2012 ACM

    Privacy preserving record linkage using phonetic codes

    Phonetic codes such as Soundex and Metaphone have been used in the past to address the Record Linkage Problem. However, to the best of our knowledge, no particular effort has been made within this context towards privacy assurance during the matching process. Phonetic codes have an interesting feature that can be a cornerstone of providing privacy: they are mappings of strings that do not exhibit the one-to-one property. In this paper, we present a novel protocol for achieving privacy preserving record linkage using phonetics, provide a proof of correctness for our approach, and illustrate experimental results concerning performance and matching accuracy. The proposed protocol can be applied equally well, with comparable results, to codes other than phonetic ones that do not exhibit the one-to-one property, such as hash tables. © 2009 IEEE
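
The many-to-one property mentioned in this abstract can be seen with a small implementation of American Soundex. This is a sketch of the standard code only, not of the paper's protocol; it assumes plain ASCII names, and the helper names are chosen for illustration.

```python
# Consonant groups of American Soundex; vowels, y, h and w carry no digit.
CODES = {}
for letters, digit in (
    ("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
    ("l", "4"), ("mn", "5"), ("r", "6"),
):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the 4-character Soundex code of `name` (empty string if no letters)."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")  # the first letter's code also suppresses repeats
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w are skipped and do not break a run of equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return (first.upper() + "".join(digits) + "000")[:4]


if __name__ == "__main__":
    for name in ("Robert", "Rupert", "Smith", "Smyth"):
        print(name, soundex(name))
```

"Robert" and "Rupert" both map to R163, and "Smith" and "Smyth" both map to S530, so a code alone does not reveal which of the colliding names produced it.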

    Exact knowledge hiding in transactional databases

    The hiding of sensitive knowledge in the form of frequent itemsets has gained increasing attention over the past years. This paper highlights the process of border revision, which is essential for identifying hiding solutions that bear no side-effects, and provides efficient algorithms for computing the revised positive and revised negative borders. By utilizing border revision, we unify the theory behind two exact hiding algorithms that guarantee optimal solutions in terms of both database distortion and side-effects introduced by the hiding process. Following that, we propose a novel extension to one of the hiding algorithms that allows it to identify exact hiding solutions to a much wider range of problems than its original counterpart. Through experimentation, we compare the exact hiding schemes against two state-of-the-art heuristic algorithms and demonstrate their ability to consistently provide solutions of higher quality to a wide variety of hiding problems.
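
As background for the borders mentioned in this abstract, the sketch below computes the positive border (maximal frequent itemsets) and the negative border (minimal infrequent itemsets) of a tiny database by brute force. It does not implement border revision or the paper's algorithms; the toy transactions, the absolute support threshold, and the function names are assumptions for illustration, and the enumeration is exponential in the number of items.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def borders(transactions, min_support):
    """Return (positive_border, negative_border) as sets of frozensets.

    Assumes min_support is an absolute count no larger than len(transactions),
    so the empty itemset counts as frequent.
    """
    items = sorted(set().union(*transactions))
    all_sets = [
        frozenset(c)
        for k in range(1, len(items) + 1)
        for c in combinations(items, k)
    ]
    frequent = {s for s in all_sets if support(s, transactions) >= min_support}
    # Positive border: frequent itemsets with no frequent proper superset.
    positive = {s for s in frequent if not any(s < f for f in frequent)}
    # Negative border: infrequent itemsets whose immediate subsets are all frequent.
    frequent_or_empty = frequent | {frozenset()}
    negative = {
        s for s in all_sets
        if s not in frequent and all(s - {i} in frequent_or_empty for i in s)
    }
    return positive, negative


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}, {"d"}]
    pos, neg = borders(db, min_support=2)
    print(sorted(sorted(s) for s in pos))  # [['a', 'b'], ['a', 'c']]
    print(sorted(sorted(s) for s in neg))  # [['b', 'c'], ['d']]
```

Checking only the immediate subsets is enough for the negative border because support is anti-monotone: if every subset one item smaller is frequent, all smaller subsets are frequent too.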