69 research outputs found
A generalized cost optimal decision model for record matching
Record (or entity) matching or linkage is the process of identifying records in one or more data sources that refer to the same real-world entity or object. In record linkage, the ultimate goal of a decision model is to provide the decision maker with a tool for deciding the actual matching status of a pair of records (i.e., documents, events, persons, cases, etc.). Existing models of record linkage rely on decision rules that minimize the probability of subjecting a case to clerical review, conditional on the probabilities of erroneous matches and erroneous non-matches. In practice, though, (a) the value of an erroneous match is, in many applications, quite different from the value of an erroneous non-match, and (b) the cost and the probability of a misclassification associated with the clerical review are ignored. In this paper, we present a decision model that is optimal with respect to the cost of the record linkage operation and general enough to accommodate multi-class or multi-decision case studies. We also present an example along with the results from applying the proposed model to large comparison spaces. © 2004 ACM
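The cost-based idea in this abstract can be illustrated with a generic expected-cost rule. The following is a minimal sketch, not the model from the paper: the function name `min_cost_decision`, the three decision labels, and all cost values are hypothetical, and the match probability is assumed to be supplied by some upstream comparison step.

```python
def min_cost_decision(p_match, costs):
    """Pick the decision with the lowest expected cost.

    p_match: assumed probability that the record pair is a true match.
    costs:   maps each decision to (cost if true match, cost if true non-match).
             All values here are illustrative assumptions.
    """
    expected = {
        decision: p_match * cost_if_match + (1.0 - p_match) * cost_if_non_match
        for decision, (cost_if_match, cost_if_non_match) in costs.items()
    }
    best = min(expected, key=expected.get)
    return best, expected


# Hypothetical costs: a false link is costlier than a missed link,
# and clerical review has a fixed cost regardless of the true status.
COSTS = {
    "link": (0.0, 10.0),
    "non_link": (5.0, 0.0),
    "clerical_review": (1.0, 1.0),
}

for p in (0.95, 0.50, 0.05):
    decision, _ = min_cost_decision(p, COSTS)
    print(p, decision)
```

Because the two error costs differ, the probability ranges that map to each decision are asymmetric, which is the practical point the abstract raises in (a).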
A max-min approach for hiding frequent itemsets
In this paper we propose a new algorithmic approach for sanitizing raw data from sensitive knowledge in the context of association rule mining. The new approach (a) relies on the max-min criterion, a decision-theoretic method for maximizing the minimum gain, and (b) builds upon the border theory of frequent itemsets. © 2006 IEEE
Reference table based k-anonymous private blocking
Privacy Preserving Record Linkage is an emerging field of research which attempts to deal with the classical linkage problem from a privacy preserving point of view. In this paper we propose a novel approach for performing Privacy Preserving Blocking in order to minimize the computational cost of Privacy Preserving Record Linkage. We achieve this without compromising privacy by combining Nearest Neighbors clustering, a well-known clustering algorithm, with a reference table. A reference table is a publicly known table whose contents are used as intermediate references. The combination of Nearest Neighbors and a reference table gives our approach k-anonymity characteristics. © 2012 ACM
Privacy preserving record linkage using phonetic codes
Phonetic codes such as Soundex and Metaphone have been used in the past to address the Record Linkage Problem. However, to the best of our knowledge, no particular effort has been made within this context towards privacy assurance during the matching process. Phonetic codes have an interesting feature which can be a cornerstone of providing privacy: they are mappings of strings which do not exhibit the one-to-one property. In this paper, we present a novel protocol for achieving privacy preserving record linkage using phonetics, provide a proof of correctness for our approach, and report experimental results concerning performance and matching accuracy. The proposed protocol can be applied equally well, with comparable results, to codes other than phonetic ones that do not exhibit the one-to-one property, such as hash tables. © 2009 IEEE
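The many-to-one property mentioned in this abstract can be seen with a small Soundex encoder. This is a minimal sketch of the standard American Soundex rules, written for illustration only; it is not the protocol proposed in the paper, and the helper name `soundex` and the sample names are assumptions.

```python
from collections import defaultdict

# Standard Soundex digit groups; vowels, y, h and w carry no digit.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character Soundex code of a name ('' if no letters)."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code is kept
    return (result + "000")[:4]


# Several distinct names collapse onto one code, so the code alone
# does not reveal which name produced it.
groups = defaultdict(list)
for n in ["Robert", "Rupert", "Smith", "Smyth"]:
    groups[soundex(n)].append(n)
print(dict(groups))
```

Since the mapping cannot be inverted to a unique string, a party that sees only the code learns a set of candidate names rather than the name itself.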
Exact knowledge hiding in transactional databases
The hiding of sensitive knowledge in the form of frequent itemsets has gained increasing attention over the past years. This paper highlights the process of border revision, which is essential for the identification of hiding solutions bearing no side-effects, and provides efficient algorithms for the computation of the revised positive and the revised negative borders. By utilizing border revision, we unify the theory behind two exact hiding algorithms that guarantee optimal solutions both in terms of database distortion and side-effects introduced by the hiding process. Following that, we propose a novel extension to one of the hiding algorithms that allows it to identify exact hiding solutions to a much wider range of problems than its original counterpart. Through experimentation, we compare the exact hiding schemes against two state-of-the-art heuristic algorithms and demonstrate their ability to consistently provide solutions of higher quality to a wide variety of hiding problems.
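For readers unfamiliar with the borders this abstract refers to, the sketch below computes them by brute force using the usual definitions: the positive border holds the maximal frequent itemsets, and the negative border holds the infrequent itemsets whose proper subsets are all frequent. It is exponential in the number of items and is not one of the paper's efficient algorithms; the function names, the toy database, and the support threshold are assumptions.

```python
from itertools import combinations


def support(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)


def borders(transactions, min_support):
    """Return (positive_border, negative_border) as sets of frozensets."""
    items = sorted(set().union(*transactions))
    frequent, negative = set(), set()
    # Enumerate by increasing size so smaller subsets are classified first.
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            if support(candidate, transactions) >= min_support:
                frequent.add(candidate)
            elif all(frozenset(sub) in frequent
                     for sub in combinations(combo, size - 1) if sub):
                # Infrequent, yet every immediate subset is frequent.
                negative.add(candidate)
    positive = {s for s in frequent if not any(s < other for other in frequent)}
    return positive, negative


# Toy database (hypothetical): every pair is frequent, the triple is not.
DB = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
positive, negative = borders(DB, min_support=3)
print(sorted(map(sorted, positive)), sorted(map(sorted, negative)))
```

Hiding a sensitive itemset moves it out of the frequent set, which changes both borders; recomputing them for the intended result is the kind of border revision the abstract describes.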