A Comparison of Blocking Methods for Record Linkage

Author: A. Goldenberg
D. Vatsalan
H. Liang
L. Paulevé
M. Kuzu
P. Christen
R. Hall
S. Fortunato
T. Herzog
Publication date: 01/01/2014

Record linkage seeks to merge databases and to remove duplicates when unique identifiers are not available. Most approaches use blocking techniques to reduce the computational complexity associated with record linkage. We review traditional blocking techniques, which typically partition the records according to a set of field attributes, and consider two variants of a method known as locality sensitive hashing, sometimes referred to as "private blocking." We compare these approaches in terms of their recall, reduction ratio, and computational complexity. We evaluate these methods using different synthetic datafiles and conclude with a discussion of privacy-related issues.

Comment: 22 pages, 2 tables, 7 figures
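To make the terms in this abstract concrete, here is a minimal sketch of traditional blocking and of the two quality measures it names, recall and reduction ratio. It is not the paper's code. The toy records, the choice of `zip` as the blocking field, and the function names `block_pairs`, `reduction_ratio`, and `recall` are all assumptions made for illustration, and the definitions used are the common ones (fraction of all pairs avoided, and fraction of true matches retained).

```python
from collections import defaultdict
from itertools import combinations


def block_pairs(records, key):
    """Partition records by a blocking key and return within-block candidate pairs."""
    blocks = defaultdict(list)
    for rec_id, rec in records.items():
        blocks[key(rec)].append(rec_id)
    pairs = set()
    for ids in blocks.values():
        # Only records that share a block are ever compared.
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def reduction_ratio(n_records, candidate_pairs):
    """Fraction of all possible record pairs that blocking avoids comparing."""
    total_pairs = n_records * (n_records - 1) // 2
    return 1 - len(candidate_pairs) / total_pairs


def recall(candidate_pairs, true_matches):
    """Fraction of true matching pairs that survive blocking."""
    return len(candidate_pairs & true_matches) / len(true_matches)


# Hypothetical toy data: ids 1 and 2 are the same person, as are ids 3 and 4.
records = {
    1: {"last": "Smith", "zip": "15213"},
    2: {"last": "Smyth", "zip": "15213"},
    3: {"last": "Jones", "zip": "15213"},
    4: {"last": "Jones", "zip": "90210"},
    5: {"last": "Smith", "zip": "90210"},
}
true_matches = {(1, 2), (3, 4)}

candidates = block_pairs(records, key=lambda r: r["zip"])
rr = reduction_ratio(len(records), candidates)
rec = recall(candidates, true_matches)
```

On this toy data, blocking on `zip` keeps 4 of the 10 possible pairs, so the reduction ratio is 0.6, but it separates records 3 and 4, so recall is 0.5. That trade-off between fewer comparisons and missed matches is what a comparison of blocking methods measures.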

arXiv.org e-Print Archive

Crossref

Record-Linkage from a Technical Point of View

Author: Rainer Schnell

Record linkage is used for preparing sampling frames, deduplicating lists, and combining information on the same object from two different databases. If the same objects in two different databases share error-free unique identifiers, such as personal identification numbers (PID), record linkage is a simple file merge operation. If the identifiers contain errors, record linkage is a challenging task. In many applications, the files have widely different numbers of observations, for example a few thousand records from a sample survey and a few million records from an administrative database of social security numbers. Available software, privacy issues, and future research topics are discussed.

Keywords: Record-Linkage, Data-mining, Privacy preserving protocols

Research Papers in Economics