5,009 research outputs found

Assessing the disclosure protection provided by misclassification for survey microdata

Author: Shlomo Natalie
Skinner Chris
Publication venue: Southampton Statistical Sciences Research Institute
Publication date: 07/08/2009

Government statistical agencies often apply statistical disclosure limitation techniques to survey microdata to protect confidentiality. There is a need for ways to assess the protection provided. This paper develops some simple methods for assessing the protection offered by disclosure limitation techniques that perturb the values of categorical identifying variables. The methods are applied in numerical experiments based upon census data from the United Kingdom that are subject to two perturbation techniques: data swapping and the post randomisation method. Some simplifying approximations to the measure of risk are found to work well in capturing the impacts of these techniques. These approximations provide simple extensions of existing risk assessment methods based upon Poisson log-linear models. A numerical experiment is also undertaken to assess the impact of multivariate misclassification with an increasing number of identifying variables. The methods developed in this paper may also be used to obtain more realistic assessments of risk that take account of the kinds of measurement and other non-sampling errors commonly arising in surveys.
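Two ingredients named in this abstract can be sketched briefly. The post randomisation method (PRAM) replaces each value of a categorical identifier with a value drawn from the corresponding row of a known transition matrix. The baseline Poisson-model risk measures for a sample-unique key assume the population count F is Poisson with mean λ and that records are sampled independently with probability π, so that F − 1 given f = 1 is Poisson with mean λ(1 − π). The sketch below is a minimal illustration under those assumptions only: the function names, the example transition matrix and the parameter values are hypothetical, λ is taken as given rather than estimated from a log-linear model, and the paper's misclassification-adjusted risk measures are not reproduced.

```python
import math
import random


def pram(values, categories, transition, rng):
    """Post-randomise a categorical variable (hypothetical helper).

    transition[i][j] is the assumed probability that true category
    categories[i] is released as categories[j]; each row sums to 1.
    """
    index = {c: i for i, c in enumerate(categories)}
    released = []
    for v in values:
        row = transition[index[v]]
        # Draw the released category from the row for the true value.
        released.append(rng.choices(categories, weights=row, k=1)[0])
    return released


def risk_sample_unique(lam, pi):
    """Baseline Poisson-model risk for a key that is unique in the sample.

    Assumes F ~ Poisson(lam) and Bernoulli sampling with probability pi,
    so F - 1 | f = 1 ~ Poisson(mu) with mu = lam * (1 - pi).
    Returns (P(F = 1 | f = 1), E[1/F | f = 1]).
    """
    mu = lam * (1.0 - pi)
    p_population_unique = math.exp(-mu)
    # E[1/(1 + X)] for X ~ Poisson(mu) equals (1 - exp(-mu)) / mu.
    expected_inverse = 1.0 if mu == 0 else (1.0 - math.exp(-mu)) / mu
    return p_population_unique, expected_inverse


# Hypothetical usage: keep the true value with probability 0.9.
rng = random.Random(0)
cats = ["A", "B", "C"]
P = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]
print(pram(["A", "B", "C", "A"], cats, P, rng))
print(risk_sample_unique(lam=2.0, pi=0.1))
```

Because 1/F is at least the indicator of F = 1, the second measure is never smaller than the first; applying PRAM before release is what the paper's extensions then have to account for.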

Southampton (e-Prints Soton)

Record-Linkage from a Technical Point of View

Author: Rainer Schnell

Record linkage is used for preparing sampling frames, deduplicating lists, and combining information on the same object from two different databases. If the same objects in two databases share error-free unique common identifiers such as personal identification numbers (PIDs), record linkage is a simple file merge operation. If the identifiers contain errors, record linkage is a challenging task. In many applications, the files have widely different numbers of observations, for example a few thousand records from a sample survey and a few million records from an administrative database of social security numbers. Available software, privacy issues and future research topics are discussed.

Keywords: Record-Linkage, Data-mining, Privacy preserving protocols
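The contrast drawn here, between a simple merge on error-free identifiers and linkage when identifiers contain errors, can be sketched as follows. With clean PIDs, linkage is a dictionary lookup; with error-prone names, one common approach is to score candidate pairs with an approximate string comparator and accept the best pair above a threshold. The sketch assumes a bigram Dice coefficient and a threshold of 0.8; both, and all function names, are illustrative choices rather than anything taken from the paper, and the all-pairs loop would need blocking to scale to millions of records.

```python
def exact_merge(small, large):
    """Error-free unique PIDs: linkage is a plain merge on the key.

    small, large: lists of (pid, payload). Returns {pid: (small, large)}.
    """
    by_pid = dict(large)
    return {pid: (payload, by_pid[pid]) for pid, payload in small if pid in by_pid}


def bigrams(s):
    # Pad with spaces so first and last characters form bigrams too.
    s = f" {s.lower().strip()} "
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice(a, b):
    """Dice coefficient on character bigrams, in [0, 1]."""
    A, B = bigrams(a), bigrams(b)
    return 2 * len(A & B) / (len(A) + len(B))


def link(small, large, threshold=0.8):
    """Link each (id, name) in small to the best-scoring id in large.

    Returns {small_id: large_id or None}; the threshold is an assumption.
    """
    links = {}
    for sid, sname in small:
        best_id, best_score = None, threshold
        for lid, lname in large:
            score = dice(sname, lname)
            if score >= best_score:
                best_id, best_score = lid, score
        links[sid] = best_id
    return links


# Hypothetical usage: a typo in the survey name still links.
register = [(10, "John Smith"), (11, "Mary Jones")]
print(link([(1, "Jon Smith"), (2, "Xavier")], register))  # {1: 10, 2: None}
```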

Research Papers in Economics

Avoiding disclosure of individually identifiable health information: a literature review

Author: Borton Joshua
Fernandes-Huessy Johannes
Gonzalez Claudia
Hair Elizabeth
Holden Craig
Mulcahy Tim
Prada Sergio I

Achieving data and information dissemination without harming anyone is a central task of any entity in charge of collecting data. In this article, the authors examine the literature on data and statistical confidentiality. Rather than comparing the theoretical properties of specific methods, they emphasize the main themes that emerge from the ongoing discussion among scientists regarding how best to achieve the appropriate balance between data protection, data utility, and data dissemination. They cover the literature on de-identification and reidentification methods, with emphasis on health care data. The authors also discuss the benefits and limitations of the most common access methods. Although there is abundant theoretical and empirical research, their review reveals a lack of consensus on fundamental questions for empirical practice: how to assess disclosure risk, how to choose among disclosure methods, how to assess reidentification risk, and how to measure utility loss.

Keywords: public use files, disclosure avoidance, reidentification, de-identification, data utility
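As one concrete example of assessing reidentification risk before release, the sketch below checks k-anonymity: records whose combination of quasi-identifiers is shared by fewer than k records are flagged, and coarsening a quasi-identifier (here, age into ten-year bands) is shown to reduce the number of flagged records. k-anonymity is only one of many criteria covered by this literature and is not presented as the review's recommendation; the field names, data and k = 2 are hypothetical.

```python
from collections import Counter


def class_key(record, quasi_identifiers):
    # The combination of quasi-identifier values defines an equivalence class.
    return tuple(record[q] for q in quasi_identifiers)


def records_violating_k(records, quasi_identifiers, k):
    """Return indices of records whose equivalence class has fewer than k members."""
    sizes = Counter(class_key(r, quasi_identifiers) for r in records)
    return [i for i, r in enumerate(records)
            if sizes[class_key(r, quasi_identifiers)] < k]


def generalise_age(record):
    """Hypothetical generalisation step: replace age by its ten-year band."""
    return {**record, "age": (record["age"] // 10) * 10}


records = [
    {"age": 34, "sex": "F", "zip": "02139"},
    {"age": 36, "sex": "F", "zip": "02139"},
    {"age": 35, "sex": "F", "zip": "02139"},
    {"age": 52, "sex": "M", "zip": "02140"},
]
qi = ["age", "sex", "zip"]
print(records_violating_k(records, qi, k=2))  # [0, 1, 2, 3]
print(records_violating_k([generalise_age(r) for r in records], qi, k=2))  # [3]
```

Each generalisation step trades some data utility for lower risk, which is the balance the review describes.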

Research Papers in Economics

How Registries Can Help Performance Measurement Improve Care

Publication venue: Robert Wood Johnson Foundation
Publication date: 06/06/2010

Suggests ways to make better use of databases of clinical information to evaluate care processes and outcomes, and to improve the measurement of healthcare quality and costs, comparative clinical effectiveness research, and medical product safety surveillance.

IssueLab

Privacy and Confidentiality in an e-Commerce World: Data Mining, Data Warehousing, Matching and Disclosure Limitation

Author: Fienberg Stephen E.
Publication venue: Institute of Mathematical Statistics
Publication date: 01/01/2006

The growing expanse of e-commerce and the widespread availability of online databases raise many fears regarding loss of privacy and many statistical challenges. Even with encryption and other nominal forms of protection for individual databases, we still need to protect against the violation of privacy through linkages across multiple databases. These issues parallel those that have arisen and received some attention in the context of homeland security. Following the events of September 11, 2001, there has been heightened attention in the United States and elsewhere to the use of multiple government and private databases for the identification of possible perpetrators of future attacks, as well as an unprecedented expansion of federal government data mining activities, many involving databases containing personal information. We present an overview of some proposals that have surfaced for the search of multiple databases which supposedly do not compromise possible pledges of confidentiality to the individuals whose data are included. We also explore their link to the related literature on privacy-preserving data mining. In particular, we focus on the matching problem across databases and the concept of "selective revelation" and their confidentiality implications.

Comment: Published at http://dx.doi.org/10.1214/088342306000000240 in the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org
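To make the matching problem across databases concrete, the sketch below shows one widely used way to match records without exchanging raw identifiers: each data holder replaces a normalised identifier with a keyed hash (HMAC-SHA256 under a shared secret key), and a matcher compares only the resulting tokens. This is a generic illustration, not one of the specific proposals surveyed in the paper and not an implementation of "selective revelation"; the key, identifiers and function names are hypothetical. Exact token matching also fails on any spelling error, and anyone holding the key can test guessed identifiers, so it provides limited protection on its own.

```python
import hashlib
import hmac


def normalise(identifier):
    # Lower-case and collapse whitespace so trivial variants hash identically.
    return " ".join(identifier.lower().split())


def pseudonymise(identifier, key):
    """Keyed hash (HMAC-SHA256) of a normalised identifier, as hex."""
    return hmac.new(key, normalise(identifier).encode("utf-8"), hashlib.sha256).hexdigest()


def tokenise(database, key):
    """Run by each data holder: map identifier -> local id into token -> local id."""
    return {pseudonymise(ident, key): local_id for ident, local_id in database.items()}


def match_tokens(tokens_a, tokens_b):
    """Run by the matcher, who sees tokens and local ids but no identifiers."""
    return sorted((tokens_a[t], tokens_b[t]) for t in tokens_a.keys() & tokens_b.keys())


# Hypothetical usage with a shared secret key.
key = b"shared-secret-for-illustration"
tokens_a = tokenise({"Jane  Doe": "a1", "John Roe": "a2"}, key)
tokens_b = tokenise({"jane doe": "b7", "Ann Poe": "b8"}, key)
print(match_tokens(tokens_a, tokens_b))  # [('a1', 'b7')]
```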

arXiv.org e-Print Archive

CiteSeerX

Crossref

Optimal assignment problem on record linkage

Author: Rodriguez Fernandez Pablo
Publication venue: Universitat Politècnica de Catalunya
Publication date: 04/06/2013

We present an application of the Hungarian method, an optimal-assignment algorithm from graph theory, to record linkage in order to improve disclosure risk assessment. Because the Hungarian method has O(n^3) complexity, three different methods are presented to reduce its computational cost.
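To see why an optimal assignment can matter for record-linkage risk assessment, the sketch below compares greedy nearest-neighbour linkage, where each masked record is linked to its closest original record and an original may be reused, with a one-to-one assignment that minimises total distance. It assumes hypothetical one-dimensional numeric records in which masked record i was produced from original record i, and uses an L1 distance. The optimum is found here by exhaustive search over permutations, which is exponential and only suitable for toy inputs; it is not the Hungarian method itself, which reaches the same optimum in O(n^3), nor any of the cost-reduction methods described in this work.

```python
from itertools import permutations


def distance(a, b):
    """L1 distance between two numeric records (an assumed choice)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def greedy_link(masked, original):
    """Link each masked record to its nearest original; originals may repeat."""
    return [min(range(len(original)), key=lambda j: distance(m, original[j]))
            for m in masked]


def optimal_assignment(masked, original):
    """One-to-one linkage minimising total distance, by exhaustive search."""
    best, best_cost = None, float("inf")
    for perm in permutations(range(len(original))):
        cost = sum(distance(masked[i], original[perm[i]]) for i in range(len(masked)))
        if cost < best_cost:
            best, best_cost = list(perm), cost
    return best, best_cost


def correct_links(links):
    # Masked record i truly comes from original record i in this toy setup.
    return sum(1 for i, j in enumerate(links) if i == j)


original = [(0,), (5,)]
masked = [(1,), (2,)]  # record 1 was perturbed heavily toward record 0
greedy = greedy_link(masked, original)
optimal, cost = optimal_assignment(masked, original)
print(greedy, correct_links(greedy))    # [0, 0] 1
print(optimal, cost, correct_links(optimal))  # [0, 1] 4 2
```

Here the one-to-one constraint recovers a link that greedy matching misses, so a risk assessment based only on nearest-neighbour linkage could understate how many records an intruder can reidentify.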

UPCommons. Portal del coneixement obert de la UPC

Privacy, confidentiality and practicalities in data linkage. National Statistical Quality Review

Author: David Ford
Kerina Jones
Publication venue: Government Statistical Service
Publication date: 01/01/2018

Cronfa at Swansea University