28 research outputs found

    Some methods for blindfolded record linkage

    BACKGROUND: The linkage of records which refer to the same entity in separate data collections is a common requirement in public health and biomedical research. Traditionally, record linkage techniques have required that all the identifying data in which links are sought be revealed to at least one party, often a third party. This necessarily invades personal privacy and requires complete trust in the intentions of that party and their ability to maintain security and confidentiality. Dusserre, Quantin, Bouzelat and colleagues have demonstrated that it is possible to use secure one-way hash transformations to carry out follow-up epidemiological studies without any party having to reveal identifying information about any of the subjects – a technique which we refer to as "blindfolded record linkage". A limitation of their method is that only exact comparisons of values are possible, although phonetic encoding of names and other strings can be used to allow for some types of typographical variation and data errors. METHODS: A method is described which permits the calculation of a general similarity measure, the n-gram score, without having to reveal the data being compared, albeit at some cost in computation and data communication. This method can be combined with public key cryptography and automatic estimation of linkage model parameters to create an overall system for blindfolded record linkage. RESULTS: The system described offers good protection against misdeeds or security failures by any one party, but remains vulnerable to collusion between or simultaneous compromise of two or more parties involved in the linkage operation. In order to reduce the likelihood of this, the use of last-minute allocation of tasks to substitutable servers is proposed. Proof-of-concept computer programmes written in the Python programming language are provided to illustrate the similarity comparison protocol. 
CONCLUSION: Although the protocols described in this paper are not unconditionally secure, they do suggest the feasibility, with the aid of modern cryptographic techniques and high speed communication networks, of a general purpose probabilistic record linkage system which permits record linkage studies to be carried out with negligible risk of invasion of personal privacy.
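    To make the similarity measure mentioned in the METHODS section concrete, the following is a minimal sketch of an n-gram score computed on plaintext strings. It assumes the score takes the form of a Dice coefficient over sets of character bigrams, which the abstract does not specify, and it does not implement the blindfolded protocol itself, only the quantity that such a protocol would compute without revealing the inputs. The function names `ngrams` and `dice_score` are hypothetical.

```python
def ngrams(text, n=2):
    """Return the set of character n-grams of a lower-cased string."""
    s = text.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def dice_score(a, b, n=2):
    """Dice coefficient over n-gram sets: 2|A & B| / (|A| + |B|).

    Assumption: this is one common form of n-gram score; the paper's
    exact definition may differ.
    """
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        # Both strings are shorter than n; fall back to exact comparison.
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


if __name__ == "__main__":
    print(dice_score("smith", "smyth"))
    print(dice_score("smith", "smith"))
```

    Unlike an exact comparison of hashed values, a score of this kind degrades gradually as spelling variations accumulate, which is why it is of interest for linking error-prone identifiers.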

    A method and a tool for geocoding and record linkage

    For many years, researchers have presented the geocoding of postal addresses as a challenge. Several research works have been devoted to the geocoding process. This paper presents theoretical and technical aspects of geolocalization, geocoding, and record linkage. It shows the possibilities and limitations of existing methods and commercial software, identifying areas for further research. In particular, we present a methodology and a computing tool allowing the correction and geocoding of mailing addresses. The paper presents the two main steps of the methodology. The first, preliminary step is address correction (address matching), while the second carries out the geocoding of the identified addresses. Additionally, we present some results from the processing of real data sets. Finally, in the discussion, areas for further research are identified. Keywords: address correction; geocoding; matching; data management; record linkage

    Record-Linkage from a Technical Point of View

    Record linkage is used for preparing sampling frames, deduplication of lists and combining information on the same object from two different databases. If the identifiers of the same objects in two different databases have error free unique common identifiers like personal identification numbers (PID), record linkage is a simple file merge operation. If the identifiers contain errors, record linkage is a challenging task. In many applications, the files have widely different numbers of observations, for example a few thousand records of a sample survey and a few million records of an administrative database of social security numbers. Available software, privacy issues and future research topics are discussed. Keywords: Record-Linkage, Data-mining, Privacy preserving protocols
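    The "simple file merge" case described above can be sketched as a keyed join. This is only an illustration under stated assumptions: each file is a list of dictionaries, the error-free identifier is stored under a hypothetical `"pid"` key, and the larger file fits in memory. The function name `merge_on_pid` is likewise hypothetical.

```python
def merge_on_pid(survey, register):
    """Link each survey record to the register record with the same PID.

    Assumes PIDs are unique within the register and free of errors, so an
    exact dictionary lookup is sufficient.
    """
    register_by_pid = {row["pid"]: row for row in register}
    linked = []
    for row in survey:
        match = register_by_pid.get(row["pid"])
        if match is not None:
            # Combine the fields of both records; survey values win on clashes.
            linked.append({**match, **row})
    return linked


if __name__ == "__main__":
    survey = [{"pid": "A1", "income": 100}, {"pid": "Z9", "income": 50}]
    register = [{"pid": "A1", "region": "north"}, {"pid": "B2", "region": "south"}]
    print(merge_on_pid(survey, register))
```

    Once identifiers contain errors, the exact lookup no longer finds the true match, which is where the probabilistic methods discussed in the abstract become necessary.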

    Building Application-Related Patient Identifiers: What Solution for a European Country?

    We propose a method utilizing a derived social security number with the same reliability as the social security number. We show the anonymity techniques classically based on unidirectional hash functions (such as the Secure Hash Algorithm (SHA-2) function) that can guarantee the security, quality, and reliability of information if these techniques are applied to the social security number. Hashing produces a strictly anonymous code that is always the same for a given individual, and thus enables patient data to be linked. Different solutions are developed and proposed in this article. Hashing the social security number will make it possible to link the information in the personal medical file to other national health information sources with the aim of completing or validating the personal medical record or conducting epidemiological and clinical research. This data linkage would meet the anonymous data requirements of the European directive on data protection.
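    As a rough illustration of how hashing yields a code that is always the same for a given individual, the sketch below derives a linkage code from an identifier with SHA-256, a member of the SHA-2 family. The use of a secret key (HMAC), the normalization step, and the name `linkage_code` are assumptions made for this example; the article's own derivation scheme is not described in the abstract.

```python
import hashlib
import hmac


def linkage_code(ssn, secret_key):
    """Derive a stable pseudonymous code from a social security number.

    Assumptions: spacing and punctuation are stripped before hashing so
    that formatting differences do not change the code, and a secret key
    is mixed in to make dictionary attacks on the identifier space harder.
    """
    normalized = "".join(ch for ch in ssn.upper() if ch.isalnum())
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    key = b"example-key-not-for-production"
    print(linkage_code("1 84 12 76 451 089 46", key))
```

    Two data holders that share the key and the normalization rule obtain identical codes for the same person and can link records on the code alone, without exchanging the underlying number.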

    Record-linkage from a technical point of view

    "Record linkage is used for preparing sampling frames, deduplication of lists and combining information on the same object from two different databases. If the identifiers of the same objects in two different databases have error free unique common identifiers like personal identification numbers (PID), record linkage is a simple file merge operation. If the identifiers contain errors, record linkage is a challenging task. In many applications, the files have widely different numbers of observations, for example a few thousand records of a sample survey and a few million records of an administrative database of social security numbers. Available software, privacy issues and future research topics are discussed." [author's abstract]

    Privacy-preserving record linkage using Bloom filters

    Background: Combining multiple databases with disjunctive or additional information on the same person is occurring increasingly throughout research. If unique identification numbers for these individuals are not available, probabilistic record linkage is used for the identification of matching record pairs. In many applications, identifiers have to be encrypted due to privacy concerns. Methods: A new protocol for privacy-preserving record linkage with encrypted identifiers allowing for errors in identifiers has been developed. The protocol is based on Bloom filters on q-grams of identifiers. Results: Tests on simulated and actual databases yield linkage results comparable to non-encrypted identifiers and superior to results from phonetic encodings. Conclusion: We proposed a protocol for privacy-preserving record linkage with encrypted identifiers allowing for errors in identifiers. Since the protocol can be easily enhanced and has a low computational burden, the protocol might be useful for many applications requiring privacy-preserving record linkage.
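    The idea of encoding the q-grams of an identifier into a Bloom filter and comparing the filters can be sketched as follows. This is not the authors' implementation: the filter length `m`, the number of hash functions `k`, the blank padding of the string, the double-hashing construction from SHA-256 and BLAKE2b, and the Dice coefficient as the comparison measure are all assumptions chosen for illustration, and the filter is modelled as a set of bit positions for brevity.

```python
import hashlib


def qgrams(value, q=2):
    """Return the set of q-grams of a lower-cased, blank-padded string."""
    padded = " " * (q - 1) + value.lower() + " " * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def bloom_encode(value, m=1000, k=10, q=2):
    """Map every q-gram to k bit positions of an m-bit Bloom filter.

    The k positions come from double hashing, (h1 + i * h2) mod m, which
    is an assumed construction rather than one stated in the abstract.
    """
    bits = set()
    for gram in qgrams(value, q):
        data = gram.encode("utf-8")
        h1 = int.from_bytes(hashlib.sha256(data).digest(), "big")
        h2 = int.from_bytes(hashlib.blake2b(data).digest(), "big")
        for i in range(k):
            bits.add((h1 + i * h2) % m)
    return bits


def dice(bits_a, bits_b):
    """Dice coefficient of two filters given as sets of set-bit positions."""
    if not bits_a and not bits_b:
        return 1.0
    return 2 * len(bits_a & bits_b) / (len(bits_a) + len(bits_b))


if __name__ == "__main__":
    print(dice(bloom_encode("smith"), bloom_encode("smyth")))
    print(dice(bloom_encode("smith"), bloom_encode("jones")))
```

    Because similar strings share most of their q-grams, their filters share most of their set bits, so the parties can compare filters rather than the identifiers themselves and still tolerate typographical errors.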

    Record linked retrospective cohort study of 4.6 million people exploring ethnic variations in disease: myocardial infarction in South Asians

    Background: Law and policy in several countries require health services to demonstrate that they are promoting racial/ethnic equality. However, suitable and accurate data are usually not available. We demonstrated, using acute myocardial infarction, that linkage techniques can be ethical and potentially useful for this purpose. Methods: The linkage was based on probability matching. Encryption of a unique national health identifier (the Community Health Index (CHI)) ensured that information about health status and census-based ethnicity could not be ascribed to an identified individual. We linked information on individual ethnic group from the 2001 Census to Scottish hospital discharge and mortality data. Results: Overall, 94% of the 4.9 million census records were matched to a CHI record with an estimated false positive rate of less than 0.1%, with 84.9 to 87.6% of South Asians being successfully linked. Between April 2001 and December 2003 there were 126 first episodes of acute myocardial infarction (AMI) among South Asians and 30,978 among non-South Asians. The incidence rate ratio was 1.45 (95% CI 1.17, 1.78) for South Asian compared to non-South Asian men and 1.80 (95% CI 1.31, 2.48) for South Asian women. After adjustment for age, sex and any previous admission for diabetes the hazard ratio for death following AMI was 0.59 (95% CI 0.43, 0.81), reflecting better survival among South Asians. Conclusion: The technique met ethical, professional and legal concerns about the linkage of census and health data and is transferable internationally wherever the census (or population register) contains ethnic group or race data. The outcome is a retrospective cohort study. Our results point to increased incidence rather than increased case fatality in explaining the high CHD mortality rate. The findings open up new methods for researchers and health planners.

    Privacy and Data Balkanization: Circumventing the Barriers

    The rapid growth in digital data forms the basis for a wide range of new services and research, e.g., large-scale medical studies. At the same time, increasingly restrictive privacy concerns and laws are leading to significant overhead in arranging for sharing or combining different data sets to obtain these benefits. For new applications, where the benefit of combined data is not yet clear, this overhead can inhibit organizations from even trying to determine whether they can mutually benefit from sharing their data. In this paper, we discuss techniques to overcome this difficulty by employing private information transfer to determine whether there is a benefit from sharing data, and whether there is room to negotiate acceptable prices. These techniques involve cryptographic protocols. While currently considered secure, these protocols are potentially vulnerable to the development of quantum technology, particularly for ensuring privacy over significant periods of time into the future. To mitigate this concern, we describe how developments in practical quantum technology can improve the security of these protocols.