201 research outputs found

    Counteracting Bloom Filter Encoding Techniques for Private Record Linkage

    Get PDF
    Record Linkage is a process of combining records representing same entity spread across multiple and different data sources, primarily for data analytics. Traditionally, this could be performed with comparing personal identifiers present in data (e.g., given name, surname, social security number etc.). However, sharing information across databases maintained by disparate organizations leads to exchange of personal information pertaining to an individual. In practice, various statutory regulations and policies prohibit the disclosure of such identifiers. Private record linkage (PRL) techniques have been implemented to execute record linkage without disclosing any information about other dissimilar records. Various techniques have been proposed to implement PRL, including cryptographically secure multi-party computational protocols. However, these protocols have been debated over the scalability factors as they are computationally extensive by nature. Bloom filter encoding (BFE) for private record linkage has become a topic of recent interest in the medical informatics community due to their versatility and ability to match records approximately in a manner that is (ostensibly) privacy-preserving. It also has the advantage of computing matches directly in plaintext space making them much faster than their secure mutli-party computation counterparts. The trouble with BFEs lies in their security guarantees: by their very nature BFEs leak information to assist in the matching process. Despite this known shortcoming, BFEs continue to be studied in the context of new heuristically designed countermeasures to address known attacks. A new class of set-intersection attack is proposed in this thesis which re-examines the security of BFEs by conducting experiments, demonstrating an inverse relationship between security and accuracy. With real-world deployment of BFEs in the health information sector approaching, the results from this work will generate renewed discussion around the security of BFEs as well as motivate research into new, more efficient multi-party protocols for private approximate matching

    An Efficient Two-Party Protocol for Approximate Matching in Private Record Linkage

    Get PDF
    The task of linking multiple databases with the aim to identify records that refer to the same entity is occurring increasingly in many application areas. If unique identifiers for the entities are not available in all the databases to be linked, techniques that calculate approximate similarities between records must be used for the identification of matching pairs of records. Often, the records to be linked contain personal information such as names and addresses. In many applications, the exchange of attribute values that contain such personal details between organisations is not allowed due to privacy concerns. The linking of records between databases without revealing the actual attribute values in these records is the research problem known as 'privacy-preserving record linkage' (PPRL).While various approaches have been proposed to deal with privacy within the record linkage process, a viable solution that is well applicable to real-world conditions needs to address the major aspect of scalability of linking very large databases while preserving security and linkage quality. We propose a novel two-party protocol for PPRL that addresses scalability, security and quality/ accuracy. The protocol is based on (1) the use of reference values that are available to both database owners, and allows them to individually calculate the similarities between their attribute values and the reference values; and (2) the binning of these calculated similarity values to allow their secure exchange between the two database owners. Experiments on a real-world database with nearly two million records yield linkage results that have a linear scalability to large databases and high linkage accuracy, allowing for approximate matching in the privacy-preserving context. Since the protocol has a low computational burden and allows quality approximate matching while still preserving the privacy of the databases that are matched, the protocol can be useful for many real-world applications requiring PPRL

    Privacy-preserving record linkage using Bloom filters

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Combining multiple databases with disjunctive or additional information on the same person is occurring increasingly throughout research. If unique identification numbers for these individuals are not available, probabilistic record linkage is used for the identification of matching record pairs. In many applications, identifiers have to be encrypted due to privacy concerns.</p> <p>Methods</p> <p>A new protocol for privacy-preserving record linkage with encrypted identifiers allowing for errors in identifiers has been developed. The protocol is based on Bloom filters on <it>q</it>-grams of identifiers.</p> <p>Results</p> <p>Tests on simulated and actual databases yield linkage results comparable to non-encrypted identifiers and superior to results from phonetic encodings.</p> <p>Conclusion</p> <p>We proposed a protocol for privacy-preserving record linkage with encrypted identifiers allowing for errors in identifiers. Since the protocol can be easily enhanced and has a low computational burden, the protocol might be useful for many applications requiring privacy-preserving record linkage.</p

    A Taxonomy of Privacy-Preserving Record Linkage Techniques

    Get PDF
    The process of identifying which records in two or more databases correspond to the same entity is an important aspect of data quality activities such as data pre-processing and data integration. Known as record linkage, data matching or entity resolution, this process has attracted interest from researchers in fields such as databases and data warehousing, data mining, information systems, and machine learning. Record linkage has various challenges, including scalability to large databases, accurate matching and classification, and privacy and confidentiality. The latter challenge arises because commonly personal identifying data, such as names, addresses and dates of birth of individuals, are used in the linkage process. When databases are linked across organizations, the issue of how to protect the privacy and confidentiality of such sensitive information is crucial to successful application of record linkage. In this paper we present an overview of techniques that allow the linking of databases between organizations while at the same time preserving the privacy of these data. Known as 'privacy-preserving record linkage' (PPRL), various such techniques have been developed. We present a taxonomy of PPRL techniques to characterize these techniques along 15 dimensions, and conduct a survey of PPRL techniques. We then highlight shortcomings of current techniques and discuss avenues for future research

    Privacy preserving interactive record linkage (PPIRL)

    Get PDF
    Record linkage to integrate uncoordinated databases is critical in biomedical research using Big Data. Balancing privacy protection against the need for high quality record linkage requires a human–machine hybrid system to safely manage uncertainty in the ever changing streams of chaotic Big Data

    Cancer

    Get PDF
    BackgroundUnderstanding of cancer outcomes is limited by data fragmentation. We analyzed the information yielded by integrating breast cancer data from three sources: electronic medical records (EMRs) of two healthcare systems and the state registry.MethodsWe extracted diagnostic test and treatment data from EMRs of all breast cancer patients treated from 2000\ue2\u20ac\u201c2010 in two independent California institutions: a community-based practice (Palo Alto Medical Foundation) and an academic medical center (Stanford University). We incorporated records from the population-based California Cancer Registry (CCR), and then linked EMR-CCR datasets of Community and University patients.ResultsWe initially identified 8210 University patients and 5770 Community patients; linked datasets revealed a 16% patient overlap, yielding 12,109 unique patients. The proportion of all Community patients, but not University patients, treated at both institutions increased with worsening cancer prognostic factors. Before linking datasets, Community patients appeared to receive less intervention than University patients (mastectomy: 37.6% versus 43.2%; chemotherapy: 35% versus 41.7%; magnetic resonance imaging (MRI): 10% versus 29.3%; genetic testing: 2.5% versus 9.2%). Linked Community and University datasets revealed that patients treated at both institutions received substantially more intervention (mastectomy: 55.8%; chemotherapy: 47.2%; MRI: 38.9%; genetic testing: 10.9%; p<0.001 for each three-way institutional comparison).ConclusionData linkage identified 16% of patients who were treated in two healthcare systems and who, despite comparable prognostic factors, received far more intensive treatment than others. By integrating complementary data from EMRs and population-based registries, we obtained a more comprehensive understanding of breast cancer care and factors that drive treatment utilization.1U58 DP000807-01/DP/NCCDPHP CDC HHS/United StatesHHSN261201000034C/CA/NCI NIH HHS/United StatesHHSN261201000034C/PHS HHS/United StatesHHSN261201000035C/PHS HHS/United StatesHHSN261201000140C/CA/NCI NIH HHS/United StatesHHSN261201000140C/PHS HHS/United States2015-01-01T00:00:00Z24101577PMC386759
    • …
    corecore