1,294 research outputs found

    Detection and Privacy Preservation of Sensitive Attributes Using Hybrid Approach for Privacy Preserving Record Linkage

    Get PDF
    Privacy Preserving Record Linkage (PPRL) is a major area of database research which entangles in colluding huge multiple heterogeneous data sets with disjunctive or additional information about an entity while veiling its private information. This paper gives an enhanced algorithm for merging two datasets using Sorted Neighborhood Deterministic approach and an improved Preservation algorithm which makes use of automatic selection of sensitive attributes and pattern mining over dynamic queries. We guarantee strong privacy, less computational complexity and scalability and address the legitimate concerns over data security and privacy with our approach

    Record-Linkage from a Technical Point of View

    Get PDF
    TRecord linkage is used for preparing sampling frames, deduplication of lists and combining information on the same object from two different databases. If the identifiers of the same objects in two different databases have error free unique common identifiers like personal identification numbers (PID), record linkage is a simple file merge operation. If the identifiers contains errors, record linkage is a challenging task. In many applications, the files have widely different numbers of observations, for example a few thousand records of a sample survey and a few million records of an administrative database of social security numbers. Available software, privacy issues and future research topics are discussed.Record-Linkage, Data-mining, Privacy preserving protocols

    A Combined Approach For Private Indexing Mechanism

    Get PDF
    Private indexing is a set of approaches for analyzing research data that are similar or resemble similar ones. This is used in the database to keep track of the keys and their values. The main subject of this research is private indexing in record linkage to secure the data. Because unique personal identification numbers or social security numbers are not accessible in most countries or databases, data linkage is limited to attributes such as date of birth and names to distinguish between the number of records and the real-life entities they represent. For security reasons, the encryption of these identifiers is required. Privacy-preserving record linkage, frequently used to link private data within several databases from different companies, prevents sensitive information from being exposed to other companies. This research used a combined method to evaluate the data, using classic and new indexing methods. A combined approach is more secure than typical standard indexing in terms of privacy. Multibit tree indexing, which groups comparable data in many ways, creates a scalable tree-like structure that is both space and time flexible, as it avoids the need for redundant block structures. Because the record pair numbers to compare are the Cartesian product of both the file record numbers, the work required grows with the number of records to compare in the files. The evaluation findings of this research showed that combined method is scalable in terms of the number of databases to be linked, the database size, and the time required

    An Efficient Two-Party Protocol for Approximate Matching in Private Record Linkage

    Get PDF
    The task of linking multiple databases with the aim to identify records that refer to the same entity is occurring increasingly in many application areas. If unique identifiers for the entities are not available in all the databases to be linked, techniques that calculate approximate similarities between records must be used for the identification of matching pairs of records. Often, the records to be linked contain personal information such as names and addresses. In many applications, the exchange of attribute values that contain such personal details between organisations is not allowed due to privacy concerns. The linking of records between databases without revealing the actual attribute values in these records is the research problem known as 'privacy-preserving record linkage' (PPRL).While various approaches have been proposed to deal with privacy within the record linkage process, a viable solution that is well applicable to real-world conditions needs to address the major aspect of scalability of linking very large databases while preserving security and linkage quality. We propose a novel two-party protocol for PPRL that addresses scalability, security and quality/ accuracy. The protocol is based on (1) the use of reference values that are available to both database owners, and allows them to individually calculate the similarities between their attribute values and the reference values; and (2) the binning of these calculated similarity values to allow their secure exchange between the two database owners. Experiments on a real-world database with nearly two million records yield linkage results that have a linear scalability to large databases and high linkage accuracy, allowing for approximate matching in the privacy-preserving context. Since the protocol has a low computational burden and allows quality approximate matching while still preserving the privacy of the databases that are matched, the protocol can be useful for many real-world applications requiring PPRL

    Quality of record linkage in a highly automated cancer registry that relies on encrypted identity data

    Get PDF
    Objectives: In the absence of unique ID numbers, cancer and other registries in Germany and elsewhere rely on identity data to link records pertaining to the same patient. These data are often encrypted to ensure privacy. Some record linkage errors unavoidably occur. These errors were quantified for the cancer registry of North Rhine Westphalia which uses encrypted identity data. Methods: A sample of records was drawn from the registry, record linkage information was included. In parallel, plain text data for these records were retrieved to generate a gold standard. Record linkage error frequencies in the cancer registry were determined by comparison of the results of the routine linkage with the gold standard. Error rates were projected to larger registries. Results: In the sample studied, the homonym error rate was 0.015%; the synonym error rate was 0.2%. The F-measure was 0.9921. Projection to larger databases indicated that for a realistic development the homonym error rate will be around 1%, the synonym error rate around 2%. Conclusion: Observed error rates are low. This shows that effective methods to standardize and improve the quality of the input data have been implemented. This is crucial to keep error rates low when the registry’s database grows. The planned inclusion of unique health insurance numbers is likely to further improve record linkage quality. Cancer registration entirely based on electronic notification of records can process large amounts of data with high quality of record linkage

    A Taxonomy of Privacy-Preserving Record Linkage Techniques

    Get PDF
    The process of identifying which records in two or more databases correspond to the same entity is an important aspect of data quality activities such as data pre-processing and data integration. Known as record linkage, data matching or entity resolution, this process has attracted interest from researchers in fields such as databases and data warehousing, data mining, information systems, and machine learning. Record linkage has various challenges, including scalability to large databases, accurate matching and classification, and privacy and confidentiality. The latter challenge arises because commonly personal identifying data, such as names, addresses and dates of birth of individuals, are used in the linkage process. When databases are linked across organizations, the issue of how to protect the privacy and confidentiality of such sensitive information is crucial to successful application of record linkage. In this paper we present an overview of techniques that allow the linking of databases between organizations while at the same time preserving the privacy of these data. Known as 'privacy-preserving record linkage' (PPRL), various such techniques have been developed. We present a taxonomy of PPRL techniques to characterize these techniques along 15 dimensions, and conduct a survey of PPRL techniques. We then highlight shortcomings of current techniques and discuss avenues for future research
    • …
    corecore