Search CORE

4,423 research outputs found

Privacy and Confidentiality in an e-Commerce World: Data Mining, Data Warehousing, Matching and Disclosure Limitation

Author: Fienberg Stephen E.
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2006
Field of study

The growing expanse of e-commerce and the widespread availability of online databases raise many fears regarding loss of privacy and many statistical challenges. Even with encryption and other nominal forms of protection for individual databases, we still need to protect against the violation of privacy through linkages across multiple databases. These issues parallel those that have arisen and received some attention in the context of homeland security. Following the events of September 11, 2001, there has been heightened attention in the United States and elsewhere to the use of multiple government and private databases for the identification of possible perpetrators of future attacks, as well as an unprecedented expansion of federal government data mining activities, many involving databases containing personal information. We present an overview of some proposals that have surfaced for the search of multiple databases which supposedly do not compromise possible pledges of confidentiality to the individuals whose data are included. We also explore their link to the related literature on privacy-preserving data mining. In particular, we focus on the matching problem across databases and the concept of ``selective revelation'' and their confidentiality implications.Comment: Published at http://dx.doi.org/10.1214/088342306000000240 in the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

arXiv.org e-Print Archive

CiteSeerX

Crossref

A Taxonomy of Privacy-Preserving Record Linkage Techniques

Author: Christen Peter
Vatsalan Dinusha
Verykios Vassilios
Publication venue: 'Elsevier BV'
Publication date: 07/12/2015
Field of study

The process of identifying which records in two or more databases correspond to the same entity is an important aspect of data quality activities such as data pre-processing and data integration. Known as record linkage, data matching or entity resolution, this process has attracted interest from researchers in fields such as databases and data warehousing, data mining, information systems, and machine learning. Record linkage has various challenges, including scalability to large databases, accurate matching and classification, and privacy and confidentiality. The latter challenge arises because commonly personal identifying data, such as names, addresses and dates of birth of individuals, are used in the linkage process. When databases are linked across organizations, the issue of how to protect the privacy and confidentiality of such sensitive information is crucial to successful application of record linkage. In this paper we present an overview of techniques that allow the linking of databases between organizations while at the same time preserving the privacy of these data. Known as 'privacy-preserving record linkage' (PPRL), various such techniques have been developed. We present a taxonomy of PPRL techniques to characterize these techniques along 15 dimensions, and conduct a survey of PPRL techniques. We then highlight shortcomings of current techniques and discuss avenues for future research

The Australian National University

Recommended from our members

Patient privacy protection using anonymous access control techniques

Author: Elmufti K.
Rajarajan M.
Rakocevic V.
Weerasinghe D.
Publication venue: 'Georg Thieme Verlag KG'
Publication date: 01/01/2008
Field of study

Objective: The objective of this study is to develop a solution to preserve security and privacy in a healthcare environment where health-sensitive information will be accessed by many parties and stored in various distributed databases. The solution should maintain anonymous medical records and it should be able to link anonymous medical information in distributed databases into a single patient medical record with the patient identity. Methods: In this paper we present a protocol that can be used to authenticate and authorize patients to healthcare services without providing the patient identification. Healthcare service can identify the patient using separate temporary identities in each identification session and medical records are linked to these temporary identities. Temporary identities can be used to enable record linkage and reverse track real patient identity in critical medical situations. Results: The proposed protocol provides main security and privacy services such as user anonymity, message privacy, message confidentiality, user authentication, user authorization and message replay attacks. The medical environment validates the patient at the healthcare service as a real and registered patient for the medical services. Using the proposed protocol, the patient anonymous medical records at different healthcare services can be linked into one single report and it is possible to securely reverse track anonymous patient into the real identity. Conclusion: The protocol protects the patient privacy with a secure anonymous authentication to healthcare services and medical record registries according to the European and the UK legislations, where the patient real identity is not disclosed with the distributed patient medical records

City Research Online

Privacy-preserving targeted advertising scheme for IPTV using the cloud

Author: Javid Khayati Leyli
Orencik Cengiz
Savas Erkay
Savaş Erkay
Ustaoglu Berkant
Ustaoğlu Berkant
Örencik Cengiz
Publication venue: 'Scitepress'
Publication date: 01/01/2012
Field of study

In this paper, we present a privacy-preserving scheme for targeted advertising via the Internet Protocol TV (IPTV). The scheme uses a communication model involving a collection of viewers/subscribers, a content provider (IPTV), an advertiser, and a cloud server. To provide high quality directed advertising service, the advertiser can utilize not only demographic information of subscribers, but also their watching habits. The latter includes watching history, preferences for IPTV content and watching rate, which are published on the cloud server periodically (e.g. weekly) along with anonymized demographics. Since the published data may leak sensitive information about subscribers, it is safeguarded using cryptographic techniques in addition to the anonymization of demographics. The techniques used by the advertiser, which can be manifested in its queries to the cloud, are considered (trade) secrets and therefore are protected as well. The cloud is oblivious to the published data, the queries of the advertiser as well as its own responses to these queries. Only a legitimate advertiser, endorsed with a so-called {\em trapdoor} by the IPTV, can query the cloud and utilize the query results. The performance of the proposed scheme is evaluated with experiments, which show that the scheme is suitable for practical usage

Sabanci University Research Database

DSpace@IZTECH

Privacy Preserving Utility Mining: A Survey

Author: Chao Han-Chieh
Gan Wensheng
Lin Jerry Chun-Wei
Wang Shyue-Liang
Yu Philip S.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 18/11/2018
Field of study

In big data era, the collected data usually contains rich information and hidden knowledge. Utility-oriented pattern mining and analytics have shown a powerful ability to explore these ubiquitous data, which may be collected from various fields and applications, such as market basket analysis, retail, click-stream analysis, medical analysis, and bioinformatics. However, analysis of these data with sensitive private information raises privacy concerns. To achieve better trade-off between utility maximizing and privacy preserving, Privacy-Preserving Utility Mining (PPUM) has become a critical issue in recent years. In this paper, we provide a comprehensive overview of PPUM. We first present the background of utility mining, privacy-preserving data mining and PPUM, then introduce the related preliminaries and problem formulation of PPUM, as well as some key evaluation criteria for PPUM. In particular, we present and discuss the current state-of-the-art PPUM algorithms, as well as their advantages and deficiencies in detail. Finally, we highlight and discuss some technical challenges and open directions for future research on PPUM.Comment: 2018 IEEE International Conference on Big Data, 10 page

arXiv.org e-Print Archive

Crossref