1,680 research outputs found

    Privacy-Preserving Record Linkage for Big Data: Current Approaches and Research Challenges

    Get PDF
    The growth of Big Data, especially personal data dispersed in multiple data sources, presents enormous opportunities and insights for businesses to explore and leverage the value of linked and integrated data. However, privacy concerns impede sharing or exchanging data for linkage across different organizations. Privacy-preserving record linkage (PPRL) aims to address this problem by identifying and linking records that correspond to the same real-world entity across several data sources held by different parties without revealing any sensitive information about these entities. PPRL is increasingly being required in many real-world application areas. Examples range from public health surveillance to crime and fraud detection, and national security. PPRL for Big Data poses several challenges, with the three major ones being (1) scalability to multiple large databases, due to their massive volume and the flow of data within Big Data applications, (2) achieving high quality results of the linkage in the presence of variety and veracity of Big Data, and (3) preserving privacy and confidentiality of the entities represented in Big Data collections. In this chapter, we describe the challenges of PPRL in the context of Big Data, survey existing techniques for PPRL, and provide directions for future research.This work was partially funded by the Australian Research Council under Discovery Project DP130101801, the German Academic Exchange Service (DAAD) and Universities Australia (UA) under the Joint Research Co-operation Scheme, and also funded by the German Federal Ministry of Education and Research within the project Competence Center for Scalable Data Services and Solutions (ScaDS) Dresden/Leipzig (BMBF 01IS14014B)

    Privacy-Preserving Trajectory Data Publishing via Differential Privacy

    Get PDF
    Over the past decade, the collection of data by individuals, businesses and government agencies has increased tremendously. Due to the widespread of mobile computing and the advances in location-acquisition techniques, an immense amount of data concerning the mobility of moving objects have been generated. The movement data of an object (e.g. individual) might include specific information about the locations it visited, the time those locations were visited, or both. While it is beneficial to share data for the purpose of mining and analysis, data sharing might risk the privacy of the individuals involved in the data. Privacy-Preserving Data Publishing (PPDP) provides techniques that utilize several privacy models for the purpose of publishing useful information while preserving data privacy. The objective of this thesis is to answer the following question: How can a data owner publish trajectory data while simultaneously safeguarding the privacy of the data and maintaining its usefulness? We propose an algorithm for anonymizing and publishing trajectory data that ensures the output is differentially private while maintaining high utility and scalability. Our solution comprises a twofold approach. First, we generalize trajectories by generalizing and then partitioning the timestamps at each location in a differentially private manner. Next, we add noise to the real count of the generalized trajectories according to the given privacy budget to enforce differential privacy. As a result, our approach achieves an overall epsilon-differential privacy on the output trajectory data. We perform experimental evaluation on real-life data, and demonstrate that our proposed approach can effectively answer count and range queries, as well as mining frequent sequential patterns. We also show that our algorithm is efficient w.r.t. privacy budget and number of partitions, and also scalable with increasing data size

    Privacy, Space and Time: a Survey on Privacy-Preserving Continuous Data Publishing

    Get PDF
    Sensors, portable devices, and location-based services, generate massive amounts of geo-tagged, and/or location- and user-related data on a daily basis. The manipulation of such data is useful in numerous application domains, e.g., healthcare, intelligent buildings, and traffic monitoring, to name a few. A high percentage of these data carry information of users\u27 activities and other personal details, and thus their manipulation and sharing arise concerns about the privacy of the individuals involved. To enable the secure—from the users\u27 privacy perspective—data sharing, researchers have already proposed various seminal techniques for the protection of users\u27 privacy. However, the continuous fashion in which data are generated nowadays, and the high availability of external sources of information, pose more threats and add extra challenges to the problem. In this survey, we visit the works done on data privacy for continuous data publishing, and report on the proposed solutions, with a special focus on solutions concerning location or geo-referenced data

    効率的で安全な集合間類似結合に関する研究

    Get PDF
    筑波大学 (University of Tsukuba)201

    Data privacy

    Get PDF
    Data privacy studies methods, tools, and theory to avoid the disclosure of sensitive information. Its origin is in statistics with the goal to ensure the confidentiality of data gathered from census and questionnaires. The topic was latter introduced in computer science and more particularly in data mining, where due to the large amount of data currently available, has attracted the interest of researchers, practitioners, and companies. In this paper we will review the main topics related to data privacy and privacy-enhancing technologies
    corecore