17,615 research outputs found

    Reversible Data Perturbation Techniques for Multi-level Privacy-preserving Data Publication

    Get PDF
    The amount of digital data generated in the Big Data age is increasingly rapidly. Privacy-preserving data publishing techniques based on differential privacy through data perturbation provide a safe release of datasets such that sensitive information present in the dataset cannot be inferred from the published data. Existing privacy-preserving data publishing solutions have focused on publishing a single snapshot of the data with the assumption that all users of the data share the same level of privilege and access the data with a fixed privacy level. Thus, such schemes do not directly support data release in cases when data users have different levels of access on the published data. While a straight-forward approach of releasing a separate snapshot of the data for each possible data access level can allow multi-level access, it can result in a higher storage cost requiring separate storage space for each instance of the published data. In this paper, we develop a set of reversible data perturbation techniques for large bipartite association graphs that use perturbation keys to control the sequential generation of multiple snapshots of the data to offer multi-level access based on privacy levels. The proposed schemes enable multi-level data privacy, allowing selective de-perturbation of the published data when suitable access credentials are provided. We evaluate the techniques through extensive experiments on a large real-world association graph dataset and our experiments show that the proposed techniques are efficient, scalable and effectively support multi-level data privacy on the published data

    Privacy-Preserving Trajectory Data Publishing via Differential Privacy

    Get PDF
    Over the past decade, the collection of data by individuals, businesses and government agencies has increased tremendously. Due to the widespread of mobile computing and the advances in location-acquisition techniques, an immense amount of data concerning the mobility of moving objects have been generated. The movement data of an object (e.g. individual) might include specific information about the locations it visited, the time those locations were visited, or both. While it is beneficial to share data for the purpose of mining and analysis, data sharing might risk the privacy of the individuals involved in the data. Privacy-Preserving Data Publishing (PPDP) provides techniques that utilize several privacy models for the purpose of publishing useful information while preserving data privacy. The objective of this thesis is to answer the following question: How can a data owner publish trajectory data while simultaneously safeguarding the privacy of the data and maintaining its usefulness? We propose an algorithm for anonymizing and publishing trajectory data that ensures the output is differentially private while maintaining high utility and scalability. Our solution comprises a twofold approach. First, we generalize trajectories by generalizing and then partitioning the timestamps at each location in a differentially private manner. Next, we add noise to the real count of the generalized trajectories according to the given privacy budget to enforce differential privacy. As a result, our approach achieves an overall epsilon-differential privacy on the output trajectory data. We perform experimental evaluation on real-life data, and demonstrate that our proposed approach can effectively answer count and range queries, as well as mining frequent sequential patterns. We also show that our algorithm is efficient w.r.t. privacy budget and number of partitions, and also scalable with increasing data size

    Privacy and Classification Of Analyzed Data Using EMD

    Get PDF
    In recent years many researchers issued on data publishing with recommended settings .But privacy is a key issue here. Existing techniques such as K-anonymity and L-diversity should not provide effective and sufficient results for privacy preserving in data publishing. So in this paper we propose tree base algorithm for providing security, In this technique we arrange the data in tree based format for closeness of a data publishing and for retrieving data in sequential order. Our techniques also improved more security to micro data publishing and retrieving relevant information from micro data using attribute disclosure

    Toward Privacy in High-Dimensional Data Publishing

    Get PDF
    Nowadays data sharing among multiple parties has become inevitable in various application domains for diverse reasons, such as decision support, policy development and data mining. Yet, data in its raw format often contains person-specific sensitive information, and publishing such data without proper protection may jeopardize individual privacy. This fact has spawned extensive research on privacy-preserving data publishing (PPDP), which balances the fundamental trade-off between individual privacy and the utility of published data. Early research of PPDP focuses on protecting private and sensitive information in relational and statistical data. However, the recent prevalence of several emerging types of high-dimensional data has rendered unique challenges that prevent traditional PPDP techniques from being directly used. In this thesis, we address the privacy concerns in publishing four types of high-dimensional data, namely set-valued data, trajectory data, sequential data and network data. We develop effective and efficient non-interactive data publishing solutions for various utility requirements. Most of our solutions satisfy a rigorous privacy guarantee known as differential privacy, which has been the de facto standard for privacy protection. This thesis demonstrates that our solutions have exhibited great promise for releasing useful high-dimensional data without endangering individual privacy

    Preserving Privacy and Utility in RFID Data Publishing

    Get PDF
    Radio Frequency IDentification (RFID) is a technology that helps machines identify objects remotely. The RFID technology has been extensively used in many domains, such as mass transportation and healthcare management systems. The collected RFID data capture the detailed movement information of the tagged objects, offering tremendous opportunities for mining useful knowledge. Yet, publishing the raw RFID data for data mining would reveal the specific locations, time, and some other potentially sensitive information of the tagged objects or individuals. In this paper, we study the privacy threats in RFID data publishing and show that traditional anonymization methods are not applicable for RFID data due to its challenging properties: high-dimensional, sparse, and sequential. Our primary contributions are (1) to adopt a new privacy model called LKC-privacy that overcomes these challenges, and (2) to develop an efficient anonymization algorithm to achieve LKC-privacy while preserving the information utility for data mining

    SafePath: Differentially-private publishing of passenger trajectories in transportation systems

    Get PDF
    © 2018 Elsevier B.V. In recent years, the collection of spatio-temporal data that captures human movements has increased tremendously due to the advancements in hardware and software systems capable of collecting person-specific data. The bulk of the data collected by these systems has numerous applications, or it can simply be used for general data analysis. Therefore, publishing such big data is greatly beneficial for data recipients. However, in its raw form, the collected data contains sensitive information pertaining to the individuals from which it was collected and must be anonymized before publication. In this paper, we study the problem of privacy-preserving passenger trajectories publishing and propose a solution under the rigorous differential privacy model. Unlike sequential data, which describes sequentiality between data items, handling spatio-temporal data is a challenging task due to the fact that introducing a temporal dimension results in extreme sparseness. Our proposed solution introduces an efficient algorithm, called SafePath, that models trajectories as a noisy prefix tree and publishes ϵ-differentially-private trajectories while minimizing the impact on data utility. Experimental evaluation on real-life transit data in Montreal suggests that SafePath significantly improves efficiency and scalability with respect to large and sparse datasets, while achieving comparable results to existing solutions in terms of the utility of the sanitized data
    corecore