
    Protecting Privacy When Sharing and Releasing Data with Multiple Records per Person

    This study concerns the risks of privacy disclosure when sharing and releasing a dataset in which each individual may be associated with multiple records. Existing data privacy approaches and policies typically assume that each individual in a shared dataset corresponds to a single record, leading to an underestimation of the disclosure risks in multiple-records-per-person scenarios. We propose two novel measures of privacy disclosure to arrive at a more appropriate assessment of disclosure risks. The first measure assesses individual-record disclosure risk based on the frequency distribution of individuals' occurrences. The second measure assesses sensitive-attribute disclosure risk based on the number of individuals affiliated with a sensitive value. We show that the two proposed disclosure measures generalize the well-known k-anonymity and l-diversity measures, respectively, and work for scenarios with either a single record or multiple records per person. We have developed an efficient computational procedure that integrates the two proposed measures with a data quality measure to anonymize data with multiple records per person before sharing and releasing them for research and analytics. The results of an experimental evaluation using real-world data demonstrate the advantage of the proposed approach over existing techniques in protecting privacy while preserving data quality.
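    As a rough, person-level illustration of the counting the abstract describes, the minimal Python sketch below groups records into quasi-identifier equivalence classes and scores each class by distinct persons instead of distinct records. The column names (pid, zip, age, dx) and the toy rows are hypothetical, and the two scores are simplified stand-ins rather than the paper's exact measures:

        from collections import defaultdict

        def multi_record_privacy(records, qi_cols, id_col, sens_col):
            """Score a multi-record table at the person level.

            Returns (k_person, l_person): the minimum number of distinct
            persons in any quasi-identifier equivalence class (a person-level
            analogue of k-anonymity) and the minimum number of distinct
            sensitive values in any class (a person-level stand-in for
            l-diversity)."""
            classes = defaultdict(list)
            for rec in records:
                classes[tuple(rec[c] for c in qi_cols)].append(rec)

            k_person = l_person = float("inf")
            for recs in classes.values():
                persons = {r[id_col] for r in recs}        # one person, many records
                sens_values = {r[sens_col] for r in recs}  # values across the class
                k_person = min(k_person, len(persons))
                l_person = min(l_person, len(sens_values))
            return k_person, l_person

        # Toy data: person 1 contributes two records.
        records = [
            {"pid": 1, "zip": "537**", "age": "30-39", "dx": "flu"},
            {"pid": 1, "zip": "537**", "age": "30-39", "dx": "asthma"},
            {"pid": 2, "zip": "537**", "age": "30-39", "dx": "flu"},
            {"pid": 3, "zip": "537**", "age": "30-39", "dx": "diabetes"},
        ]
        print(multi_record_privacy(records, ["zip", "age"], "pid", "dx"))  # (3, 3)

    Counting records instead of persons would report k = 4 for the same class, which is exactly the overestimation of protection the abstract warns about.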

    Anonymization procedures for tabular data: an explanatory technical and legal synthesis

    In the European Union, Data Controllers and Data Processors who work with personal data must comply with the General Data Protection Regulation and other applicable laws, which affects how personal data are stored and processed. Some data processing in data mining or statistical analysis, however, does not require any personal reference, so the personal context can be removed. For these use cases, compliance with applicable laws requires removing any existing personal information by applying so-called anonymization. At the same time, anonymization should maintain data utility. The concept of anonymization is therefore a double-edged sword with an intrinsic trade-off: privacy enforcement vs. utility preservation. The former might not be entirely guaranteed when anonymized data are published as Open Data. In theory and practice, there exist diverse approaches to conducting and scoring anonymization. This explanatory synthesis discusses technical perspectives on the anonymization of tabular data, with special emphasis on the European Union's legal base. The studied methods for conducting anonymization, and the scores for assessing the anonymization procedure and the resulting anonymity, are explained in unified terminology. The examined methods and scores cover both categorical and numerical data; the scores involve data utility, information preservation, and privacy models. In practice-relevant examples, the methods and scores are experimentally tested on records from the UCI Machine Learning Repository's "Census Income (Adult)" dataset.
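    To make the recoding step concrete, here is a minimal sketch assuming a simple age-banding generalization hierarchy; the age, sex, and education columns mirror the "Census Income (Adult)" schema, but the rows and the hierarchy are illustrative, not the paper's experimental setup:

        from collections import Counter

        def generalize_age(age, width=10):
            """Recode an exact age into a coarse band, e.g. 37 -> '30-39'."""
            lo = (age // width) * width
            return f"{lo}-{lo + width - 1}"

        def k_of(rows, qi_cols):
            """k-anonymity level: size of the smallest quasi-identifier class."""
            counts = Counter(tuple(r[c] for c in qi_cols) for r in rows)
            return min(counts.values())

        # Toy rows shaped like Adult records (columns real, values illustrative).
        rows = [
            {"age": 37, "sex": "Male", "education": "Bachelors"},
            {"age": 33, "sex": "Male", "education": "Bachelors"},
            {"age": 35, "sex": "Male", "education": "Bachelors"},
            {"age": 52, "sex": "Female", "education": "HS-grad"},
            {"age": 58, "sex": "Female", "education": "HS-grad"},
        ]
        for r in rows:
            r["age"] = generalize_age(r["age"])         # generalization step
        print(k_of(rows, ["age", "sex", "education"]))  # 2 after recoding

    Widening the bands raises k (more privacy) but discards more information (less utility), which is the intrinsic trade-off the synthesis scores.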

    A Survey and Experimental Study on Privacy-Preserving Trajectory Data Publishing

    Trajectory data have become ubiquitous and can benefit various real-world applications such as traffic management and location-based services. However, trajectories may disclose highly sensitive information about an individual, including mobility patterns, personal profiles and gazetteers, and social relationships, making it indispensable to consider privacy protection when releasing trajectory data. Ensuring privacy for trajectories demands more than hiding single locations: trajectories are intrinsically sparse and high-dimensional and require protecting multi-scale correlations. To this end, extensive research has been conducted to design effective techniques for privacy-preserving trajectory data publishing. Protecting privacy also requires carefully balancing two metrics, privacy and utility; in other words, one must protect as much privacy as possible while guaranteeing the usefulness of the released trajectories for data analysis. In this survey, we provide a comprehensive study and a systematic summarization of the protection models and the privacy and utility metrics for trajectories developed in the literature. We also conduct extensive experiments on two real-life public trajectory datasets to evaluate the performance of several representative privacy protection models, demonstrate the trade-off between privacy and utility, and guide the choice of the right privacy model for trajectory publishing given certain privacy and utility desiderata.
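    As a minimal sketch of one common building block, spatial generalization, the Python below snaps trajectory points to a coarse grid and measures the resulting per-point distortion as a crude utility proxy; the cell size and the distortion metric are assumptions for illustration, and the surveyed models are considerably more sophisticated:

        def snap_to_grid(traj, cell=0.01):
            """Coarsen each (lat, lon) point to its grid-cell representative,
            a simple spatial generalization of a trajectory."""
            return [(round(lat / cell) * cell, round(lon / cell) * cell)
                    for lat, lon in traj]

        def mean_distortion(orig, anon):
            """Crude utility proxy: mean per-point L1 displacement (degrees)."""
            return sum(abs(a - b) + abs(c - d)
                       for (a, c), (b, d) in zip(orig, anon)) / len(orig)

        traj = [(40.7128, -74.0060), (40.7180, -74.0010), (40.7251, -73.9967)]
        anon = snap_to_grid(traj)
        print(anon)
        print(mean_distortion(traj, anon))  # grows as the cell size grows

    A larger cell hides the exact route better but raises the distortion: the same privacy-utility trade-off the survey's experiments quantify.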

    An Approach for Ensuring Robust Support for Location Privacy and Identity Inference Protection

    The challenge of preserving a user's location privacy is more important now than ever with the proliferation of handheld devices and the pervasive use of location-based services. To protect location privacy, we must ensure k-anonymity, so that the user remains indistinguishable among k-1 other users. The most practical way to achieve k-anonymity is to use a location anonymizer (LA); however, its knowledge of each user's current location makes it a single point of failure. In this thesis, we propose a formal location privacy framework, termed SafeGrid, that can work with or without an LA. In SafeGrid, the LA is designed so that it is no longer a single point of failure. In addition, it is resistant to known attacks and, most significantly, the cloaking algorithm it employs meets the reciprocity condition. Simulation results show better performance in query processing and cloaking-region calculation than existing solutions. In this thesis, we also show that satisfying k-anonymity is not enough to preserve privacy: especially in an environment where a group of colluding service providers collaborate, a user's privacy can be compromised through identity inference attacks. We present a detailed analysis of such attacks and propose a novel and powerful privacy definition called s-proximity. Besides building a formal definition of s-proximity, we show that it is practical and can be incorporated efficiently into existing systems to make them secure.
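    The thesis's SafeGrid algorithm is not reproduced here; as a generic illustration of reciprocal cloaking, the sketch below uses a deterministic quadtree partition. Because the partition does not depend on which user issues the query, every user inside the returned region is cloaked to the same region, which is the essence of the reciprocity condition. The unit square, the coordinates, and the depth cap are assumptions for the example:

        def cloak(users, target, k, region=(0.0, 0.0, 1.0, 1.0), depth=0):
            """Descend into the quadrant containing the target while that
            quadrant still holds at least k users; return the smallest such
            region. The deterministic partition yields reciprocity."""
            x0, y0, x1, y1 = region
            mx, my = (x0 + x1) / 2, (y0 + y1) / 2
            if depth < 16:  # guard against co-located users recursing forever
                for q in [(x0, y0, mx, my), (mx, y0, x1, my),
                          (x0, my, mx, y1), (mx, my, x1, y1)]:
                    inside = [u for u in users
                              if q[0] <= u[0] < q[2] and q[1] <= u[1] < q[3]]
                    if target in inside and len(inside) >= k:
                        return cloak(inside, target, k, q, depth + 1)
            return region  # any further split would drop below k users

        users = [(0.12, 0.10), (0.15, 0.20), (0.40, 0.35),
                 (0.80, 0.75), (0.90, 0.85)]
        print(cloak(users, users[0], k=3))  # (0.0, 0.0, 0.5, 0.5)

    Any of the three users in that quadrant is cloaked to the same region, so an observer of the cloaked query learns nothing that distinguishes them.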

    Identity, location and query privacy for smart devices

    This thesis discusses three important aspects of users' privacy: location privacy, identity privacy, and query privacy. Information related to identity, location, and queries is highly sensitive, as it can reveal users' behavior patterns, interests, preferences, and habits. The thesis proposes several techniques for better protecting identity, location, and query privacy.

    Privacy-preserving mechanism for social network data publishing

    Privacy is a growing concern for many parties, especially consumers, because collecting and distributing personal data has become ever simpler. This research focuses on preserving privacy in social network data publishing. The study explores data anonymization mechanisms to improve the privacy protection of social network users. We identify a new type of privacy breach and propose an effective mechanism for privacy protection.

    Privacy in the Genomic Era

    Genome sequencing technology has advanced at a rapid pace, and it is now possible to generate highly detailed genotypes inexpensively. The collection and analysis of such data has the potential to support various applications, including personalized medical services. While the benefits of the genomics revolution are trumpeted by the biomedical community, the increased availability of such data has major implications for personal privacy, notably because the genome has certain essential features, which include (but are not limited to) (i) an association with traits and certain diseases, (ii) identification capability (e.g., forensics), and (iii) revelation of family relationships. Moreover, direct-to-consumer DNA testing increases the likelihood that genome data will be made available in less regulated environments, such as the Internet and for-profit companies. The problem of genome data privacy thus resides at the crossroads of computer science, medicine, and public policy. While computer scientists have addressed data privacy for various data types, less attention has been dedicated to genomic data. Thus, the goal of this paper is to provide a systematization of knowledge for the computer science community. In doing so, we address some of the (sometimes erroneous) beliefs of this field and we report on a survey we conducted about genome data privacy with biomedical specialists. Then, after characterizing the genome privacy problem, we review the state of the art regarding privacy attacks on genomic data and strategies for mitigating such attacks, as well as contextualizing these attacks from the perspective of medicine and public policy. This paper concludes with an enumeration of the challenges for genome data privacy and presents a framework to systematize the analysis of threats and the design of countermeasures as the field moves forward.