8,984 research outputs found

    Publishing Microdata with a Robust Privacy Guarantee

    Full text link
    Today, the publication of microdata poses a privacy threat. Vast research has striven to define the privacy condition that microdata should satisfy before it is released, and devise algorithms to anonymize the data so as to achieve this condition. Yet, no method proposed to date explicitly bounds the percentage of information an adversary gains after seeing the published data for each sensitive value therein. This paper introduces beta-likeness, an appropriately robust privacy model for microdata anonymization, along with two anonymization schemes designed therefor, the one based on generalization, and the other based on perturbation. Our model postulates that an adversary's confidence on the likelihood of a certain sensitive-attribute (SA) value should not increase, in relative difference terms, by more than a predefined threshold. Our techniques aim to satisfy a given beta threshold with little information loss. We experimentally demonstrate that (i) our model provides an effective privacy guarantee in a way that predecessor models cannot, (ii) our generalization scheme is more effective and efficient in its task than methods adapting algorithms for the k-anonymity model, and (iii) our perturbation method outperforms a baseline approach. Moreover, we discuss in detail the resistance of our model and methods to attacks proposed in previous research.Comment: VLDB201

    Data Anonymization for Privacy Preservation in Big Data

    Get PDF
    Cloud computing provides capable ascendable IT edifice to provision numerous processing of a various big data applications in sectors such as healthcare and business. Mainly electronic health records data sets and in such applications generally contain privacy-sensitive data. The most popular technique for data privacy preservation is anonymizing the data through generalization. Proposal is to examine the issue against proximity privacy breaches for big data anonymization and try to recognize a scalable solution to this issue. Scalable clustering approach with two phase consisting of clustering algorithm and K-Anonymity scheme with Generalisation and suppression is intended to work on this problem. Design of the algorithms is done with MapReduce to increase high scalability by carrying out dataparallel execution in cloud. Wide-ranging researches on actual data sets substantiate that the method deliberately advances the competence of defensive proximity privacy breaks, the scalability and the efficiency of anonymization over existing methods. Anonymizing data sets through generalization to gratify some of the privacy attributes like k- Anonymity is a popularly-used type of privacy preserving methods. Currently, the gauge of data in numerous cloud surges extremely in agreement with the Big Data, making it a dare for frequently used tools to actually get, manage, and process large-scale data for a particular accepted time scale. Hence, it is a trial for prevailing anonymization approaches to attain privacy conservation for big data private information due to scalabilty issues

    Ontology-Based Quality Evaluation of Value Generalization Hierarchies for Data Anonymization

    Full text link
    In privacy-preserving data publishing, approaches using Value Generalization Hierarchies (VGHs) form an important class of anonymization algorithms. VGHs play a key role in the utility of published datasets as they dictate how the anonymization of the data occurs. For categorical attributes, it is imperative to preserve the semantics of the original data in order to achieve a higher utility. Despite this, semantics have not being formally considered in the specification of VGHs. Moreover, there are no methods that allow the users to assess the quality of their VGH. In this paper, we propose a measurement scheme, based on ontologies, to quantitatively evaluate the quality of VGHs, in terms of semantic consistency and taxonomic organization, with the aim of producing higher-quality anonymizations. We demonstrate, through a case study, how our evaluation scheme can be used to compare the quality of multiple VGHs and can help to identify faulty VGHs.Comment: 18 pages, 7 figures, presented in the Privacy in Statistical Databases Conference 2014 (Ibiza, Spain

    Non-Metric Multi-Dimensional Scaling for Distance-Based Privacy-Preserving Data Mining

    Get PDF
    Recent advances in the field of data mining have led to major concerns about privacy. Sharing data with external parties for analysis puts private information at risk. The original data are often perturbed before external release to protect private information. However, data perturbation can decrease the utility of the output. A good perturbation technique requires balance between privacy and utility. This study proposes a new method for data perturbation in the context of distance-based data mining. We propose the use of non-metric multi-dimensional scaling (MDS) as a suitable technique to perturb data that are intended for distance-based data mining. The basic premise of this approach is to transform the original data into a lower dimensional space and generate new data that protect private details while maintaining good utility for distance-based data mining analysis. We investigate the extent the perturbed data are able to preserve useful statistics for distance-based analysis and to provide protection against malicious attacks. We demonstrate that our method provides an adequate alternative to data randomisation approaches and other dimensionality reduction approaches. Testing is conducted on a wide range of benchmarked datasets and against some existing perturbation methods. The results confirm that our method has very good overall performance, is competitive with other techniques, and produces clustering and classification results at least as good, and in some cases better, than the results obtained from the original data

    A flexible architecture for privacy-aware trust management

    Get PDF
    In service-oriented systems a constellation of services cooperate, sharing potentially sensitive information and responsibilities. Cooperation is only possible if the different participants trust each other. As trust may depend on many different factors, in a flexible framework for Trust Management (TM) trust must be computed by combining different types of information. In this paper we describe the TAS3 TM framework which integrates independent TM systems into a single trust decision point. The TM framework supports intricate combinations whilst still remaining easily extensible. It also provides a unified trust evaluation interface to the (authorization framework of the) services. We demonstrate the flexibility of the approach by integrating three distinct TM paradigms: reputation-based TM, credential-based TM, and Key Performance Indicator TM. Finally, we discuss privacy concerns in TM systems and the directions to be taken for the definition of a privacy-friendly TM architecture.\u

    A Novel Privacy Disclosure Risk Measure and Optimizing Privacy Preserving Data Publishing Techniques

    Get PDF
    A tremendous amount of individual-level data is generated each day, with a wide variety of uses. This data often contains sensitive information about individuals, which can be disclosed by “adversaries”. Even when direct identifiers such as social security numbers are masked, an adversary may be able to recognize an individual\u27s identity for a data record by looking at the values of quasi-identifiers (QID), known as identity disclosure, or can uncover sensitive attributes (SA) about an individual through attribute disclosure. In data privacy field, multiple disclosure risk measures have been proposed. These share two drawbacks: they do not consider identity and attribute disclosure concurrently, and they make restrictive assumptions on an adversary\u27s knowledge and disclosure target by assuming certain attributes are QIDs and SAs with clear boundary in between. In this study, we present a Flexible Adversary Disclosure Risk (FADR) measure that addresses these limitations, by presenting a single combined metric of identity and attribute disclosure, and considering all scenarios for an adversary’s knowledge and disclosure targets while providing the flexibility to model a specific disclosure preference. In addition, we employ FADR measure to develop our novel “RU Generalization” algorithm that anonymizes a sensitive dataset to be able to publish the data for public access while preserving the privacy of individuals in the dataset. The challenge is to preserve privacy without incurring excessive information loss. Our RU Generalization algorithm is a greedy heuristic algorithm, which aims at minimizing the combination of both disclosure risk and information loss, to obtain an optimized anonymized dataset. We have conducted a set of experiments on a benchmark dataset from 1994 Census database, to evaluate both our FADR measure and RU Generalization algorithm. We have shown the robustness of our FADR measure and the effectiveness of our RU Generalization algorithm by comparing with the benchmark anonymization algorithm
    corecore