80 research outputs found

    Generalization-Based k-Anonymization

    Microaggregation is an anonymization technique that consists of partitioning the data into clusters of no fewer than k elements and then replacing each cluster with its prototypical representative. Most microaggregation techniques work on numerical attributes. However, many data sets are described by heterogeneous types of data, i.e., numerical and categorical attributes. In this paper we propose a new microaggregation method for achieving a compliant k-anonymous masked file for categorical microdata based on generalization. The goal is to build a generalized description satisfied by at least k domain objects and to replace these domain objects with the description. The way to construct that generalization is similar to the one used in growing decision trees. Records that cannot be generalized satisfactorily are discarded, so some information is lost. In the experiments we performed, we show that the new approach gives good results. © Springer International Publishing Switzerland 2015. This research is partially funded by the Spanish MICINN projects COGNITIO (TIN-2012-38450-C03-03), EdeTRI (TIN2012-39348-C02-01) and COPRIVACY (TIN2011-27076-C03-03), the grant 2009-SGR-1434 from the Generalitat de Catalunya, and the European Project DwB (Grant Agreement Number 262608). Peer reviewed.
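The general definition in the first sentence of this abstract (clusters of at least k elements, each replaced by a representative) can be illustrated with a minimal sketch for a single numerical attribute. This is not the generalization-based method for categorical data that the paper proposes; the function name `microaggregate`, the choice of the mean as the representative, and the fixed-size grouping of sorted values are all assumptions made for illustration.

```python
from statistics import mean


def microaggregate(values, k):
    """Return a masked copy of `values` in which every value is replaced
    by the mean of its cluster, and every cluster has at least k members.

    Illustrative only: sorts the values and cuts them into consecutive
    groups of k; the last group absorbs any leftover values.
    """
    if k < 1 or len(values) < k:
        raise ValueError("need k >= 1 and at least k values")

    # Indices of the values in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    n_groups = len(values) // k
    masked = [0.0] * len(values)

    for g in range(n_groups):
        start = g * k
        # The final group takes the remainder so no group is smaller than k.
        end = (g + 1) * k if g < n_groups - 1 else len(values)
        members = order[start:end]
        representative = mean(values[i] for i in members)
        for i in members:
            masked[i] = representative
    return masked


ages = [1, 2, 3, 10, 11, 12, 13]
masked_ages = microaggregate(ages, k=3)
```

In the masked output each distinct value is shared by at least k records, which is the property that makes the released attribute k-anonymous with respect to that attribute.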

    On Analysis of Mixed Data Classification with Privacy Preservation

    Privacy-preserving data classification is a pervasive task in privacy-preserving data mining (PPDM). The main goal is to protect the identities of individuals in the released data in order to prevent privacy breaches. However, classification also requires accurate results. Thus, the problem is how to accurately mine large amounts of data to extract relevant knowledge while at the same time protecting the sensitive information in the database. One approach is to anonymize the data set that contains the sensitive information of individuals before releasing it for data analysis. In this paper, we mainly analyze the proposed method, Microaggregation-based Classification Tree (MiCT), which uses the properties of decision trees for privacy-preserving classification of mixed data. The evaluations are based on various privacy models developed with the various situations that may arise during data analysis in mind. Keywords: microaggregation, decision tree, mixed data, data perturbation, classification accuracy, anonymous data

    Microaggregation Sorting Framework for K-Anonymity Statistical Disclosure Control in Cloud Computing

    Cloud computing has increased the capability to store and record personal data (microdata) in the cloud. In most cases, data providers have little or no control over that data, which has led to concern that the personal data may be breached. Microaggregation techniques seek to protect microdata in such a way that data can be published and mined without providing any private information that can be linked to specific individuals. An optimal microaggregation method must minimize the information loss resulting from this replacement process. The challenge is how to minimize the information loss during the microaggregation process. This paper presents a sorting framework for Statistical Disclosure Control (SDC) to protect microdata in cloud computing. It consists of two stages. In the first stage, an algorithm sorts all records in a data set in a particular way to ensure that during microaggregation very dissimilar observations are never placed in the same cluster. In the second stage, a microaggregation method is used to create k-anonymous clusters while minimizing the information loss. The performance of the proposed techniques is compared against the most recent microaggregation methods. Experimental results using benchmark datasets show that the proposed algorithms perform significantly better than existing techniques in the literature.
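The two-stage structure described in this abstract (sort first, then form clusters of at least k records) can be sketched roughly as follows. The abstract does not say how the records are sorted or how information loss is measured, so both are assumptions here: the sort key is supplied by the caller (the example uses the sum of the attributes as a crude stand-in), and information loss is taken to be the within-group sum of squared errors (SSE), a measure commonly used for microaggregation. The names `sort_then_group` and `sse` are hypothetical.

```python
from math import dist


def centroid(rows):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(rows)
    return tuple(sum(column) / n for column in zip(*rows))


def sort_then_group(rows, k, key):
    """Stage 1: sort records by `key`. Stage 2: cut the sorted list into
    consecutive groups of k; the last group absorbs any leftover records,
    so every group has at least k members."""
    if k < 1 or len(rows) < k:
        raise ValueError("need k >= 1 and at least k records")
    ordered = sorted(rows, key=key)
    n_groups = len(ordered) // k
    groups = [ordered[g * k:(g + 1) * k] for g in range(n_groups)]
    groups[-1].extend(ordered[n_groups * k:])
    return groups


def sse(groups):
    """Within-group sum of squared Euclidean distances to each centroid
    (assumed here as the information-loss measure)."""
    return sum(dist(r, centroid(g)) ** 2 for g in groups for r in g)


records = [(10, 11), (0, 0), (5, 5), (0, 1), (10, 10)]
groups = sort_then_group(records, k=2, key=lambda r: sum(r))
loss = sse(groups)
```

A lower `loss` for the same k means the masked data stays closer to the original, so different sort keys can be compared by the SSE they produce.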

    p-probabilistic k-anonymous microaggregation for the anonymization of surveys with uncertain participation

    We develop a probabilistic variant of k-anonymous microaggregation, which we term p-probabilistic, resorting to a statistical model of respondent participation in order to aggregate quasi-identifiers in such a manner that k-anonymity is concordantly enforced with a parametric probabilistic guarantee. Succinctly, owing to the possibility that some respondents may not finally participate, sufficiently larger cells are created, striving to satisfy k-anonymity with probability at least p. The microaggregation function is designed before the respondents submit their confidential data. More precisely, a specification of the function is sent to them, which they may verify and apply to their quasi-identifying demographic variables prior to submitting the microaggregated data along with the confidential attributes to an authorized repository. We propose a number of metrics to assess the performance of our probabilistic approach in terms of anonymity and distortion, which we proceed to investigate theoretically in depth and empirically with synthetic and standardized data. We stress that, in addition to constituting a functional extension of traditional microaggregation, thereby broadening its applicability to the anonymization of statistical databases in a wide variety of contexts, the relaxation of trust assumptions is arguably expected to have a considerable impact on user acceptance and ultimately on data utility through mere availability. Peer reviewed. Postprint (author's final draft).
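The idea of creating "sufficiently larger cells" so that k-anonymity holds with probability at least p can be illustrated with a small calculation. The abstract does not specify the participation model, so this sketch assumes the simplest one: each invited respondent participates independently with the same probability q, which makes the number of participants in a cell binomially distributed. The function names and the parameter q are hypothetical and are not taken from the paper.

```python
from math import comb


def prob_at_least_k(n, k, q):
    """P(at least k of n invited respondents participate), assuming
    independent participation with probability q each (binomial model)."""
    return sum(comb(n, j) * q**j * (1 - q) ** (n - j) for j in range(k, n + 1))


def min_cell_size(k, p, q):
    """Smallest number of invited respondents per cell such that at least
    k of them participate with probability >= p, under the assumed model."""
    if k < 1 or not (0 < q <= 1) or not (0 < p < 1):
        raise ValueError("need k >= 1, 0 < q <= 1 and 0 < p < 1")
    n = k
    # The probability grows towards 1 as n grows, so the loop terminates.
    while prob_at_least_k(n, k, q) < p:
        n += 1
    return n


# With certain participation, cells of exactly k respondents suffice.
size_certain = min_cell_size(k=3, p=0.9, q=1.0)
# With uncertain participation, cells must be larger than k.
size_uncertain = min_cell_size(k=5, p=0.95, q=0.8)
```

Under this assumed model, the gap between the returned cell size and k is the price paid in extra aggregation (and therefore distortion) for the probabilistic guarantee.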

    Implementation and evaluation of microaggregation algorithms for categorical data

    Different data anonymization algorithms have been proposed in the literature, but it is not always easy for practitioners to understand which one is better suited to a given situation. In an increasingly digitalised world, the need for data privacy is apparent. Data scientists have contributed much previous work to ensure privacy for numerical data attributes in published datasets. However, anonymizing categorical data tends to significantly reduce data utility through information loss, and less research is available. The thesis aims to describe, implement and compare multiple microaggregation algorithms for categorical data. To achieve the goals of the thesis and provide valuable output, multiple new proposals for handling categorical data, based on the Mondrian algorithm, are presented as part of the thesis. It was found that the proposals fared well compared to some previously presented algorithms in terms of algorithm execution time, potential information loss and reidentification risk.

    Semantic microaggregation for the anonymization of query logs using the open directory project

    Web search engines gather information from the queries performed by the user in the form of query logs. These logs are extremely useful for research, marketing, or profiling, but at the same time they are a great threat to the user's privacy. We provide a novel approach to anonymizing query logs so that they ensure user k-anonymity, by extending a common method used in statistical disclosure control: microaggregation. Furthermore, our microaggregation approach takes into account the semantics of the queries by relying on the Open Directory Project. We have tested our proposal with real data from AOL query logs. Peer reviewed.