5 research outputs found

    On the Complexity of t-Closeness Anonymization and Related Problems

    An important issue in releasing individual data is to protect the sensitive information from being leaked and maliciously utilized. Famous privacy preserving principles that aim to ensure both data privacy and data integrity, such as k-anonymity and l-diversity, have been extensively studied both theoretically and empirically. Nonetheless, these widely-adopted principles are still insufficient to prevent attribute disclosure if the attacker has partial knowledge about the overall sensitive data distribution. The t-closeness principle has been proposed to fix this, which also has the benefit of supporting numerical sensitive attributes. However, in contrast to k-anonymity and l-diversity, the theoretical aspect of t-closeness has not been well investigated. We initiate the first systematic theoretical study on the t-closeness principle under the commonly-used attribute suppression model. We prove that for every constant t such that 0 ≤ t < 1, it is NP-hard to find an optimal t-closeness generalization of a given table. The proof consists of several reductions, each of which works for different values of t, which together cover the full range. To complement this negative result, we also provide exact and fixed-parameter algorithms. Finally, we answer some open questions regarding the complexity of k-anonymity and l-diversity left in the literature.
    Comment: An extended abstract to appear in DASFAA 201
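The t-closeness requirement described above says that the distribution of the sensitive attribute within every equivalence class must be within distance t of the attribute's distribution in the whole table. A minimal sketch of such a check is below; note that the original t-closeness proposal measures distance with the Earth Mover's Distance, whereas this illustration substitutes the simpler total variation distance, and all names and the toy data are hypothetical.

```python
from collections import Counter

def distribution(values):
    """Empirical distribution of a sensitive attribute as a dict."""
    counts = Counter(values)
    total = len(values)
    return {v: c / total for v, c in counts.items()}

def total_variation(p, q):
    """Total variation distance between two discrete distributions
    (a stand-in for the Earth Mover's Distance used in the paper)."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)

def satisfies_t_closeness(classes, t):
    """Check that every equivalence class's sensitive-attribute
    distribution lies within distance t of the overall distribution."""
    overall = distribution([v for cls in classes for v in cls])
    return all(total_variation(distribution(cls), overall) <= t
               for cls in classes)

# Two equivalence classes of sensitive values (toy example)
classes = [["flu", "flu", "cancer"], ["flu", "cancer", "cancer"]]
print(satisfies_t_closeness(classes, 0.2))  # each class is 1/6 away -> True
```

In this toy table the overall distribution is 50/50, each class deviates by 1/6 ≈ 0.167, so the table satisfies 0.2-closeness but not 0.1-closeness. The hardness result in the abstract concerns the much harder inverse problem: finding a minimal-suppression generalization that makes such a check pass.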

    An Agent Framework for Dynamic Health Data Aggregation for Research Purposes

    This paper presents a model of a MAS framework for dynamic aggregation of population health data for research purposes. The contribution of the paper is twofold: first, it describes a MAS architecture that allows one to build anonymized databases on the fly from distributed sources of data; second, it shows how to improve the utility of the data as the database grows.

    A novel privacy paradigm for improving serial data privacy

    Protecting the privacy of individuals is of utmost concern in today’s society, as enshrined in prevailing privacy laws such as the GDPR. In serial data, pieces of data are continuously released, and their combined effect may result in a privacy breach across the whole serial publication, so protecting serial data from adversaries is crucial. Previous approaches provide privacy for relational data and serial data, but many loopholes remain when dealing with multiple sensitive values. We address these problems by introducing a novel privacy approach that limits the risk of privacy disclosure in republication and gives better privacy with much lower perturbation rates. Existing techniques provide a strong privacy guarantee against attacks on data privacy; however, in serial publication, the chance of attack persists due to the continuous addition and deletion of data. In serial data, proper countermeasures against attacks such as correlation attacks have not been taken, so serial publication remains at risk. Moreover, when dealing with multiple sensitive values, the absence of particular sensitive values in a release causes signatures to change in every release, which opens the door to attacks. In this paper, we introduce a novel approach to counter the composition attack and the transitive composition attack, and we argue that the proposed approach is better than the existing state-of-the-art techniques. Our paper establishes this result with a systematic examination of the republication dilemma. Finally, we evaluate our work on benchmark datasets, and the results show the efficacy of the proposed technique.

    Cloning for privacy protection in multiple independent data publications

    Data anonymization has become a major technique in privacy preserving data publishing. Many methods have been proposed to anonymize a single dataset or a series of datasets of one data owner. However, no method has been proposed for the anonymization of multiple independent data publications, where a data owner publishes a dataset whose population overlaps with datasets published by other independent data owners. In this paper we analyze the privacy risk in such a scenario and the vulnerability of partition-based anonymization methods. We show that no partition-based anonymization method can protect privacy under arbitrary data distributions, and we identify a case in which privacy can be protected in this scenario. We propose a new generalization principle, cloning, to protect privacy in multiple independent data publications, and we develop an effective algorithm to achieve it. We show experimentally that the proposed algorithm anonymizes data to satisfy the privacy requirement while preserving good data utility.