Search CORE

79,467 research outputs found

On the Complexity of $t$ -Closeness Anonymization and Related Problems

Author: D. Rebollo-Monedero
E. Anshelevich
J. Blocki
J. Cao
L. Sweeney
N. Li
P. Bonizzoni
P. Samarati
P.A. Evans
R. Bredereck
Y. Rubner
Publication venue
Publication date: 01/01/2013
Field of study

An important issue in releasing individual data is to protect the sensitive information from being leaked and maliciously utilized. Famous privacy preserving principles that aim to ensure both data privacy and data integrity, such as

k

-anonymity and

l

-diversity, have been extensively studied both theoretically and empirically. Nonetheless, these widely-adopted principles are still insufficient to prevent attribute disclosure if the attacker has partial knowledge about the overall sensitive data distribution. The

t

-closeness principle has been proposed to fix this, which also has the benefit of supporting numerical sensitive attributes. However, in contrast to

k

-anonymity and

l

-diversity, the theoretical aspect of

t

-closeness has not been well investigated. We initiate the first systematic theoretical study on the

t

-closeness principle under the commonly-used attribute suppression model. We prove that for every constant

t

such that

0\leq t<1

, it is NP-hard to find an optimal

t

-closeness generalization of a given table. The proof consists of several reductions each of which works for different values of

t

, which together cover the full range. To complement this negative result, we also provide exact and fixed-parameter algorithms. Finally, we answer some open questions regarding the complexity of

k

-anonymity and

l

-diversity left in the literature.Comment: An extended abstract to appear in DASFAA 201

arXiv.org e-Print Archive

Crossref

Privacy Preservation by Disassociation

Author: Liagouris John
Mamoulis Nikos
Skiadopoulos Spiros
Terrovitis Manolis
Publication venue
Publication date: 01/01/2012
Field of study

In this work, we focus on protection against identity disclosure in the publication of sparse multidimensional data. Existing multidimensional anonymization techniquesa) protect the privacy of users either by altering the set of quasi-identifiers of the original data (e.g., by generalization or suppression) or by adding noise (e.g., using differential privacy) and/or (b) assume a clear distinction between sensitive and non-sensitive information and sever the possible linkage. In many real world applications the above techniques are not applicable. For instance, consider web search query logs. Suppressing or generalizing anonymization methods would remove the most valuable information in the dataset: the original query terms. Additionally, web search query logs contain millions of query terms which cannot be categorized as sensitive or non-sensitive since a term may be sensitive for a user and non-sensitive for another. Motivated by this observation, we propose an anonymization technique termed disassociation that preserves the original terms but hides the fact that two or more different terms appear in the same record. We protect the users' privacy by disassociating record terms that participate in identifying combinations. This way the adversary cannot associate with high probability a record with a rare combination of terms. To the best of our knowledge, our proposal is the first to employ such a technique to provide protection against identity disclosure. We propose an anonymization algorithm based on our approach and evaluate its performance on real and synthetic datasets, comparing it against other state-of-the-art methods based on generalization and differential privacy.Comment: VLDB201

arXiv.org e-Print Archive

CiteSeerX

HKU Scholars Hub

Exploring personalized life cycle policies

Author: Anciaux N.L.G.
Apers P.M.G.
Fokkinga M.M.
Heerde H.J.W van
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2007
Field of study

Ambient Intelligence imposes many challenges in protecting people's privacy. Storing privacy-sensitive data permanently will inevitably result in privacy violations. Limited retention techniques might prove useful in order to limit the risks of unwanted and irreversible disclosure of privacy-sensitive data. To overcome the rigidness of simple limited retention policies, Life-Cycle policies more precisely describe when and how data could be first degraded and finally be destroyed. This allows users themselves to determine an adequate compromise between privacy and data retention. However, implementing and enforcing these policies is a difficult problem. Traditional databases are not designed or optimized for deleting data. In this report, we recall the formerly introduced life cycle policy model and the already developed techniques for handling a single collective policy for all data in a relational database management system. We identify the problems raised by loosening this single policy constraint and propose preliminary techniques for concurrently handling multiple policies in one data store. The main technical consequence for the storage structure is, that when allowing multiple policies, the degradation order of tuples will not always be equal to the insert order anymore. Apart from the technical aspects, we show that personalizing the policies introduces some inference breaches which have to be further investigated. To make such an investigation possible, we introduce a metric for privacy, which enables the possibility to compare the provided amount of privacy with the amount of privacy required by the policy

CiteSeerX

University of Twente Research Information