On the Complexity of t-Closeness Anonymization and Related Problems
An important issue in releasing individual data is to protect the sensitive
information from being leaked and maliciously utilized. Well-known
privacy-preserving principles that aim to ensure both data privacy and data
integrity, such as k-anonymity and ℓ-diversity, have been extensively studied both
theoretically and empirically. Nonetheless, these widely-adopted principles are
still insufficient to prevent attribute disclosure if the attacker has partial
knowledge about the overall sensitive data distribution. The t-closeness
principle has been proposed to fix this, and it has the added benefit of
supporting numerical sensitive attributes. However, in contrast to
k-anonymity and ℓ-diversity, the theoretical aspects of t-closeness have
not been well investigated.
We initiate the first systematic theoretical study on the t-closeness
principle under the commonly-used attribute suppression model. We prove that
for every constant t such that 0 ≤ t < 1, it is NP-hard to find an optimal
t-closeness generalization of a given table. The proof consists of several
reductions, each of which works for different values of t, which together
cover the full range. To complement this negative result, we also provide exact
and fixed-parameter algorithms. Finally, we answer some open questions
regarding the complexity of k-anonymity and ℓ-diversity left in the
literature. Comment: An extended abstract to appear in DASFAA 201
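As a rough illustration of what the t-closeness condition asks for under attribute suppression, the following is a minimal sketch and not the paper's algorithm. It assumes a categorical sensitive attribute with equal ground distances, in which case the distance between an equivalence class and the whole table reduces to the variational distance. The function names, the `"*"` suppression marker, and the sample table are all hypothetical.

```python
from collections import Counter, defaultdict


def distribution(values):
    """Empirical distribution of a list of categorical values."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def variational_distance(p, q):
    """Half the L1 distance between two distributions given as dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def suppress(rows, attributes):
    """Replace the given quasi-identifier attributes by '*' in every row."""
    return [{k: ("*" if k in attributes else v) for k, v in row.items()}
            for row in rows]


def satisfies_t_closeness(rows, quasi_ids, sensitive, t):
    """Check that every equivalence class (rows sharing the same
    quasi-identifier values) has a sensitive-value distribution within
    distance t of the distribution over the whole table."""
    overall = distribution([row[sensitive] for row in rows])
    classes = defaultdict(list)
    for row in rows:
        classes[tuple(row[a] for a in quasi_ids)].append(row[sensitive])
    return all(variational_distance(distribution(values), overall) <= t
               for values in classes.values())


# Hypothetical table: 'zip' is the quasi-identifier, 'disease' is sensitive.
table = [
    {"zip": "1", "disease": "flu"},
    {"zip": "1", "disease": "flu"},
    {"zip": "2", "disease": "cold"},
    {"zip": "2", "disease": "cold"},
]
print(satisfies_t_closeness(table, ["zip"], "disease", 0.4))
print(satisfies_t_closeness(suppress(table, {"zip"}), ["zip"], "disease", 0.0))
```

The check itself is cheap; the hardness result concerns choosing which attributes to suppress so that the check passes at minimum cost.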
Privacy Preservation by Disassociation
In this work, we focus on protection against identity disclosure in the
publication of sparse multidimensional data. Existing multidimensional
anonymization techniques (a) protect the privacy of users either by altering the
set of quasi-identifiers of the original data (e.g., by generalization or
suppression) or by adding noise (e.g., using differential privacy) and/or (b)
assume a clear distinction between sensitive and non-sensitive information and
sever the possible linkage. In many real world applications the above
techniques are not applicable. For instance, consider web search query logs.
Suppressing or generalizing anonymization methods would remove the most
valuable information in the dataset: the original query terms. Additionally,
web search query logs contain millions of query terms which cannot be
categorized as sensitive or non-sensitive since a term may be sensitive for a
user and non-sensitive for another. Motivated by this observation, we propose
an anonymization technique termed disassociation that preserves the original
terms but hides the fact that two or more different terms appear in the same
record. We protect the users' privacy by disassociating record terms that
participate in identifying combinations. This way the adversary cannot
associate with high probability a record with a rare combination of terms. To
the best of our knowledge, our proposal is the first to employ such a technique
to provide protection against identity disclosure. We propose an anonymization
algorithm based on our approach and evaluate its performance on real and
synthetic datasets, comparing it against other state-of-the-art methods based
on generalization and differential privacy. Comment: VLDB201
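To make the notion of an "identifying combination" concrete, here is a minimal sketch that is not the disassociation algorithm itself. It assumes, purely for illustration, that a combination of terms is identifying when it appears in fewer than `k` records; the threshold `k`, the combination size, and the function name are hypothetical choices not taken from the abstract.

```python
from collections import Counter
from itertools import combinations


def rare_combinations(records, k, size=2):
    """Return the term combinations of the given size that occur together
    in fewer than k records (assumed here to be the identifying ones)."""
    support = Counter()
    for record in records:
        # Sort the distinct terms so each combination has one canonical form.
        for combo in combinations(sorted(set(record)), size):
            support[combo] += 1
    return {combo for combo, count in support.items() if count < k}


# Hypothetical query log: each record is the set of terms issued by one user.
log = [
    {"flights", "hotel"},
    {"flights", "hotel"},
    {"flights", "clinic"},
]
print(rare_combinations(log, k=2))
```

A disassociation step would then publish the terms of such a combination separately, so that the record no longer reveals that they co-occurred.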
Exploring personalized life cycle policies
Ambient Intelligence imposes many challenges in protecting people's privacy. Storing privacy-sensitive data permanently will inevitably result in privacy violations. Limited retention techniques might prove useful in limiting the risks of unwanted and irreversible disclosure of privacy-sensitive data. To overcome the rigidity of simple limited retention policies, Life-Cycle policies describe more precisely when and how data should first be degraded and finally be destroyed. This allows users themselves to determine an adequate compromise between privacy and data retention. However, implementing and enforcing these policies is a difficult problem. Traditional databases are not designed or optimized for deleting data. In this report, we recall the previously introduced life cycle policy model and the techniques already developed for handling a single collective policy for all data in a relational database management system. We identify the problems raised by loosening this single-policy constraint and propose preliminary techniques for concurrently handling multiple policies in one data store. The main technical consequence for the storage structure is that, when multiple policies are allowed, the degradation order of tuples will no longer always equal the insert order. Apart from the technical aspects, we show that personalizing the policies introduces some inference breaches which have to be further investigated. To make such an investigation possible, we introduce a metric for privacy, which makes it possible to compare the amount of privacy provided with the amount of privacy required by the policy.
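The observation that degradation order can differ from insert order once several policies coexist can be sketched as follows. This is a minimal illustration under assumed names, not the report's storage technique: the two policies, their delays, and the state labels are hypothetical.

```python
import heapq

# Hypothetical life cycle policies: each is a list of
# (seconds after insertion, state the tuple moves to).
POLICIES = {
    "strict": [(60, "degraded"), (120, "destroyed")],
    "relaxed": [(600, "degraded"), (3600, "destroyed")],
}


def degradation_schedule(tuples):
    """Given (tuple_id, insert_time, policy_name) entries, return all
    degradation events as (time, tuple_id, state), ordered by time."""
    heap = []
    for tuple_id, inserted_at, policy in tuples:
        for delay, state in POLICIES[policy]:
            heapq.heappush(heap, (inserted_at + delay, tuple_id, state))
    return [heapq.heappop(heap) for _ in range(len(heap))]


# t1 is inserted first but follows the relaxed policy, so t2 degrades first.
for event in degradation_schedule([("t1", 0, "relaxed"), ("t2", 10, "strict")]):
    print(event)
```

With a single policy, a plain queue in insert order would suffice; the priority queue here stands in for whatever structure handles the reordering.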