3 research outputs found
On the Complexity of -Closeness Anonymization and Related Problems
An important issue in releasing individual data is to protect the sensitive
information from being leaked and maliciously utilized. Famous privacy
preserving principles that aim to ensure both data privacy and data integrity,
such as -anonymity and -diversity, have been extensively studied both
theoretically and empirically. Nonetheless, these widely-adopted principles are
still insufficient to prevent attribute disclosure if the attacker has partial
knowledge about the overall sensitive data distribution. The -closeness
principle has been proposed to fix this, which also has the benefit of
supporting numerical sensitive attributes. However, in contrast to
-anonymity and -diversity, the theoretical aspect of -closeness has
not been well investigated.
We initiate the first systematic theoretical study on the -closeness
principle under the commonly-used attribute suppression model. We prove that
for every constant such that , it is NP-hard to find an optimal
-closeness generalization of a given table. The proof consists of several
reductions each of which works for different values of , which together
cover the full range. To complement this negative result, we also provide exact
and fixed-parameter algorithms. Finally, we answer some open questions
regarding the complexity of -anonymity and -diversity left in the
literature.Comment: An extended abstract to appear in DASFAA 201
Parameterized Complexity of the k-anonymity Problem
The problem of publishing personal data without giving up privacy is becoming
increasingly important. An interesting formalization that has been recently
proposed is the -anonymity. This approach requires that the rows of a table
are partitioned in clusters of size at least and that all the rows in a
cluster become the same tuple, after the suppression of some entries. The
natural optimization problem, where the goal is to minimize the number of
suppressed entries, is known to be APX-hard even when the records values are
over a binary alphabet and , and when the records have length at most 8
and . In this paper we study how the complexity of the problem is
influenced by different parameters. In this paper we follow this direction of
research, first showing that the problem is W[1]-hard when parameterized by the
size of the solution (and the value ). Then we exhibit a fixed parameter
algorithm, when the problem is parameterized by the size of the alphabet and
the number of columns. Finally, we investigate the computational (and
approximation) complexity of the -anonymity problem, when restricting the
instance to records having length bounded by 3 and . We show that such a
restriction is APX-hard.Comment: 22 pages, 2 figure