An important issue in releasing individual data is to protect sensitive
information from being leaked and maliciously exploited. Well-known
privacy-preserving principles that aim to ensure both data privacy and data
integrity, such as k-anonymity and l-diversity, have been studied extensively,
both theoretically and empirically. Nonetheless, these widely adopted
principles are still insufficient to prevent attribute disclosure if the
attacker has partial knowledge of the overall distribution of the sensitive
data. The t-closeness principle has been proposed to address this weakness;
it also has the advantage of supporting numerical sensitive attributes.
However, in contrast to k-anonymity and l-diversity, the theoretical aspects
of t-closeness have not been well investigated.
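For concreteness, the t-closeness condition can be recalled as follows (a standard formulation, added here for the reader; the abstract itself does not fix a particular distance measure). A table T, partitioned into equivalence classes E_1, ..., E_m by its quasi-identifier values, satisfies t-closeness if

  D(P_{E_i}, P_T) ≤ t   for every equivalence class E_i,

where P_{E_i} is the distribution of the sensitive attribute within E_i, P_T is its distribution over the whole table, and D is a chosen distance between distributions (the Earth Mover's Distance in the original proposal).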
We initiate the first systematic theoretical study of the t-closeness
principle under the commonly used attribute suppression model. We prove that
for every constant t with 0≤t<1, it is NP-hard to find an optimal
t-closeness generalization of a given table. The proof consists of several
reductions, each of which works for a different range of values of t, and
which together cover the full range. To complement this negative result, we
also provide exact and fixed-parameter algorithms. Finally, we answer several
open questions left in the literature regarding the complexity of k-anonymity
and l-diversity.
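
To make the objects discussed above concrete, the following is a minimal illustrative sketch (not taken from the paper) of checking whether a table, after attribute suppression, satisfies t-closeness. It uses total variation distance as a stand-in for whatever distance between distributions one adopts; the table, attribute names, and helper functions are hypothetical.

# Illustrative sketch only: verify t-closeness of a suppressed table,
# using total variation distance in place of the paper's (unspecified) metric.
from collections import Counter, defaultdict

def distribution(values):
    """Empirical distribution of a list of sensitive values."""
    counts = Counter(values)
    total = len(values)
    return {v: c / total for v, c in counts.items()}

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)

def satisfies_t_closeness(rows, quasi_ids, sensitive, t):
    """rows: list of dicts; quasi_ids: attribute names (suppressed cells hold '*')."""
    overall = distribution([r[sensitive] for r in rows])
    classes = defaultdict(list)
    for r in rows:
        key = tuple(r[a] for a in quasi_ids)   # equivalence class by QI values
        classes[key].append(r[sensitive])
    return all(total_variation(distribution(vals), overall) <= t
               for vals in classes.values())

# Toy usage: one quasi-identifier, already partially suppressed.
table = [
    {"zip": "130**", "disease": "flu"},
    {"zip": "130**", "disease": "cancer"},
    {"zip": "148**", "disease": "flu"},
    {"zip": "148**", "disease": "flu"},
]
print(satisfies_t_closeness(table, ["zip"], "disease", t=0.5))  # True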