We formally study two methods for data sanitation that have been used
extensively in the database community: k-anonymity and l-diversity. We settle
several open problems concerning the difficulty of applying these methods
optimally, proving both positive and negative results:
1. 2-anonymity is in P.
2. The problem of partitioning the edges of a triangle-free graph into
4-stars (degree-three vertices) is NP-hard. This yields an alternative proof
that 3-anonymity is NP-hard even when the database attributes are all binary.
3. 3-anonymity with only 27 attributes per record is MAX SNP-hard.
4. For databases with n rows, k-anonymity is in O(4^n poly(n)) time for all k
> 1.
5. For databases with n rows and l <= log_{2c+2} log n attributes over an
alphabet of cardinality c = O(1), k-anonymity is in P. Assuming c, l = O(1),
k-anonymity is in O(n).
6. 3-diversity with binary attributes is NP-hard, with one sensitive
attribute.
7. 2-diversity with binary attributes is NP-hard, with three sensitive
attributes.Comment: Full Versio