601 research outputs found
On the Complexity of -Closeness Anonymization and Related Problems
An important issue in releasing individual data is to protect the sensitive
information from being leaked and maliciously utilized. Famous privacy
preserving principles that aim to ensure both data privacy and data integrity,
such as -anonymity and -diversity, have been extensively studied both
theoretically and empirically. Nonetheless, these widely-adopted principles are
still insufficient to prevent attribute disclosure if the attacker has partial
knowledge about the overall sensitive data distribution. The -closeness
principle has been proposed to fix this, which also has the benefit of
supporting numerical sensitive attributes. However, in contrast to
-anonymity and -diversity, the theoretical aspect of -closeness has
not been well investigated.
We initiate the first systematic theoretical study on the -closeness
principle under the commonly-used attribute suppression model. We prove that
for every constant such that , it is NP-hard to find an optimal
-closeness generalization of a given table. The proof consists of several
reductions each of which works for different values of , which together
cover the full range. To complement this negative result, we also provide exact
and fixed-parameter algorithms. Finally, we answer some open questions
regarding the complexity of -anonymity and -diversity left in the
literature.Comment: An extended abstract to appear in DASFAA 201
Efficient Algorithms for Privately Releasing Marginals via Convex Relaxations
Consider a database of people, each represented by a bit-string of length
corresponding to the setting of binary attributes. A -way marginal
query is specified by a subset of attributes, and a -dimensional
binary vector specifying their values. The result for this query is a
count of the number of people in the database whose attribute vector restricted
to agrees with .
Privately releasing approximate answers to a set of -way marginal queries
is one of the most important and well-motivated problems in differential
privacy. Information theoretically, the error complexity of marginal queries is
well-understood: the per-query additive error is known to be at least
and at most
. However, no polynomial
time algorithm with error complexity as low as the information theoretic upper
bound is known for small . In this work we present a polynomial time
algorithm that, for any distribution on marginal queries, achieves average
error at most . This error
bound is as good as the best known information theoretic upper bounds for
. This bound is an improvement over previous work on efficiently releasing
marginals when is small and when error is desirable. Using private
boosting we are also able to give nearly matching worst-case error bounds.
Our algorithms are based on the geometric techniques of Nikolov, Talwar, and
Zhang. The main new ingredients are convex relaxations and careful use of the
Frank-Wolfe algorithm for constrained convex minimization. To design our
relaxations, we rely on the Grothendieck inequality from functional analysis
Differentially Private Release and Learning of Threshold Functions
We prove new upper and lower bounds on the sample complexity of differentially private algorithms for releasing approximate answers to
threshold functions. A threshold function over a totally ordered domain
evaluates to if , and evaluates to otherwise. We
give the first nontrivial lower bound for releasing thresholds with
differential privacy, showing that the task is impossible
over an infinite domain , and moreover requires sample complexity , which grows with the size of the domain. Inspired by the
techniques used to prove this lower bound, we give an algorithm for releasing
thresholds with samples. This improves the
previous best upper bound of (Beimel et al., RANDOM
'13).
Our sample complexity upper and lower bounds also apply to the tasks of
learning distributions with respect to Kolmogorov distance and of properly PAC
learning thresholds with differential privacy. The lower bound gives the first
separation between the sample complexity of properly learning a concept class
with differential privacy and learning without privacy. For
properly learning thresholds in dimensions, this lower bound extends to
.
To obtain our results, we give reductions in both directions from releasing
and properly learning thresholds and the simpler interior point problem. Given
a database of elements from , the interior point problem asks for an
element between the smallest and largest elements in . We introduce new
recursive constructions for bounding the sample complexity of the interior
point problem, as well as further reductions and techniques for proving
impossibility results for other basic problems in differential privacy.Comment: 43 page
Randomized Rounding for the Largest Simplex Problem
The maximum volume -simplex problem asks to compute the -dimensional
simplex of maximum volume inside the convex hull of a given set of points
in . We give a deterministic approximation algorithm for this
problem which achieves an approximation ratio of . The problem
is known to be -hard to approximate within a factor of for
some constant . Our algorithm also gives a factor
approximation for the problem of finding the principal submatrix of
a rank positive semidefinite matrix with the largest determinant. We
achieve our approximation by rounding solutions to a generalization of the
-optimal design problem, or, equivalently, the dual of an appropriate
smallest enclosing ellipsoid problem. Our arguments give a short and simple
proof of a restricted invertibility principle for determinants
Reverse-Safe Data Structures for Text Indexing
We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm which constructs a z-reverse-safe data structure that has size O(n) and answers pattern matching queries of length at most d optimally, where d is maximal for any such z-reverse-safe data structure. The construction algorithm takes O(n ω log d) time, where ω is the matrix multiplication exponent. We show that, despite the n ω factor, our engineered implementation takes only a few minutes to finish for million-letter texts. We further show that plugging our method in data analysis applications gives insignificant or no data utility loss. Finally, we show how our technique can be extended to support applications under a realistic adversary model
Constant-Round Privacy Preserving Multiset Union
Privacy preserving multiset union (PPMU) protocol allows a set of parties, each with a multiset, to collaboratively compute a multiset union secretly, meaning that any information other than union is not revealed. We propose efficient PPMU protocols, using multiplicative homomorphic cryptosystem. The novelty of our protocol is to directly encrypt a polynomial by representing it by an element of an extension field. The resulting protocols consist of constant rounds and improve communication cost. We also prove the security of our protocol against malicious adversaries, in the random oracle model
- …