On the Complexity of $t$-Closeness Anonymization and Related Problems

Author: D. Rebollo-Monedero
E. Anshelevich
J. Blocki
J. Cao
L. Sweeney
N. Li
P. Bonizzoni
P. Samarati
P.A. Evans
R. Bredereck
Y. Rubner
Publication date: 01/01/2013

An important issue in releasing individual data is to protect the sensitive information from being leaked and maliciously utilized. Famous privacy preserving principles that aim to ensure both data privacy and data integrity, such as $k$-anonymity and $l$-diversity, have been extensively studied both theoretically and empirically. Nonetheless, these widely-adopted principles are still insufficient to prevent attribute disclosure if the attacker has partial knowledge about the overall sensitive data distribution. The $t$-closeness principle has been proposed to fix this, which also has the benefit of supporting numerical sensitive attributes. However, in contrast to $k$-anonymity and $l$-diversity, the theoretical aspect of $t$-closeness has not been well investigated. We initiate the first systematic theoretical study on the $t$-closeness principle under the commonly-used attribute suppression model. We prove that for every constant $t$ such that $0\leq t<1$, it is NP-hard to find an optimal $t$-closeness generalization of a given table. The proof consists of several reductions each of which works for different values of $t$, which together cover the full range. To complement this negative result, we also provide exact and fixed-parameter algorithms. Finally, we answer some open questions regarding the complexity of $k$-anonymity and $l$-diversity left in the literature. Comment: An extended abstract to appear in DASFAA 201
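
To make the $t$-closeness condition concrete, here is a minimal sketch of a checker, not the paper's algorithms or reductions. It assumes a categorical sensitive attribute with equal ground distance between distinct values, in which case the Earth Mover's Distance reduces to the variational distance. The function names and the sample data are hypothetical.

```python
from collections import Counter


def distribution(values):
    """Empirical distribution of a non-empty list of sensitive values."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def variational_distance(p, q):
    """Half the L1 distance between two distributions given as dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def satisfies_t_closeness(classes, t):
    """classes: one list of sensitive values per equivalence class.

    Assumption: equal ground distance, so the variational distance stands in
    for the Earth Mover's Distance.
    """
    overall = distribution([v for cls in classes for v in cls])
    return all(
        variational_distance(distribution(cls), overall) <= t for cls in classes
    )


# Hypothetical table with two equivalence classes.
classes = [["flu", "flu", "cancer"], ["flu", "cancer", "cancer"]]
print(satisfies_t_closeness(classes, 0.2))
print(satisfies_t_closeness(classes, 0.1))
```

Checking a given partition is easy; the hardness result in the abstract concerns finding an optimal generalization that passes such a check.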

arXiv.org e-Print Archive

Crossref

Efficient Algorithms for Privately Releasing Marginals via Convex Relaxations

Author: Dwork Cynthia
Nikolov Aleksandar
Talwar Kunal
Publication date: 06/08/2013

Consider a database of $n$ people, each represented by a bit-string of length $d$ corresponding to the setting of $d$ binary attributes. A $k$-way marginal query is specified by a subset $S$ of $k$ attributes, and a $|S|$-dimensional binary vector $\beta$ specifying their values. The result for this query is a count of the number of people in the database whose attribute vector restricted to $S$ agrees with $\beta$. Privately releasing approximate answers to a set of $k$-way marginal queries is one of the most important and well-motivated problems in differential privacy. Information theoretically, the error complexity of marginal queries is well-understood: the per-query additive error is known to be at least $\Omega(\min\{\sqrt{n},d^{\frac{k}{2}}\})$ and at most $\tilde{O}(\min\{\sqrt{n} d^{1/4},d^{\frac{k}{2}}\})$. However, no polynomial time algorithm with error complexity as low as the information theoretic upper bound is known for small $n$. In this work we present a polynomial time algorithm that, for any distribution on marginal queries, achieves average error at most $\tilde{O}(\sqrt{n} d^{\frac{\lceil k/2 \rceil}{4}})$. This error bound is as good as the best known information theoretic upper bounds for $k=2$. This bound is an improvement over previous work on efficiently releasing marginals when $k$ is small and when error $o(n)$ is desirable. Using private boosting we are also able to give nearly matching worst-case error bounds. Our algorithms are based on the geometric techniques of Nikolov, Talwar, and Zhang. The main new ingredients are convex relaxations and careful use of the Frank-Wolfe algorithm for constrained convex minimization. To design our relaxations, we rely on the Grothendieck inequality from functional analysis.
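
The definition of a $k$-way marginal query can be sketched as follows. This computes only the exact, non-private count; it does not implement the paper's private release mechanism. The function name `marginal_count` and the sample database are assumptions made for illustration.

```python
def marginal_count(database, S, beta):
    """Count rows whose attributes at the positions in S equal beta.

    database: iterable of equal-length 0/1 tuples (one per person).
    S: tuple of k attribute indices; beta: tuple of k target bits.
    """
    if len(S) != len(beta):
        raise ValueError("S and beta must have the same length")
    return sum(
        all(row[i] == b for i, b in zip(S, beta)) for row in database
    )


# Hypothetical database with n = 4 people and d = 3 binary attributes.
database = [(1, 0, 1), (1, 1, 1), (0, 0, 1), (1, 0, 0)]

# 2-way marginal: attributes 0 and 2 both set to 1.
print(marginal_count(database, S=(0, 2), beta=(1, 1)))  # prints 2
```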

arXiv.org e-Print Archive

CiteSeerX

Differentially Private Release and Learning of Threshold Functions

Author: Bun Mark
Nissim Kobbi
Stemmer Uri
Vadhan Salil
Publication date: 28/04/2015

We prove new upper and lower bounds on the sample complexity of $(\epsilon, \delta)$ differentially private algorithms for releasing approximate answers to threshold functions. A threshold function $c_x$ over a totally ordered domain $X$ evaluates to $c_x(y) = 1$ if $y \le x$, and evaluates to $0$ otherwise. We give the first nontrivial lower bound for releasing thresholds with $(\epsilon,\delta)$ differential privacy, showing that the task is impossible over an infinite domain $X$, and moreover requires sample complexity $n \ge \Omega(\log^*|X|)$, which grows with the size of the domain. Inspired by the techniques used to prove this lower bound, we give an algorithm for releasing thresholds with $n \le 2^{(1+ o(1))\log^*|X|}$ samples. This improves the previous best upper bound of $8^{(1 + o(1))\log^*|X|}$ (Beimel et al., RANDOM '13). Our sample complexity upper and lower bounds also apply to the tasks of learning distributions with respect to Kolmogorov distance and of properly PAC learning thresholds with differential privacy. The lower bound gives the first separation between the sample complexity of properly learning a concept class with $(\epsilon,\delta)$ differential privacy and learning without privacy. For properly learning thresholds in $\ell$ dimensions, this lower bound extends to $n \ge \Omega(\ell \cdot \log^*|X|)$. To obtain our results, we give reductions in both directions between releasing and properly learning thresholds and the simpler interior point problem. Given a database $D$ of elements from $X$, the interior point problem asks for an element between the smallest and largest elements in $D$. We introduce new recursive constructions for bounding the sample complexity of the interior point problem, as well as further reductions and techniques for proving impossibility results for other basic problems in differential privacy. Comment: 43 page
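
The two definitions in this abstract, the threshold function $c_x$ and the interior point problem, can be illustrated with a short non-private sketch. It only checks the definitions and is not a differentially private algorithm. The names `threshold` and `is_interior_point` are hypothetical, and treating "between" as inclusive of the endpoints is an assumption.

```python
def threshold(x):
    """Return the threshold function c_x with c_x(y) = 1 if y <= x, else 0."""
    return lambda y: 1 if y <= x else 0


def is_interior_point(D, candidate):
    """Check whether candidate lies between the smallest and largest
    elements of the non-empty database D (endpoints included by assumption)."""
    return min(D) <= candidate <= max(D)


c5 = threshold(5)
print([c5(y) for y in (3, 5, 6)])  # prints [1, 1, 0]

D = [2, 7, 4]
print(is_interior_point(D, 4))
print(is_interior_point(D, 8))
```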

arXiv.org e-Print Archive

Crossref

Randomized Rounding for the Largest Simplex Problem

Author: Ghouila-Houri A.
Goreinov S. A.
John F.
Lewis A. S.
Schur I.
Srinivasan A.
Publication date: 14/04/2015

The maximum volume $j$-simplex problem asks to compute the $j$-dimensional simplex of maximum volume inside the convex hull of a given set of $n$ points in $\mathbb{Q}^d$. We give a deterministic approximation algorithm for this problem which achieves an approximation ratio of $e^{j/2 + o(j)}$. The problem is known to be $\mathrm{NP}$-hard to approximate within a factor of $c^{j}$ for some constant $c > 1$. Our algorithm also gives a factor $e^{j + o(j)}$ approximation for the problem of finding the principal $j\times j$ submatrix of a rank $d$ positive semidefinite matrix with the largest determinant. We achieve our approximation by rounding solutions to a generalization of the $D$-optimal design problem, or, equivalently, the dual of an appropriate smallest enclosing ellipsoid problem. Our arguments give a short and simple proof of a restricted invertibility principle for determinants.
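
The objective being maximized can be made concrete with a small sketch that evaluates the volume of one $j$-simplex from its $j+1$ vertices, using the standard identity volume $= \sqrt{\det G}/j!$, where $G$ is the Gram matrix of the edge vectors. This is only the objective function, not the paper's rounding algorithm, and the helper names are hypothetical.

```python
import math
from fractions import Fraction


def determinant(matrix):
    """Determinant by Gaussian elimination over exact rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero pivot.
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def simplex_volume(vertices):
    """Volume of the j-simplex spanned by j+1 points with rational coordinates."""
    base, rest = vertices[0], vertices[1:]
    edges = [[Fraction(a) - Fraction(b) for a, b in zip(v, base)] for v in rest]
    gram = [[sum(x * y for x, y in zip(u, w)) for w in edges] for u in edges]
    return math.sqrt(determinant(gram)) / math.factorial(len(edges))


# A right triangle (2-simplex) embedded in 3-dimensional space.
print(simplex_volume([(0, 0, 0), (2, 0, 0), (0, 2, 0)]))  # prints 2.0
```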

arXiv.org e-Print Archive

Crossref

Reverse-Safe Data Structures for Text Indexing

Author: Gabriele Fici
Giulia Bernardini
Grigorios Loukides
Huiping Chen
Solon P. Pissis
Publication venue: Society for Industrial & Applied Mathematics (SIAM)
Publication date: 01/01/2020

We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm which constructs a z-reverse-safe data structure that has size O(n) and answers pattern matching queries of length at most d optimally, where d is maximal for any such z-reverse-safe data structure. The construction algorithm takes O(n^ω log d) time, where ω is the matrix multiplication exponent. We show that, despite the n^ω factor, our engineered implementation takes only a few minutes to finish for million-letter texts. We further show that plugging our method in data analysis applications gives insignificant or no data utility loss. Finally, we show how our technique can be extended to support applications under a realistic adversary model.

Archivio istituzionale della ricerca - Università di Trieste

Crossref

CWI's Institutional Repository

University of Birmingham Research Portal

Archivio istituzionale della ricerca - Università di Palermo

Constant-Round Privacy Preserving Multiset Union

Author: Jeongdae Hong
Jihye Kim
Jung Hee Cheon
Jung Woo Kim
Kunsoo Park
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 21/03/2011

A privacy preserving multiset union (PPMU) protocol allows a set of parties, each with a multiset, to collaboratively compute a multiset union secretly, meaning that no information other than the union is revealed. We propose efficient PPMU protocols using a multiplicatively homomorphic cryptosystem. The novelty of our protocol is to directly encrypt a polynomial by representing it as an element of an extension field. The resulting protocols run in a constant number of rounds and improve communication cost. We also prove the security of our protocol against malicious adversaries in the random oracle model.

Cryptology ePrint Archive