    Tight Lower Bounds for Differentially Private Selection

    A pervasive task in the differential privacy literature is to select the $k$ items of "highest quality" out of a set of $d$ items, where the quality of each item depends on a sensitive dataset that must be protected. Variants of this task arise naturally in fundamental problems like feature selection and hypothesis testing, and also as subroutines for many sophisticated differentially private algorithms. The standard approaches to these tasks (repeated use of the exponential mechanism or the sparse vector technique) approximately solve this problem given a dataset of $n = O(\sqrt{k}\log d)$ samples. We provide a tight lower bound for some very simple variants of the private selection problem. Our lower bound shows that a sample of size $n = \Omega(\sqrt{k}\log d)$ is required even to achieve a very minimal accuracy guarantee. Our results are based on an extension of the fingerprinting method to sparse selection problems. Previously, the fingerprinting method has been used to provide tight lower bounds for answering an entire set of $d$ queries, but often only some much smaller set of $k$ queries is relevant. Our extension allows us to prove lower bounds that depend on both the number of relevant queries and the total number of queries.
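    The abstract names the exponential mechanism as the standard approach to private top-$k$ selection. Below is a minimal sketch (not taken from the paper) of the "peeling" variant that applies the exponential mechanism $k$ times; the quality vector, the even split of the privacy budget, and the sensitivity value are illustrative assumptions.

```python
import numpy as np

def exponential_mechanism_top_k(qualities, k, epsilon, sensitivity=1.0):
    """Select k of d items by repeatedly applying the exponential mechanism.

    qualities   : length-d array of quality scores computed on the sensitive data.
    k           : number of items to select.
    epsilon     : total privacy budget, split evenly across the k rounds (basic composition).
    sensitivity : how much one person's data can change any single quality score.
    """
    qualities = np.asarray(qualities, dtype=float)
    eps_per_round = epsilon / k
    remaining = list(range(len(qualities)))
    selected = []
    rng = np.random.default_rng()
    for _ in range(k):
        scores = qualities[remaining]
        # Pick item i with probability proportional to exp(eps * q_i / (2 * sensitivity)).
        logits = eps_per_round * scores / (2.0 * sensitivity)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        choice = rng.choice(len(remaining), p=probs)
        selected.append(remaining.pop(choice))
    return selected

# Example: privately pick the 3 "best" of 10 items under a toy quality vector.
print(exponential_mechanism_top_k(np.arange(10), k=3, epsilon=1.0))
```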

    Hardness of Non-Interactive Differential Privacy from One-Way Functions

    A central challenge in differential privacy is to design computationally efficient non-interactive algorithms that can answer large numbers of statistical queries on a sensitive dataset. That is, we would like to design a differentially private algorithm that takes a dataset $D \in X^n$ consisting of some small number of elements $n$ from some large data universe $X$, and efficiently outputs a summary that allows a user to efficiently obtain an answer to any query in some large family $Q$. Ignoring computational constraints, this problem can be solved even when $X$ and $Q$ are exponentially large and $n$ is just a small polynomial; however, all algorithms with remotely similar guarantees run in exponential time. There have been several results showing that, under the strong assumption of indistinguishability obfuscation (iO), no efficient differentially private algorithm exists when $X$ and $Q$ can be exponentially large. However, there are no strong separations between information-theoretic and computationally efficient differentially private algorithms under any standard complexity assumption. In this work we show that, if one-way functions exist, there is no general-purpose differentially private algorithm that works when $X$ and $Q$ are exponentially large and $n$ is an arbitrary polynomial. In fact, we show that this result holds even if $X$ is just subexponentially large (assuming only polynomially-hard one-way functions). This result solves an open problem posed by Vadhan in his recent survey.
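    To make the setting concrete, here is a minimal sketch (my own illustration, not the paper's construction) of a non-interactive mechanism for a *small* query family $Q$: each query is treated as a 0/1 predicate on the data universe, and its empirical fraction is released with Laplace noise. The query names, the epsilon parameter, and the even budget split are assumptions; the point is that the per-query noise scale grows with $|Q|$, which is why such simple approaches do not extend to exponentially large $Q$.

```python
import numpy as np

def laplace_summary(dataset, queries, epsilon):
    """Non-interactive Laplace mechanism for a small family of counting queries.

    dataset : list of elements from some universe X.
    queries : dict mapping a query name to a predicate q: X -> {0, 1}; the true
              answer is the fraction of the dataset satisfying q (sensitivity 1/n).
    epsilon : total privacy budget, split evenly over the queries.
    """
    n = len(dataset)
    scale = len(queries) / (epsilon * n)   # Laplace scale per query under basic composition
    rng = np.random.default_rng()
    return {name: np.mean([q(x) for x in dataset]) + rng.laplace(0.0, scale)
            for name, q in queries.items()}

# Example: a toy data universe X = {0, ..., 9} and two counting queries.
data = [1, 3, 3, 7, 8, 9]
queries = {"is_even": lambda x: x % 2 == 0, "at_least_5": lambda x: x >= 5}
print(laplace_summary(data, queries, epsilon=1.0))
```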

    Minimax Optimality In High-Dimensional Classification, Clustering, And Privacy

    The age of "Big Data" features massive, high-dimensional datasets, leading to the rapid emergence of new algorithms as well as new concerns such as privacy and fairness. To compare different algorithms with (or without) these new constraints, minimax decision theory provides a principled framework to quantify the optimality of algorithms and investigate the fundamental difficulty of statistical problems. Under the framework of minimax theory, this thesis addresses the following four problems: 1. The first part of this thesis develops an optimality theory for linear discriminant analysis in the high-dimensional setting. In addition, we consider classification with incomplete data under the missing completely at random (MCAR) model. 2. In the second part, we study high-dimensional sparse quadratic discriminant analysis (QDA) and aim to establish the optimal convergence rates. 3. In the third part, we study the optimality of high-dimensional clustering in the unsupervised setting under the Gaussian mixture model. We propose an EM-based procedure that attains the optimal rate of convergence for the excess mis-clustering error. 4. In the fourth part, we investigate minimax optimality under the privacy constraint for mean estimation and linear regression models, in both the classical low-dimensional and modern high-dimensional settings.
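    The third part refers to an EM-based clustering procedure for Gaussian mixtures. The following bare-bones EM sketch (spherical components, hard labels at the end) only illustrates the kind of algorithm being analyzed; the thesis's procedure and its rate-optimal guarantees are more refined, and the initialization, iteration count, and shared-variance assumption here are my own simplifications.

```python
import numpy as np

def em_gmm_cluster(X, k, n_iter=50, seed=0):
    """EM for a spherical Gaussian mixture, returning hard cluster labels."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, k, replace=False)]   # initialize centers from the data
    weights = np.full(k, 1.0 / k)
    var = X.var()
    for _ in range(n_iter):
        # E-step: responsibilities proportional to weight * Gaussian density.
        sq = ((X[:, None, :] - means[None]) ** 2).sum(-1)
        log_r = np.log(weights) - sq / (2 * var)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update mixing weights, means, and the shared variance.
        nk = r.sum(axis=0)
        weights = nk / n
        means = (r.T @ X) / nk[:, None]
        var = (r * sq).sum() / (n * d)
    return r.argmax(axis=1)

# Example: two well-separated Gaussian clusters in the plane.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
print(np.bincount(em_gmm_cluster(X, k=2)))
```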