Tight Lower Bounds for Differentially Private Selection
A pervasive task in the differential privacy literature is to select the
items of "highest quality" out of a set of items, where the quality of each
item depends on a sensitive dataset that must be protected. Variants of this
task arise naturally in fundamental problems like feature selection and
hypothesis testing, and also as subroutines for many sophisticated
differentially private algorithms.
The standard approaches to these tasks (repeated use of the exponential
mechanism or the sparse vector technique) approximately solve this problem
given a dataset of $n = O(\sqrt{k}\log d)$ samples. We provide a tight lower
bound for some very simple variants of the private selection problem. Our lower
bound shows that a sample of size $n = \Omega(\sqrt{k}\log d)$ is required
even to achieve a very minimal accuracy guarantee.
Our results are based on an extension of the fingerprinting method to sparse
selection problems. Previously, the fingerprinting method has been used to
provide tight lower bounds for answering an entire set of $d$ queries, but
often only some much smaller set of $k$ queries is relevant. Our extension
allows us to prove lower bounds that depend on both the number of relevant
queries and the total number of queries.
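To make "repeated use of the exponential mechanism" concrete, here is a minimal sketch of one common way to select k items privately: draw one item per round with the exponential mechanism, remove it, and repeat. The function names, the `sensitivity` parameter, and the even split of the budget across rounds (basic composition) are illustrative assumptions, not taken from the paper. In particular, this simple budget split does not by itself give the square-root dependence on the number of selected items discussed in the abstract.

```python
import math
import random

def exponential_mechanism(scores, epsilon, sensitivity=1.0, rng=random):
    """Sample a key of `scores` with probability proportional to
    exp(epsilon * score / (2 * sensitivity))."""
    top = max(scores.values())
    # Subtract the maximum score before exponentiating for numerical stability.
    weights = {i: math.exp(epsilon * (s - top) / (2.0 * sensitivity))
               for i, s in scores.items()}
    total = sum(weights.values())
    r = rng.random() * total
    acc = 0.0
    for i, w in weights.items():
        acc += w
        if r < acc:
            return i
    return i  # fall back to the last key if round-off leaves r >= acc

def private_top_k(scores, k, epsilon, sensitivity=1.0, rng=random):
    """Hypothetical helper: pick k distinct indices, one per round.

    Assumes each score changes by at most `sensitivity` when one record
    of the underlying dataset changes. Each round spends epsilon / k.
    """
    remaining = dict(enumerate(scores))
    per_round = epsilon / k
    chosen = []
    for _ in range(k):
        i = exponential_mechanism(remaining, per_round, sensitivity, rng)
        chosen.append(i)
        del remaining[i]
    return chosen
```

With a very large `epsilon` the procedure behaves like an exact top-k selection; as `epsilon` shrinks, lower-scoring items are chosen with increasing probability.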
Hardness of Non-Interactive Differential Privacy from One-Way Functions
A central challenge in differential privacy is to design computationally efficient non-interactive algorithms that can answer large numbers of statistical queries on a sensitive dataset. That is, we would like to design a differentially private algorithm that takes a dataset $D \in X^n$ consisting of some small number of elements $n$ from some large data universe $X$, and efficiently outputs a summary that allows a user to efficiently obtain an answer to any query in some large family $Q$.
Ignoring computational constraints, this problem can be solved even when $X$ and $Q$ are exponentially large and $n$ is just a small polynomial; however, all algorithms with remotely similar guarantees run in exponential time. There have been several results showing that, under the strong assumption of indistinguishability obfuscation (iO), no efficient differentially private algorithm exists when $X$ and $Q$ can be exponentially large. However, there are no strong separations between information-theoretic and computationally efficient differentially private algorithms under any standard complexity assumption.
In this work we show that, if one-way functions exist, there is no general purpose differentially private algorithm that works when $X$ and $Q$ are exponentially large, and $n$ is an arbitrary polynomial. In fact, we show that this result holds even if $X$ is just subexponentially large (assuming only polynomially-hard one-way functions). This result solves an open problem posed by Vadhan in his recent survey.
Minimax Optimality In High-Dimensional Classification, Clustering, And Privacy
The age of “Big Data” features large volumes of massive and high-dimensional datasets, leading to the fast emergence of different algorithms, as well as new concerns such as privacy and fairness. To compare different algorithms with or without these new constraints, minimax decision theory provides a principled framework to quantify the optimality of algorithms and investigate the fundamental difficulty of statistical problems. Under the framework of minimax theory, this thesis aims to address the following four problems:
1. The first part of this thesis aims to develop an optimality theory for linear discriminant analysis in the high-dimensional setting. In addition, we consider classification with incomplete data under the missing completely at random (MCR) model.
2. In the second part, we study high-dimensional sparse Quadratic Discriminant Analysis (QDA) and aim to establish the optimal convergence rates.
3. In the third part, we study the optimality of high-dimensional clustering in the unsupervised setting under the Gaussian mixture model. We propose an EM-based procedure with the optimal rate of convergence for the excess mis-clustering error.
4. In the fourth part, we investigate the minimax optimality under the privacy constraint for mean estimation and linear regression models, under both the classical low-dimensional and modern high-dimensional settings.
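As a point of reference for the fourth problem, private mean estimation in one dimension can be sketched with the textbook Laplace mechanism: clip each value to a known range, average, and add noise calibrated to the sensitivity of the clipped mean. This is a generic illustration under assumed bounds `lo` and `hi` and a non-empty sample; the function names are hypothetical and this is not claimed to be the estimator analyzed in the thesis.

```python
import random

def laplace_noise(scale, rng=random):
    # The difference of two independent exponential variables with mean
    # `scale` follows a Laplace distribution with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_mean(xs, lo, hi, epsilon, rng=random):
    """Hypothetical helper: epsilon-differentially private mean of `xs`.

    Assumes `xs` is non-empty and that [lo, hi] is a publicly known range.
    """
    n = len(xs)
    clipped = [min(max(x, lo), hi) for x in xs]
    # Replacing one record changes the clipped mean by at most (hi - lo) / n.
    sensitivity = (hi - lo) / n
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)
```

The noise scale shrinks as the sample size grows, which is the basic trade-off between privacy and accuracy that a minimax analysis quantifies.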