Private Distribution Learning with Public Data: The View from Sample Compression
We study the problem of private distribution learning with access to public data. In this setup, which we refer to as public-private learning, the learner is given public and private samples drawn from an unknown distribution $p$ belonging to a class $\mathcal{Q}$, with the goal of outputting an estimate of $p$ while adhering to privacy constraints (here, pure differential privacy) only with respect to the private samples.

We show that the public-private learnability of a class $\mathcal{Q}$ is connected to the existence of a sample compression scheme for $\mathcal{Q}$, as well as to an intermediate notion we refer to as list learning. Leveraging this connection: (1) approximately recovers previous results on Gaussians over $\mathbb{R}^d$; and (2) leads to new ones, including sample complexity upper bounds for arbitrary $k$-mixtures of Gaussians over $\mathbb{R}^d$, results for agnostic and distribution-shift resistant learners, as well as closure properties for public-private learnability under taking mixtures and products of distributions. Finally, via the connection to list learning, we show that for Gaussians in $\mathbb{R}^d$, at least $d$ public samples are necessary for private learnability, which is close to the known upper bound of $d+1$ public samples.

Comment: 31 pages
Mixtures of Gaussians are Privately Learnable with a Polynomial Number of Samples
We study the problem of estimating mixtures of Gaussians under the constraint of differential privacy (DP). Our main result is that polynomially many samples (in the number of components $k$, the dimension $d$, $1/\alpha$, $1/\varepsilon$, and $\log(1/\delta)$) are sufficient to estimate a mixture of $k$ Gaussians in $\mathbb{R}^d$ up to total variation distance $\alpha$ while satisfying $(\varepsilon, \delta)$-DP. This is the first finite sample complexity upper bound for the problem that does not make any structural assumptions on the GMMs.
To solve the problem, we devise a new framework which may be useful for other
tasks. On a high level, we show that if a class of distributions (such as
Gaussians) is (1) list decodable and (2) admits a ``locally small'' cover (Bun et al., 2021) with respect to total variation distance, then the class of its mixtures is privately learnable. The proof circumvents a known barrier indicating that, unlike Gaussians, GMMs do not admit a locally small cover (Aden-Ali et al., 2021b).
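List decodability, the first ingredient of the framework, can be illustrated with a toy sketch (not the paper's actual decoder): for a one-dimensional mixture of unit-variance Gaussians, the sample itself, deduplicated onto a grid, already serves as a candidate list, since any component with non-negligible weight places a sample point near its mean with high probability.

```python
import random

def list_decode_means(sample, radius=1.0):
    """Toy list decoder: every sample point is a candidate mean.

    Deduplicating by rounding to a grid no coarser than the target
    radius keeps the list short while guaranteeing that each true
    component mean lies near some candidate.
    """
    return sorted({round(x / radius) * radius for x in sample})

random.seed(0)
true_means = [-10.0, 0.0, 10.0]
sample = [random.gauss(random.choice(true_means), 1.0) for _ in range(300)]

candidates = list_decode_means(sample)
# Every true mean should be close to some candidate in the list.
for mu in true_means:
    assert min(abs(mu - c) for c in candidates) <= 1.5
```

The list can be long; the point is only that it is finite and contains a good candidate, which is what private selection over a locally small cover then exploits.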
Image Super-Resolution as a Defense Against Adversarial Attacks
Convolutional Neural Networks have achieved significant success across
multiple computer vision tasks. However, they are vulnerable to carefully
crafted, human-imperceptible adversarial noise patterns which constrain their
deployment in critical security-sensitive systems. This paper proposes a
computationally efficient image enhancement approach that provides a strong
defense mechanism to effectively mitigate the effect of such adversarial
perturbations. We show that deep image restoration networks learn mapping
functions that can bring off-the-manifold adversarial samples onto the natural
image manifold, thus restoring classification towards correct classes. A
distinguishing feature of our approach is that, in addition to providing
robustness against attacks, it simultaneously enhances image quality and
retains the model's performance on clean images. Furthermore, the proposed method
does not modify the classifier or require a separate mechanism to detect
adversarial images. The effectiveness of the scheme has been demonstrated
through extensive experiments, where it proves to be a strong defense in gray-box
settings. The proposed scheme is simple and has the following advantages: (1)
it does not require any model training or parameter optimization, (2) it
complements other existing defense mechanisms, (3) it is agnostic to the
attacked model and attack type, and (4) it provides superior performance across
all popular attack algorithms. Our codes are publicly available at
https://github.com/aamir-mustafa/super-resolution-adversarial-defense.

Comment: Published in IEEE Transactions on Image Processing
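The pipeline shape of the defense, restore the input first, leave the classifier untouched, can be sketched in a toy form. The paper uses a deep super-resolution network as the restorer; here a simple one-dimensional mean filter stands in for it, purely to show that restoration pulls a perturbed input back toward the clean signal before classification:

```python
import random

def restore(signal, k=3):
    """Stand-in for the super-resolution network: a mean filter.

    The actual defense runs the input through a deep image restoration
    model; any smoother illustrates the shape of the pipeline.
    """
    n = len(signal)
    pad = k // 2
    out = []
    for i in range(n):
        window = signal[max(0, i - pad): i + pad + 1]
        out.append(sum(window) / len(window))
    return out

random.seed(1)
n = 256
clean = [i / (n - 1) for i in range(n)]              # smooth 1-D "image"
noisy = [c + random.gauss(0.0, 0.1) for c in clean]  # adversarial-style noise

restored = restore(noisy)
err = lambda a, b: sum(abs(x - y) for x, y in zip(a, b)) / len(a)
# Restoration pulls the perturbed input back toward the clean one; the
# (unmodified) classifier then sees a near-natural input.
assert err(restored, clean) < err(noisy, clean)
```

Because the restorer sits entirely in front of the model, the sketch also reflects why the defense is agnostic to the attacked model and attack type.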
A PAC-Theory of Clustering with Advice
In the absence of domain knowledge, clustering is usually an under-specified task. For any clustering application, one can choose among a variety of different clustering algorithms, along with different preprocessing techniques, that are likely to result in dramatically different answers. Any of these solutions, however, can be acceptable depending on the application, and therefore, it is critical to incorporate prior knowledge about the data and the intended semantics of clustering into the process of clustering model selection.
One scenario that we study is when the user (i.e., the domain expert) provides a clustering of a (relatively small) random subset of the data set. The clustering algorithm then uses this kind of ``advice'' to come up with a data representation under which an application of a fixed clustering algorithm (e.g., k-means) results in a partition of the full data set that is aligned with the user's knowledge. We analyze the ``advice complexity'' of learning a representation in this paradigm.
Another form of ``advice'' can be obtained by allowing the clustering algorithm to interact with a domain expert by asking same-cluster queries: ``Do these two instances belong to the same cluster?''. The goal of the clustering algorithm will then be finding a partition of the data set that is consistent with the domain expert's knowledge (yet using only a small number of queries). Aside from studying the ``advice complexity'' (i.e., query complexity) of learning in this model, we investigate the trade-offs between computational and advice complexities of learning, showing that using a little bit of advice can turn an otherwise computationally hard clustering problem into a tractable one.
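The same-cluster query model can be illustrated with a toy implementation (hypothetical oracle and data, not the dissertation's algorithm): keep one representative per discovered cluster and compare each new point against the representatives, spending at most k queries per point.

```python
def cluster_with_queries(points, same_cluster):
    """Assign points to clusters using only same-cluster queries.

    `same_cluster(a, b)` plays the role of the domain expert's oracle.
    Each point is compared against one representative per cluster found
    so far, so at most k queries are spent per point.
    """
    representatives = []  # one exemplar per discovered cluster
    labels = []
    for p in points:
        for cid, rep in enumerate(representatives):
            if same_cluster(p, rep):
                labels.append(cid)
                break
        else:  # no representative matched: open a new cluster
            labels.append(len(representatives))
            representatives.append(p)
    return labels

# Hypothetical ground truth: cluster identity is the integer part.
data = [0.1, 0.2, 5.1, 0.3, 5.4, 9.9]
oracle = lambda a, b: int(a) == int(b)
labels = cluster_with_queries(data, oracle)
assert labels == [0, 0, 1, 0, 1, 2]
```

This naive strategy is consistent with the oracle by construction; the dissertation's contribution lies in doing far better than querying every point, and in the resulting computational/advice trade-offs.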
In the second part of this dissertation we study the problem of learning mixture models, where we are given an i.i.d. sample generated from an unknown target from a family of mixture distributions, and want to output a distribution that is close to the target in total variation distance. In particular, given a sample-efficient learner for a base class of distributions (e.g., Gaussians), we show how one can come up with a sample-efficient method for learning mixtures of the base class (e.g., mixtures of k Gaussians). As a byproduct of this analysis, we are able to prove tighter sample complexity bounds for learning various mixture models. We also investigate how having access to the same-cluster queries (i.e., whether two instances were generated from the same mixture component) can help reduce the computational burden of learning within this model.
Finally, we take a further step and introduce a novel method for distribution learning via a form of compression. In particular, we ask whether one can compress a large-enough sample set generated from a target distribution (by picking only a few instances from it) in a way that allows recovery of (an approximation to) the target distribution. We prove that if this is the case for all members of a class of distributions, then there is a sample-efficient way of distribution learning with respect to this class. As an application of this novel notion, we settle the sample complexity of learning mixtures of k axis-aligned Gaussian distributions (within logarithmic factors).
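A minimal instance of such a compression scheme (a toy, chosen for illustration rather than taken from the dissertation): for uniform distributions on an interval, two retained instances, the sample minimum and maximum, suffice to decode a distribution close to the target in total variation.

```python
import random

def encode(sample):
    """Compress: keep only two instances -- the sample min and max."""
    return (min(sample), max(sample))

def decode(compressed):
    """Recover an approximation: the uniform distribution on [lo, hi]."""
    lo, hi = compressed
    return lo, hi  # parameters of Uniform(lo, hi)

random.seed(0)
a, b = 2.0, 7.0  # unknown target Uniform(a, b)
sample = [random.uniform(a, b) for _ in range(1000)]

lo, hi = decode(encode(sample))
# The TV distance between Uniform(a, b) and Uniform(lo, hi) here is the
# mass missed at the two ends: ((lo - a) + (b - hi)) / (b - a).
tv = ((lo - a) + (b - hi)) / (b - a)
assert a <= lo <= hi <= b
assert tv < 0.05
```

The expected end gaps shrink like $1/n$, so the two retained instances pin down the target at the usual parametric rate, which is the kind of recovery guarantee the compression framework turns into sample complexity bounds.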
Optimal PAC Bounds Without Uniform Convergence
In statistical learning theory, determining the sample complexity of
realizable binary classification for VC classes was a long-standing open
problem. The results of Simon and Hanneke established sharp upper bounds in
this setting. However, the reliance of their argument on the uniform
convergence principle limits its applicability to more general learning
settings such as multiclass classification. In this paper, we address this
issue by providing optimal high probability risk bounds through a framework
that surpasses the limitations of uniform convergence arguments.
Our framework converts the leave-one-out error of permutation invariant
predictors into high probability risk bounds. As an application, by adapting
the one-inclusion graph algorithm of Haussler, Littlestone, and Warmuth, we
propose an algorithm that achieves an optimal PAC bound for binary
classification. Specifically, our result shows that certain aggregations of
one-inclusion graph algorithms are optimal, addressing a variant of a classic
question posed by Warmuth.
We further instantiate our framework in three settings where uniform
convergence is provably suboptimal. For multiclass classification, we prove an
optimal risk bound that scales with the one-inclusion hypergraph density of the
class, addressing the suboptimality of the analysis of Daniely and
Shalev-Shwartz. For partial hypothesis classification, we determine the optimal
sample complexity bound, resolving a question posed by Alon, Hanneke, Holzman,
and Moran. For realizable bounded regression with absolute loss, we derive an
optimal risk bound that relies on a modified version of the scale-sensitive
dimension, refining the results of Bartlett and Long. Our rates surpass
standard uniform convergence-based results due to the smaller complexity
measure in our risk bound.

Comment: 27 pages
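The quantity the framework starts from can be shown in a toy sketch (illustrative only, not the paper's construction): the leave-one-out error of a permutation-invariant predictor, here 1-nearest-neighbour on realizable one-dimensional threshold data, whose output is unchanged under reordering of the training set.

```python
def one_nn_predict(train, x):
    """1-nearest-neighbour rule: permutation invariant, since its
    output does not depend on the order of the training pairs."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def leave_one_out_error(sample):
    """Fraction of points misclassified when each is held out in turn."""
    mistakes = 0
    for i in range(len(sample)):
        held_x, held_y = sample[i]
        rest = sample[:i] + sample[i + 1:]
        if one_nn_predict(rest, held_x) != held_y:
            mistakes += 1
    return mistakes / len(sample)

# Realizable data from a threshold at 0: label = 1 iff x >= 0.
sample = [(-3.0, 0), (-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1), (3.0, 1)]
assert leave_one_out_error(sample) == 0.0
```

The paper's framework converts exactly this kind of leave-one-out statistic of a permutation-invariant predictor into a high-probability risk bound, bypassing uniform convergence.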