A Compression Technique for Analyzing Disagreement-Based Active Learning
We introduce a new and improved characterization of the label complexity of
disagreement-based active learning, in which the leading quantity is the
version space compression set size. This quantity is defined as the size of the
smallest subset of the training data that induces the same version space. We
show various applications of the new characterization, including a tight
analysis of CAL and refined label complexity bounds for linear separators under
mixtures of Gaussians and axis-aligned rectangles under product densities. The
version space compression set size, as well as the new characterization of the
label complexity, can be naturally extended to agnostic learning problems, for
which we show new speedup results for two well-known active learning algorithms.
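
To make the leading quantity concrete: a minimal brute-force sketch in Python, under the toy assumption of a finite, explicitly enumerated hypothesis class (thresholds on a small integer domain). The function names are illustrative, not from the paper.

from itertools import combinations

# Brute-force illustration of the version space compression set size:
# the smallest subset of the labeled sample that induces the same
# version space as the full sample. Hypotheses are (name, function)
# pairs mapping instances to {0, 1}; everything here is finite and
# explicit, which is a toy assumption, not the paper's setting.

def version_space(hypotheses, sample):
    """Hypotheses consistent with every (x, y) pair in the sample."""
    return frozenset(
        h for h in hypotheses if all(h[1](x) == y for x, y in sample)
    )

def compression_set_size(hypotheses, sample):
    """Size of the smallest subsample inducing the same version space."""
    target = version_space(hypotheses, sample)
    for k in range(len(sample) + 1):
        for subset in combinations(sample, k):
            if version_space(hypotheses, list(subset)) == target:
                return k
    return len(sample)

# Toy class: thresholds on the integers 0..10, h_t(x) = 1 iff x >= t.
hypotheses = [(t, (lambda t: lambda x: int(x >= t))(t)) for t in range(11)]
sample = [(1, 0), (2, 0), (3, 0), (6, 1), (7, 1), (8, 1)]
print(compression_set_size(hypotheses, sample))  # 2: (3, 0) and (6, 1) suffice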
A New Lower Bound for Agnostic Learning with Sample Compression Schemes
We establish a tight characterization of the worst-case rates for the excess
risk of agnostic learning with sample compression schemes and for uniform
convergence for agnostic sample compression schemes. In particular, we find
that the optimal rates of convergence for size-$k$ agnostic sample compression
schemes are of the form $\sqrt{k \log(n/k)/n}$, which contrasts with
agnostic learning with classes of VC dimension $k$, where the optimal rates are
of the form $\sqrt{k/n}$.
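
A quick numeric comparison of the two rates as reconstructed above, treating $k$ as the compression size or VC dimension and $n$ as the sample size; purely illustrative.

import math

def compression_rate(k, n):
    # sqrt(k log(n/k) / n), the size-k sample compression rate above
    return math.sqrt(k * math.log(n / k) / n)

def vc_rate(k, n):
    # sqrt(k / n), the rate for a VC class of dimension k
    return math.sqrt(k / n)

k = 10
for n in (10**3, 10**5, 10**7):
    print(n, round(compression_rate(k, n), 4), round(vc_rate(k, n), 4))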
Near-optimal Sample Complexity Bounds for Robust Learning of Gaussian Mixtures via Compression Schemes
We prove that $\widetilde{\Theta}(k d^2 / \varepsilon^2)$ samples are necessary
and sufficient for learning a mixture of $k$ Gaussians in $\mathbb{R}^d$, up to
error $\varepsilon$ in total variation distance. This improves both the known
upper bounds and lower bounds for this problem. For mixtures of axis-aligned
Gaussians, we show that $\widetilde{O}(k d / \varepsilon^2)$ samples suffice,
matching a known lower bound. Moreover, these results hold in the
agnostic-learning/robust-estimation setting as well, where the target
distribution is only approximately a mixture of Gaussians.
The upper bound is shown using a novel technique for distribution learning
based on a notion of 'compression.' Any class of distributions that allows such
a compression scheme can also be learned with few samples. Moreover, if a class
of distributions has such a compression scheme, then so do the classes of
products and mixtures of those distributions. The core of our main result is
showing that the class of Gaussians in $\mathbb{R}^d$ admits a small-sized
compression scheme.
Comment: To appear in Journal of the ACM. 46 pages. An extended abstract
appeared in NeurIPS 2018. This version contains all the proofs, generalizes
the results to agnostic learning, and improves the bounds by a logarithmic
factor.
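
A minimal sketch of the compression interface the abstract describes, under illustrative assumptions: the encoder keeps a few of the drawn samples plus side-information bits, and the decoder rebuilds a distribution from them. The names, signatures, and the toy single-Gaussian instance are not the paper's formal definitions.

import math
import statistics
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

@dataclass
class CompressionScheme:
    # encoder: samples -> (kept samples, side-information bits)
    encode: Callable[[Sequence[float]], Tuple[List[float], List[int]]]
    # decoder: (kept samples, bits) -> a density over the reals
    decode: Callable[[List[float], List[int]], Callable[[float], float]]

# Toy instance for N(mu, 1): keep the single sample nearest the
# empirical mean, send no bits, decode to a unit-variance Gaussian pdf.
def encode_gaussian(samples):
    mean = statistics.fmean(samples)
    kept = min(samples, key=lambda s: abs(s - mean))
    return [kept], []

def decode_gaussian(kept, bits):
    mu = kept[0]
    return lambda x: math.exp(-(x - mu) ** 2 / 2) / math.sqrt(2 * math.pi)

scheme = CompressionScheme(encode_gaussian, decode_gaussian)
kept, bits = scheme.encode([0.9, 1.1, 1.0, 1.3, 0.7])
pdf = scheme.decode(kept, bits)
print(kept, round(pdf(1.0), 4))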
VC Classes are Adversarially Robustly Learnable, but Only Improperly
We study the question of learning an adversarially robust predictor. We show
that any hypothesis class $\mathcal{H}$ with finite VC dimension is robustly
PAC learnable with an improper learning rule. The requirement of being improper
is necessary, as we exhibit examples of hypothesis classes $\mathcal{H}$ with
finite VC dimension that are not robustly PAC learnable with any proper
learning rule.
Comment: COLT 2019 Camera Ready
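
For concreteness, the adversarially robust 0-1 loss underlying this result, evaluated exactly for a linear predictor under an l_inf perturbation ball. The linear choice is an illustrative special case; the result concerns general VC classes.

import numpy as np

# An example (x, y) with y in {-1, +1} is a robust error if some z with
# ||z - x||_inf <= eps is misclassified by sign(<w, x> + b). For a
# linear predictor the inner sup has a closed form: the worst-case
# perturbation reduces the signed margin by eps * ||w||_1.

def robust_zero_one_loss(w, b, X, y, eps):
    margins = y * (X @ w + b)                # signed margins, clean points
    worst = margins - eps * np.abs(w).sum()  # margin after worst shift
    return float(np.mean(worst <= 0))        # robust error rate

X = np.array([[1.0, 2.0], [-1.5, 0.5]])
y = np.array([1, -1])
print(robust_zero_one_loss(np.array([1.0, -1.0]), 0.0, X, y, eps=0.2))  # 0.5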
What Can We Learn Privately?
Learning problems form an important category of computational tasks that
generalizes many of the computations researchers apply to large real-life data
sets. We ask: what concept classes can be learned privately, namely, by an
algorithm whose output does not depend too heavily on any one input or specific
training example? More precisely, we investigate learning algorithms that
satisfy differential privacy, a notion that provides strong confidentiality
guarantees in contexts where aggregate information is released about a database
containing sensitive information about individuals. We demonstrate that,
ignoring computational constraints, it is possible to privately agnostically
learn any concept class using a sample size approximately logarithmic in the
cardinality of the concept class. Therefore, almost anything learnable is
learnable privately: specifically, if a concept class is learnable by a
(non-private) algorithm with polynomial sample complexity and output size, then
it can be learned privately using a polynomial number of samples. We also
present a computationally efficient private PAC learner for the class of parity
functions. Local (or randomized response) algorithms are a practical class of
private algorithms that have received extensive investigation. We provide a
precise characterization of local private learning algorithms. We show that a
concept class is learnable by a local algorithm if and only if it is learnable
in the statistical query (SQ) model. Finally, we present a separation between
the power of interactive and noninteractive local learning algorithms.
Comment: 35 pages, 2 figures
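
A minimal sketch of the kind of generic private learner the abstract alludes to: selecting a hypothesis from a finite class with the exponential mechanism, scored by negative empirical error. The finite, explicit hypothesis list is an illustrative assumption.

import math
import random

# Exponential-mechanism selection: sample h with probability
# proportional to exp(-eps * err(h) / 2). Changing one training
# example changes any error count by at most 1 (sensitivity 1),
# so this selection is eps-differentially private.

def private_learn(hypotheses, sample, eps):
    def err(h):
        return sum(h(x) != y for x, y in sample)
    scores = [math.exp(-eps * err(h) / 2) for h in hypotheses]
    total = sum(scores)
    r, acc = random.random() * total, 0.0
    for h, s in zip(hypotheses, scores):
        acc += s
        if r <= acc:
            return h
    return hypotheses[-1]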
Active Nearest-Neighbor Learning in Metric Spaces
We propose a pool-based non-parametric active learning algorithm for general
metric spaces, called MArgin Regularized Metric Active Nearest Neighbor
(MARMANN), which outputs a nearest-neighbor classifier. We give prediction
error guarantees that depend on the noisy-margin properties of the input
sample, and are competitive with those obtained by previously proposed passive
learners. We prove that the label complexity of MARMANN is significantly lower
than that of any passive learner with similar error guarantees. MARMANN is
based on a generalized sample compression scheme, and a new label-efficient
active model-selection procedure.
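
The object MARMANN outputs is a nearest-neighbor rule over a small compressed subset of the pool. A sketch of just that final predictor follows; the prototype selection and active label querying, which are the paper's actual contribution, are not reproduced here.

import numpy as np

# A 1-NN classifier defined by a compressed prototype set: predictions
# depend only on the stored prototypes, not on the full labeled pool.

class CompressedNN:
    def __init__(self, prototypes, labels):
        self.P = np.asarray(prototypes)
        self.y = np.asarray(labels)

    def predict(self, X):
        X = np.asarray(X)
        d = np.linalg.norm(X[:, None, :] - self.P[None, :, :], axis=-1)
        return self.y[d.argmin(axis=1)]

clf = CompressedNN([[0.0, 0.0], [3.0, 3.0]], [0, 1])
print(clf.predict([[0.2, -0.1], [2.5, 3.4]]))  # -> [0 1]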
Feature Distillation: DNN-Oriented JPEG Compression Against Adversarial Examples
Image compression-based approaches for defending against adversarial-example
attacks, which threaten the safe use of deep neural networks (DNNs), have been
investigated recently. However, prior works mainly rely on directly tuning
parameters such as the compression rate to blindly reduce image features, and
thus offer no guarantee on either defense efficiency (i.e., classification
accuracy on polluted images) or classification accuracy on benign images after
the defense is applied. To overcome these limitations, we propose a
JPEG-based defensive compression framework, namely "feature distillation", to
effectively rectify adversarial examples without impacting classification
accuracy on benign data. Our framework significantly increases defense
efficiency with only marginal accuracy loss, using a two-step method: first, we
maximize the filtering of malicious adversarial input perturbations by
developing defensive quantization in the frequency domain of JPEG
compression/decompression, guided by a semi-analytical method; second, we
suppress the distortion of benign features to restore classification accuracy
through a DNN-oriented quantization refinement process. Our experimental
results show that the proposed "feature distillation" significantly surpasses
the latest input-transformation-based mitigations, such as Quilting and TV
Minimization, in three aspects: defense efficiency (improving classification
accuracy from to on adversarial examples), accuracy of benign images after
defense ( accuracy degradation), and processing time per image ( speedup).
Moreover, our solution also provides the best defense efficiency ( accuracy)
against the recent adaptive attack with the least accuracy reduction () on
benign images compared with other input-transformation-based defense methods.
Comment: 2019 Conference on Computer Vision and Pattern Recognition (CVPR 2019)
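
A rough sketch of the frequency-domain quantization idea described above: quantize high-frequency DCT coefficients of each 8x8 block more coarsely than low-frequency ones. The step sizes and cutoff are illustrative placeholders, not the paper's tuned quantization tables.

import numpy as np
from scipy.fft import dctn, idctn

# Coarse quantization of high-frequency DCT coefficients removes the
# band where adversarial perturbations tend to concentrate, while fine
# quantization of low frequencies preserves benign image features.

def defensive_quantize(block, q_low=8, q_high=50, cutoff=4):
    coeffs = dctn(block, norm="ortho")
    i, j = np.indices(block.shape)
    q = np.where(i + j < cutoff, q_low, q_high)  # per-coefficient step
    return idctn(np.round(coeffs / q) * q, norm="ortho")

block = np.random.default_rng(0).uniform(0, 255, (8, 8))
print(np.abs(defensive_quantize(block) - block).mean())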
A learning problem that is independent of the set theory ZFC axioms
We consider the following statistical estimation problem: given a family F of
real-valued functions over some domain X and an i.i.d. sample drawn from an
unknown distribution P over X, find h in F such that the expectation of h
w.r.t. P is probably approximately equal to the supremum over expectations on
members of F. This Expectation Maximization (EMX) problem captures many well
studied learning problems; in fact, it is equivalent to Vapnik's general
setting of learning.
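
Restated in standard PAC-style notation (the $\epsilon$, $\delta$, and sample size $m$ below are conventional symbols, not fixed by the abstract), the EMX requirement on the learner's output $h_S$ is:

% EMX, restated from the paragraph above: given F over X, an unknown P,
% and an i.i.d. sample S ~ P^m, output h_S in F whose expectation nearly
% attains the best achievable in the class.
\Pr_{S \sim P^{m}}\left[\; \mathbb{E}_{P}[h_S] \;\ge\; \sup_{f \in F} \mathbb{E}_{P}[f] - \epsilon \;\right] \;\ge\; 1 - \delta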
Surprisingly, we show that EMX learnability, as well as the learning rates of
some basic classes F, depends on the cardinality of the continuum and is
therefore independent of the ZFC axioms of set theory (which are widely
accepted as a formalization of the notion of a mathematical proof).
We focus on the case where the functions in F are Boolean, which generalizes
classification problems. We study the interaction between the statistical
sample complexity of F and its combinatorial structure. We introduce a new
version of sample compression schemes and show that it characterizes EMX
learnability for a wide family of classes. However, we show that for the class
of finite subsets of the real line, the existence of such compression schemes
is independent of set theory. We conclude that the learnability of that class
with respect to the family of probability distributions of countable support is
independent of the set theory ZFC axioms.
We also explore the existence of a "VC-dimension-like" parameter that
captures learnability in this setting. Our results imply that there exists
no "finitary" combinatorial parameter that characterizes EMX learnability in a
way similar to the VC-dimension-based characterization of binary-valued
classification problems.
Countering Adversarial Images using Input Transformations
This paper investigates strategies that defend against adversarial-example
attacks on image-classification systems by transforming the inputs before
feeding them to the system. Specifically, we study applying image
transformations such as bit-depth reduction, JPEG compression, total variance
minimization, and image quilting before feeding the image to a convolutional
network classifier. Our experiments on ImageNet show that total variance
minimization and image quilting are very effective defenses in practice, in
particular, when the network is trained on transformed images. The strength of
those defenses lies in their non-differentiable nature and their inherent
randomness, which makes it difficult for an adversary to circumvent the
defenses. Our best defense eliminates 60% of strong gray-box and 90% of strong
black-box attacks by a variety of major attack methods.
Comment: 12 pages, 6 figures, submitted to ICLR 2018
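
Two of the studied transformations are easy to state as code. A minimal sketch using Pillow and NumPy, with illustrative default parameters:

import io
import numpy as np
from PIL import Image

# Bit-depth reduction and JPEG re-compression, applied to an image
# before it reaches the classifier.

def reduce_bit_depth(img_array, bits=3):
    levels = 2 ** bits
    return (np.floor(img_array / 256 * levels) * (256 // levels)).astype(np.uint8)

def jpeg_compress(img_array, quality=75):
    buf = io.BytesIO()
    Image.fromarray(img_array).save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return np.asarray(Image.open(buf))

img = np.random.default_rng(0).integers(0, 256, (32, 32, 3), dtype=np.uint8)
defended = jpeg_compress(reduce_bit_depth(img))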
Incremental multi-domain learning with network latent tensor factorization
The prominence of deep learning, large amounts of annotated data, and
increasingly powerful hardware have made it possible to reach remarkable
performance on supervised classification tasks, in many cases saturating the
training sets. However, the resulting models are specialized to a single, very
specific task and domain. Adapting the learned classification to new domains is
a hard problem for at least three reasons: (1) the new domains and tasks might
be drastically different; (2) there might be a very limited amount of annotated
data for the new domain; and (3) fully training a new model for each new task
is prohibitive in terms of computation and memory, due to the sheer number of
parameters of deep CNNs. In this paper, we present a method to learn new
domains and tasks incrementally, building on prior knowledge from
already-learned tasks and without catastrophic forgetting. We do so by jointly
parametrizing weights across layers using a low-rank Tucker structure. The core
tensor is task-agnostic, while a set of task-specific factors is learnt for
each new domain. We show that leveraging tensor structure enables better performance
than simply using matrix operations. Joint tensor modelling also naturally
leverages correlations across different layers. Compared with previous methods,
which have focused on adapting each layer separately, our approach results in
more compact representations for each new task/domain. We apply the proposed
method to the 10 datasets of the Visual Decathlon Challenge and show that it
offers on average about a 7.5x reduction in the number of parameters with
competitive performance in terms of both classification accuracy and Decathlon
score.
Comment: AAAI 2020
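
A minimal sketch of the shared-core idea described above: a task-agnostic core tensor is contracted with small task-specific factor matrices to produce each task's layer weight, shown here as a 2-mode Tucker contraction with illustrative shapes.

import numpy as np

# The core is shared across all tasks; each new task/domain adds only
# its small factor matrices, never a full weight matrix.

rng = np.random.default_rng(0)
core = rng.standard_normal((8, 8))      # shared, task-agnostic core

def task_weight(core, U_out, U_in):
    """W_task = U_out @ core @ U_in.T (a 2-mode Tucker contraction)."""
    return U_out @ core @ U_in.T

U_out = rng.standard_normal((64, 8))    # task-specific output factor
U_in = rng.standard_normal((32, 8))     # task-specific input factor
W = task_weight(core, U_out, U_in)      # 64 x 32 layer weight
print(W.shape, U_out.size + U_in.size, "vs", W.size)  # per-task savings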