Extreme Classification via Adversarial Softmax Approximation
Training a classifier over a large number of classes, known as 'extreme
classification', has become a topic of major interest with applications in
technology, science, and e-commerce. Traditional softmax regression induces a gradient cost proportional to the number of classes, which is often prohibitively expensive. A popular scalable softmax approximation relies on uniform negative sampling, which suffers from slow convergence due to a poor signal-to-noise ratio. In this paper, we propose a simple training method for
drastically enhancing the gradient signal by drawing negative samples from an
adversarial model that mimics the data distribution. Our contributions are
three-fold: (i) an adversarial sampling mechanism that produces negative
samples at a cost only logarithmic in the number of classes, thus still resulting in cheap
gradient updates; (ii) a mathematical proof that this adversarial sampling
minimizes the gradient variance while any bias due to non-uniform sampling can
be removed; (iii) experimental results on large scale data sets that show a
reduction of the training time by an order of magnitude relative to several
competitive baselines. Comment: Accepted for presentation at the Eighth International Conference on
Learning Representations (ICLR 2020),
https://openreview.net/forum?id=rJxe3xSYD
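To make the sampling idea concrete, below is a minimal sketch of a sampled-softmax loss with non-uniform negatives, not the paper's exact method: a fixed `proposal_probs` vector stands in for the adversarial sampler (which the paper draws from in time logarithmic in the number of classes), and subtracting log q(c) from each candidate logit is the standard correction that removes the bias introduced by non-uniform sampling. All names and shapes are illustrative.

```python
# A minimal sketch of sampled softmax with non-uniform negative sampling.
# `proposal_probs` stands in for the paper's adversarial sampler.
import torch
import torch.nn.functional as F

def sampled_softmax_loss(hidden, weight, target, proposal_probs, num_neg=64):
    """hidden: (B, d) inputs; weight: (C, d) class embeddings;
    target: (B,) true class ids; proposal_probs: (C,) sampling distribution."""
    batch = hidden.size(0)
    # Draw negatives from the proposal instead of uniformly.
    negatives = torch.multinomial(proposal_probs, num_neg, replacement=True)
    candidates = torch.cat([target, negatives])        # (B + num_neg,)
    logits = hidden @ weight[candidates].T             # (B, B + num_neg)
    # Bias correction: logit(c) -> logit(c) - log q(c).
    logits = logits - torch.log(proposal_probs[candidates] + 1e-12)
    # Row i's true class sits in column i of the candidate set.
    return F.cross_entropy(logits, torch.arange(batch))
```

With a proposal that mimics the data distribution, the sampled negatives are "hard", which is where the improved gradient signal-to-noise ratio comes from.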
Training Shallow and Thin Networks for Acceleration via Knowledge Distillation with Conditional Adversarial Networks
There is increasing interest in accelerating neural networks for real-time
applications. We study the student-teacher strategy, in which a small and fast
student network is trained with the auxiliary information learned from a large
and accurate teacher network. We propose to use conditional adversarial
networks to learn the loss function to transfer knowledge from teacher to
student. The proposed method is particularly effective for relatively small
student networks. Moreover, experimental results show the effect of network size when modern networks are used as students. We empirically study the trade-off between inference time and classification accuracy, and provide suggestions on choosing a proper student network. Comment: Shorter version will appear at ICLR workshop 201
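A minimal sketch of the idea follows, assuming CIFAR-sized inputs and toy MLPs in place of real student/teacher networks: a discriminator learns to tell teacher logits from student logits, conditioned on the input, and fooling it becomes the student's transfer loss. The architectures and the specific GAN loss are illustrative, not the paper's.

```python
# Sketch: knowledge transfer via a conditional adversarial loss.
import torch
import torch.nn as nn

num_classes = 10
student = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64), nn.ReLU(),
                        nn.Linear(64, num_classes))
teacher = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU(),
                        nn.Linear(256, num_classes))  # stands in for a pretrained net
disc = nn.Sequential(nn.Linear(num_classes + 3 * 32 * 32, 64), nn.ReLU(),
                     nn.Linear(64, 1))  # conditioned on the flattened image

bce = nn.BCEWithLogitsLoss()
opt_s = torch.optim.Adam(student.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)

x = torch.randn(8, 3, 32, 32)  # dummy batch standing in for real images
flat = x.flatten(1)
with torch.no_grad():
    t_logits = teacher(x)

# Discriminator step: real = teacher logits, fake = student logits.
s_logits = student(x)
d_real = disc(torch.cat([t_logits, flat], dim=1))
d_fake = disc(torch.cat([s_logits.detach(), flat], dim=1))
d_loss = (bce(d_real, torch.ones_like(d_real))
          + bce(d_fake, torch.zeros_like(d_fake)))
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# Student step: the learned loss is "fool the discriminator".
g_fake = disc(torch.cat([student(x), flat], dim=1))
g_loss = bce(g_fake, torch.ones_like(g_fake))
opt_s.zero_grad()
g_loss.backward()
opt_s.step()
```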
Intriguing properties of neural networks
Deep neural networks are highly expressive models that have recently achieved
state of the art performance on speech and visual recognition tasks. While
their expressiveness is the reason they succeed, it also causes them to learn
uninterpretable solutions that could have counter-intuitive properties. In this
paper we report two such properties.
First, we find that there is no distinction between individual high level
units and random linear combinations of high level units, according to various
methods of unit analysis. This suggests that it is the space, rather than the individual units, that contains the semantic information in the high layers
of neural networks.
Second, we find that deep neural networks learn input-output mappings that
are fairly discontinuous to a significant extent. We can cause the network to
misclassify an image by applying a certain imperceptible perturbation, which is
found by maximizing the network's prediction error. In addition, the specific
nature of these perturbations is not a random artifact of learning: the same
perturbation can cause a different network, that was trained on a different
subset of the dataset, to misclassify the same input.
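A minimal sketch of the core idea, maximizing the network's prediction error with respect to the input: the paper itself finds perturbations with box-constrained L-BFGS, so the one-step sign-gradient version below is a deliberate simplification, and the `model` argument is assumed to be any differentiable classifier.

```python
# Sketch: find a small input perturbation that increases prediction error.
import torch
import torch.nn.functional as F

def adversarial_perturbation(model, x, label, eps=0.01):
    """One gradient step that increases the loss on the true `label`."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    # Move in the direction that maximizes the loss, keep pixels valid.
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()
```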
Towards Open Set Deep Networks
Deep networks have produced significant gains for various visual recognition
problems, leading to high impact academic and commercial applications. Recent
work in deep networks highlighted that it is easy to generate images that
humans would never classify as a particular object class, yet networks classify such images with high confidence as that given class: deep networks are easily fooled by images humans do not consider meaningful. The closed set nature of deep networks forces them to choose from one of the known classes, leading to
such artifacts. Recognition in the real world is open set, i.e. the recognition
system should reject unknown/unseen classes at test time. We present a
methodology to adapt deep networks for open set recognition, by introducing a
new model layer, OpenMax, which estimates the probability of an input being
from an unknown class. A key element of estimating the unknown probability is
adapting Meta-Recognition concepts to the activation patterns in the
penultimate layer of the network. OpenMax allows rejection of "fooling" and
unrelated open set images presented to the system; OpenMax greatly reduces the
number of obvious errors made by a deep network. We prove that the OpenMax
concept provides bounded open space risk, thereby formally providing an open
set recognition solution. We evaluate the resulting open set deep networks
using pre-trained networks from the Caffe Model-zoo on ImageNet 2012 validation
data, and thousands of fooling and open set images. The proposed OpenMax model
significantly outperforms open set recognition accuracy of basic deep networks
as well as deep networks with thresholding of SoftMax probabilities.
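A heavily simplified sketch of the OpenMax recalibration, not the paper's exact recipe: activation vectors far from a class's mean activation have part of their score reassigned to an "unknown" class via a per-class Weibull model (the paper fits these with extreme-value theory on training-set distances; here the fitted parameters are simply assumed as inputs).

```python
# Sketch: OpenMax-style unknown-class scoring from penultimate activations.
import numpy as np
from scipy.stats import weibull_min

def openmax_scores(av, class_means, weibulls, alpha=3):
    """av: (C,) activation vector; class_means: (C, C) mean activation per class;
    weibulls: per-class fitted (shape, loc, scale) tuples."""
    scores = av.copy()
    unknown = 0.0
    top = np.argsort(av)[::-1][:alpha]              # revise only the top classes
    for rank, c in enumerate(top):
        dist = np.linalg.norm(av - class_means[c])
        # How extreme this distance is under class c's Weibull fit.
        w = weibull_min.cdf(dist, *weibulls[c])
        damp = 1.0 - w * (alpha - rank) / alpha     # shrink top ranks the most
        unknown += scores[c] * (1.0 - damp)         # mass moved to "unknown"
        scores[c] *= damp
    full = np.append(scores, unknown)               # last entry = unknown class
    exp = np.exp(full - full.max())
    return exp / exp.sum()                          # OpenMax-style probabilities
```

Rejection then amounts to thresholding the final "unknown" probability (or the maximum class probability) at test time.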
Deep Neural Networks
Deep Neural Networks (DNNs) are universal function approximators providing state-of-the-art solutions on a wide range of applications. Common perceptual
tasks such as speech recognition, image classification, and object tracking are
now commonly tackled via DNNs. Some fundamental problems remain: (1) the lack
of a mathematical framework providing an explicit and interpretable
input-output formula for any topology, (2) quantification of DNNs stability
regarding adversarial examples (i.e. modified inputs fooling DNN predictions
whilst undetectable to humans), (3) the absence of generalization guarantees and controllable behaviors for ambiguous patterns, and (4) the lack of methods to leverage unlabeled data so that DNNs can be applied to domains where expert labeling is scarce, as in the medical field.
Answering those points would provide theoretical perspectives for further
developments based on a common ground. Furthermore, DNNs are now deployed in
tremendous societal applications, pushing the need to fill this theoretical gap
to ensure control, reliability, and interpretability. Comment: Technical Report
Stochastic Security: Adversarial Defense Using Long-Run Dynamics of Energy-Based Models
The vulnerability of deep networks to adversarial attacks is a central
problem for deep learning from the perspective of both cognition and security.
The current most successful defense method is to train a classifier using
adversarial images created during learning. Another defense approach involves
transformation or purification of the original input to remove adversarial
signals before the image is classified. We focus on defending naturally-trained
classifiers using Markov Chain Monte Carlo (MCMC) sampling with an Energy-Based
Model (EBM) for adversarial purification. In contrast to adversarial training,
our approach is intended to secure pre-existing and highly vulnerable
classifiers.
The memoryless behavior of long-run MCMC sampling will eventually remove
adversarial signals, while metastable behavior preserves consistent appearance
of MCMC samples after many steps to allow accurate long-run prediction.
Balancing these factors can lead to effective purification and robust
classification. We evaluate adversarial defense with an EBM using the strongest
known attacks against purification. Our contributions are 1) an improved method for training EBMs with realistic long-run MCMC samples, 2) an
Expectation-Over-Transformation (EOT) defense that resolves theoretical
ambiguities for stochastic defenses and from which the EOT attack naturally
follows, and 3) state-of-the-art adversarial defense for naturally-trained
classifiers and competitive defense compared to adversarially-trained
classifiers on CIFAR-10, SVHN, and CIFAR-100. Code and pre-trained models are available at https://github.com/point0bar1/ebm-defense. Comment: ICLR 2021
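A minimal sketch of the purification step and an EOT-style prediction, assuming an `energy_net` that maps images to scalar energies; the step sizes, step counts, and averaging scheme are illustrative rather than the paper's tuned settings.

```python
# Sketch: adversarial purification by long-run Langevin MCMC on an EBM.
import torch

def purify(energy_net, x, steps=1000, step_size=0.01, noise_scale=0.01):
    x = x.clone().detach()
    for _ in range(steps):
        x.requires_grad_(True)
        energy = energy_net(x).sum()
        grad, = torch.autograd.grad(energy, x)
        # Langevin update: descend the energy landscape, plus Gaussian noise.
        x = (x - step_size * grad
             + noise_scale * torch.randn_like(x)).clamp(0, 1).detach()
    return x

def eot_predict(classifier, energy_net, x, n_runs=8):
    """Average class probabilities over several stochastic purification runs."""
    probs = torch.stack([classifier(purify(energy_net, x)).softmax(-1)
                         for _ in range(n_runs)])
    return probs.mean(0)
```

The averaging over stochastic runs is one reading of the EOT defense the abstract mentions: it turns the randomized purify-then-classify pipeline into a well-defined deterministic predictor that the matching EOT attack can also target.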
QBDC: Query by dropout committee for training deep supervised architecture
While the current trend is to increase the depth of neural networks to
increase their performance, the size of their training database has to grow
accordingly. We note the emergence of enormous databases, although providing labels to build a training set remains a very expensive task. We tackle
the problem of selecting the samples to be labelled in an online fashion. In
this paper, we present an active learning strategy based on query by committee and the dropout technique to train a Convolutional Neural Network (CNN). We derive a committee of partial CNNs resulting from batchwise dropout runs on the initial CNN. We evaluate our active learning strategy for CNNs on the MNIST benchmark, showing in particular that selecting less than 30% of the annotated database is enough to reach an error rate similar to that of the full training set. We also study the robustness of our method against adversarial examples. Comment: Submitted to ICLR201
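One reading of the dropout-committee idea as a short sketch, not the paper's exact recipe: keep dropout active at inference, treat each stochastic forward pass as a committee member, and query the unlabelled pool samples with the highest vote entropy, i.e. the most disagreement. Function and parameter names are illustrative.

```python
# Sketch: active learning via a dropout committee.
import torch

def query_by_dropout_committee(model, pool, n_members=10, n_queries=100):
    model.train()                                  # keeps dropout stochastic
    with torch.no_grad():
        outs = torch.stack([model(pool) for _ in range(n_members)])  # (M, N, C)
    votes = outs.argmax(dim=2)                     # (M, N) hard labels
    n_classes = outs.shape[2]
    counts = torch.stack([(votes == c).sum(dim=0) for c in range(n_classes)],
                         dim=1).float()            # (N, C) vote counts
    p = counts / n_members                         # per-sample vote distribution
    entropy = -(p * (p + 1e-12).log()).sum(dim=1)  # disagreement score
    return entropy.topk(min(n_queries, pool.size(0))).indices  # samples to label
```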
Boundary-Seeking Generative Adversarial Networks
Generative adversarial networks (GANs) are a learning framework that relies on training a discriminator to estimate a measure of difference between the target and generated distributions. GANs, as normally formulated, rely on the
generated samples being completely differentiable w.r.t. the generative
parameters, and thus do not work for discrete data. We introduce a method for
training GANs with discrete data that uses the estimated difference measure
from the discriminator to compute importance weights for generated samples,
thus providing a policy gradient for training the generator. The importance
weights have a strong connection to the decision boundary of the discriminator,
and we call our method boundary-seeking GANs (BGANs). We demonstrate the
effectiveness of the proposed algorithm with discrete image and character-based
natural language generation. In addition, the boundary-seeking objective
extends to continuous data, which can be used to improve stability of training,
and we demonstrate this on CelebA, Large-scale Scene Understanding (LSUN) bedrooms, and ImageNet without conditioning.
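A minimal sketch of the discrete-data objective, a simplification rather than the paper's full derivation: the discriminator's output gives a density-ratio estimate, which becomes a self-normalized importance weight in a REINFORCE-style policy gradient for the generator.

```python
# Sketch: boundary-seeking generator update for discrete data.
import torch

def bgan_generator_loss(d_logits, gen_log_probs):
    """d_logits: (B,) discriminator logits on generated samples;
    gen_log_probs: (B,) generator log-probabilities of those samples."""
    with torch.no_grad():
        # Self-normalized importance weights, e^{D(x)} / sum e^{D(x)}.
        w = torch.softmax(d_logits, dim=0)
    # Policy gradient: push generator probability mass toward the samples the
    # discriminator rates as most real, i.e. toward its decision boundary.
    return -(w * gen_log_probs).sum()
```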
Global Multiclass Classification and Dataset Construction via Heterogeneous Local Experts
In the domains of dataset construction and crowdsourcing, a notable challenge
is to aggregate labels from a heterogeneous set of labelers, each of whom is
potentially an expert in some subset of tasks (and less reliable in others). To
reduce costs of hiring human labelers or training automated labeling systems,
it is of interest to minimize the number of labelers while ensuring the
reliability of the resulting dataset. We model this as the problem of performing k-class classification using the predictions of smaller classifiers, each trained on a subset of the k classes, and derive bounds on the number
of classifiers needed to accurately infer the true class of an unlabeled sample
under both adversarial and stochastic assumptions. By exploiting a connection
to the classical set cover problem, we produce a near-optimal scheme for
designing such configurations of classifiers, which recovers the well-known
one-vs.-one classification approach as a special case. Experiments with the
MNIST and CIFAR-10 datasets demonstrate the favorable accuracy (compared to a
centralized classifier) of our aggregation scheme applied to classifiers
trained on subsets of the data. These results suggest a new way to
automatically label data or adapt an existing set of local classifiers to
larger-scale multiclass problems. Comment: 27 pages, 8 figures, to be published in IEEE Journal on Selected Areas in Information Theory (JSAIT), Special Issue on Estimation and Inference
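A minimal sketch of the set-cover connection, illustrative rather than the paper's near-optimal scheme: pick class subsets greedily until every pair of classes co-occurs in some local classifier (one-vs.-one is the special case where the candidates are all 2-element subsets), then infer the label by simple majority vote over the local predictions.

```python
# Sketch: covering all class pairs with local classifiers, then voting.
from itertools import combinations

def greedy_pair_cover(k, candidate_subsets):
    """Greedily choose subsets of {0..k-1} until every class pair is covered."""
    uncovered = set(combinations(range(k), 2))
    chosen = []
    while uncovered:
        best = max(candidate_subsets,
                   key=lambda s: len(uncovered & set(combinations(sorted(s), 2))))
        pairs = set(combinations(sorted(best), 2))
        if not (uncovered & pairs):
            raise ValueError("candidates cannot cover all class pairs")
        chosen.append(best)
        uncovered -= pairs
    return chosen

def majority_vote(predictions, k):
    """predictions: iterable of per-classifier predicted labels in {0..k-1}."""
    counts = [0] * k
    for label in predictions:
        counts[label] += 1
    return max(range(k), key=counts.__getitem__)
```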
Adversarial Examples - A Complete Characterisation of the Phenomenon
We provide a complete characterisation of the phenomenon of adversarial
examples - inputs intentionally crafted to fool machine learning models. We aim
to cover all the important concerns in this field of study: (1) the conjectures
on the existence of adversarial examples, (2) the security, safety and
robustness implications, (3) the methods used to generate and (4) protect
against adversarial examples and (5) the ability of adversarial examples to
transfer between different machine learning models. We provide ample background
information in an effort to make this document self-contained. Therefore, this document can be used as a survey, a tutorial, or a catalog of attacks and defences using adversarial examples.