Degrees of Freedom in Deep Neural Networks
In this paper, we explore degrees of freedom in deep sigmoidal neural
networks. We show that the degrees of freedom in these models are related to
the expected optimism, which is the expected difference between test error and
training error. We provide an efficient Monte-Carlo method to estimate the
degrees of freedom for multi-class classification methods. We show that the
degrees of freedom are lower than the parameter count in a simple XOR network.
We extend these results to neural nets trained on synthetic and real data, and
investigate the impact of the network's architecture and of different
regularization choices. The degrees of freedom in deep networks are
dramatically smaller than the number of parameters, on some real datasets by
several orders of magnitude. Further, we observe that for a fixed number of
parameters, deeper networks have fewer degrees of freedom, exhibiting a
regularization-by-depth effect.
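For intuition, in the regression setting the degrees of freedom of a fitting procedure can be written as df = tr(d yhat / d y), which equals (1/sigma^2) * sum_i Cov(yhat_i, y_i) under Gaussian noise, and a standard way to estimate this is to probe how much the fitted predictions move when the training targets are perturbed. A minimal sketch of that Monte-Carlo idea, assuming a hypothetical fit_predict helper that retrains the model on given targets (the paper's estimator is adapted to multi-class classification; this shows the regression analogue):

```python
import numpy as np

def monte_carlo_dof(fit_predict, y, n_probes=10, eps=1e-3):
    """Estimate df = tr(d yhat / d y) by finite-difference probing: perturb
    the training targets along random directions and measure the directional
    sensitivity of the refitted predictions."""
    base = fit_predict(y)                        # predictions on unperturbed targets
    estimates = []
    for _ in range(n_probes):
        delta = np.random.randn(len(y))          # random probe direction
        perturbed = fit_predict(y + eps * delta) # refit on perturbed targets
        estimates.append(delta @ (perturbed - base) / eps)
    return float(np.mean(estimates))
```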
Neural Network Regularization via Robust Weight Factorization
Regularization is essential when training large neural networks. As deep
neural networks can be mathematically interpreted as universal function
approximators, they are effective at memorizing sampling noise in the training
data. This results in poor generalization to unseen data. Therefore, it is no
surprise that a new regularization technique, Dropout, was partially
responsible for the now-ubiquitous winning entry to ImageNet 2012 by the
University of Toronto. Currently, Dropout (and related methods such as
DropConnect) are the most effective means of regularizing large neural
networks. These amount to efficiently visiting a large number of related models
at training time, while aggregating them to a single predictor at test time.
The proposed FaMe model aims to apply a similar strategy, yet learns a
factorization of each weight matrix such that the factors are robust to noise.
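For reference, the dropout mechanism these methods build on is compact. A minimal sketch of the standard inverted-dropout formulation (not the FaMe factorization itself):

```python
import torch

def inverted_dropout(x, p=0.5, training=True):
    """Inverted dropout: during training, zero each unit with probability p
    and rescale the survivors by 1/(1-p) so the expected activation is
    unchanged; at test time it is the identity, which is how the many thinned
    networks visited during training are aggregated into a single predictor."""
    if not training or p == 0.0:
        return x
    mask = (torch.rand_like(x) > p).float()  # keep mask, Bernoulli(1-p)
    return x * mask / (1.0 - p)
```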
Regularization Methods for Generative Adversarial Networks: An Overview of Recent Studies
Despite its short history, Generative Adversarial Network (GAN) has been
extensively studied and used for various tasks, including its original purpose,
i.e., synthetic sample generation. However, applying GAN to different data
types with diverse neural network architectures has been hindered by its
limitation in training, where the model easily diverges. This notoriously
unstable training behaviour is well known and has been addressed in numerous
studies. Consequently, to make GAN training stable, numerous regularization
methods have been proposed in recent years. This paper reviews
the regularization methods that have been recently introduced, most of which
have been published in the last three years. Specifically, we focus on general
methods that can be commonly used regardless of neural network architectures.
To explore the latest research trends in the regularization for GANs, the
methods are classified into several groups by their operation principles, and
the differences between the methods are analyzed. Furthermore, to provide
practical knowledge of using these methods, we investigate popular methods that
have been frequently employed in state-of-the-art GANs. In addition, we discuss
the limitations of existing methods and propose future research directions.
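As one concrete example of the methods such a survey covers, the gradient penalty of WGAN-GP is among the most frequently used GAN regularizers. A minimal sketch, where discriminator, real, and fake are placeholders for the user's model and batches:

```python
import torch

def gradient_penalty(discriminator, real, fake, lam=10.0):
    """WGAN-GP style regularizer: penalize the discriminator's gradient norm
    on points interpolated between real and generated samples, pushing the
    norm toward 1 to keep training stable."""
    alpha = torch.rand(real.size(0), *([1] * (real.dim() - 1)), device=real.device)
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    scores = discriminator(interp)
    grads, = torch.autograd.grad(scores.sum(), interp, create_graph=True)
    return lam * ((grads.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()
```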
DropBlock: A regularization method for convolutional networks
Deep neural networks often work well when they are over-parameterized and
trained with a massive amount of noise and regularization, such as weight decay
and dropout. Although dropout is widely used as a regularization technique for
fully connected layers, it is often less effective for convolutional layers.
This lack of success of dropout for convolutional layers is perhaps due to the
fact that activation units in convolutional layers are spatially correlated so
information can still flow through convolutional networks despite dropout. Thus
a structured form of dropout is needed to regularize convolutional networks. In
this paper, we introduce DropBlock, a form of structured dropout, where units
in a contiguous region of a feature map are dropped together. We found that
applying DropBlock in skip connections in addition to the convolution layers
increases the accuracy. Also, gradually increasing the number of dropped units
during training leads to better accuracy and makes the model more robust to hyperparameter
choices. Extensive experiments show that DropBlock works better than dropout in
regularizing convolutional networks. On ImageNet classification, ResNet-50
architecture with DropBlock achieves 78.13% accuracy, which is more than 1.6%
improvement on the baseline. On COCO detection, DropBlock improves the
Average Precision of RetinaNet from 36.8% to 38.4%.
Comment: Accepted at NIPS 2018
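The mechanism is simple enough to sketch. A minimal, simplified version for NCHW feature maps, assuming an odd block_size (the released implementation also handles edge effects and the scheduled drop rate):

```python
import torch
import torch.nn.functional as F

def drop_block(x, drop_prob=0.1, block_size=7, training=True):
    """DropBlock: sample block centers at a rate gamma chosen so that roughly
    drop_prob of all units fall inside dropped blocks, grow each center into a
    block_size x block_size square via max-pooling, zero those regions, and
    rescale the surviving activations."""
    if not training or drop_prob == 0.0:
        return x
    n, c, h, w = x.shape
    # seed rate, following the paper's approximation for gamma
    gamma = (drop_prob / block_size ** 2) * (h * w) / ((h - block_size + 1) * (w - block_size + 1))
    centers = (torch.rand(n, c, h, w, device=x.device) < gamma).float()
    # expand each center to a full block; assumes odd block_size
    block_mask = 1.0 - F.max_pool2d(centers, block_size, stride=1, padding=block_size // 2)
    return x * block_mask * block_mask.numel() / block_mask.sum().clamp(min=1.0)
```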
Improving Deep Learning Models via Constraint-Based Domain Knowledge: a Brief Survey
Deep Learning (DL) models proved themselves to perform extremely well on a
wide variety of learning tasks, as they can learn useful patterns from large
data sets. However, purely data-driven models might struggle when very
difficult functions need to be learned or when there is not enough available
training data. Fortunately, in many domains prior information can be retrieved
and used to boost the performance of DL models. This paper presents a first
survey of the approaches devised to integrate domain knowledge, expressed in
the form of constraints, in DL learning models to improve their performance, in
particular targeting deep neural networks. We identify five (non-mutually
exclusive) categories that encompass the main approaches to inject domain
knowledge: 1) acting on the features space, 2) modifications to the hypothesis
space, 3) data augmentation, 4) regularization schemes (sketched below), and
5) constrained learning.
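A minimal sketch of category 4, with a hypothetical domain constraint that output 0 may never exceed output 1 (say, a "part" quantity versus its "whole"); the constraint is written as g(yhat) <= 0 and only positive violations are penalized:

```python
import torch

def domain_penalty(y_hat, lam=1.0):
    """Knowledge-as-regularization: penalize violations of the (hypothetical)
    constraint y_hat[:, 0] <= y_hat[:, 1]; relu keeps only actual violations."""
    violation = y_hat[:, 0] - y_hat[:, 1]
    return lam * torch.relu(violation).mean()

# total objective: ordinary data loss plus the knowledge-based regularizer
# loss = task_loss(y_hat, y) + domain_penalty(y_hat)
```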
A continual learning survey: Defying forgetting in classification tasks
Artificial neural networks thrive in solving the classification problem for a
particular rigid task, acquiring knowledge through generalized learning
behaviour from a distinct training phase. The resulting network resembles a
static entity of knowledge, with endeavours to extend this knowledge without
targeting the original task resulting in catastrophic forgetting. Continual
learning shifts this paradigm towards networks that can continually accumulate
knowledge over different tasks without the need to retrain from scratch. We
focus on task incremental classification, where tasks arrive sequentially and
are delineated by clear boundaries. Our main contributions concern 1) a
taxonomy and extensive overview of the state-of-the-art, 2) a novel framework
to continually determine the stability-plasticity trade-off of the continual
learner, 3) a comprehensive experimental comparison of 11 state-of-the-art
continual learning methods and 4 baselines. We empirically scrutinize method
strengths and weaknesses on three benchmarks, considering Tiny Imagenet and
large-scale unbalanced iNaturalist and a sequence of recognition datasets. We
study the influence of model capacity, weight decay and dropout regularization,
and the order in which the tasks are presented, and qualitatively compare
methods in terms of required memory, computation time, and storage.
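One family compared in such studies is regularization-based (EWC and its relatives), and its shared idea fits in a few lines. A minimal sketch, assuming per-parameter importance weights (e.g., Fisher information) were estimated after the previous task:

```python
import torch

def quadratic_anchor_penalty(model, anchor, importance, lam=1.0):
    """Regularization-based continual learning: pull each weight toward its
    value after the previous task, weighted by how important it was there:
    (lam / 2) * sum_i Omega_i * (theta_i - theta_i_star)^2."""
    penalty = 0.0
    for name, p in model.named_parameters():
        penalty = penalty + (importance[name] * (p - anchor[name]) ** 2).sum()
    return 0.5 * lam * penalty
```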
An Overview of Multi-Task Learning in Deep Neural Networks
Multi-task learning (MTL) has led to successes in many applications of
machine learning, from natural language processing and speech recognition to
computer vision and drug discovery. This article aims to give a general
overview of MTL, particularly in deep neural networks. It introduces the two
most common methods for MTL in Deep Learning, gives an overview of the
literature, and discusses recent advances. In particular, it seeks to help ML
practitioners apply MTL by shedding light on how MTL works and providing
guidelines for choosing appropriate auxiliary tasks.
Comment: 14 pages, 8 figures
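Of the two common methods, hard parameter sharing is the simpler and more widely used. A minimal sketch with illustrative layer sizes: one shared trunk learns features for all tasks, each task keeps a small private head, and the shared representation acts as an inductive bias:

```python
import torch.nn as nn

class HardSharingMTL(nn.Module):
    """Hard parameter sharing: shared trunk, per-task output heads."""
    def __init__(self, in_dim, hidden, task_out_dims):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
        self.heads = nn.ModuleList(nn.Linear(hidden, d) for d in task_out_dims)

    def forward(self, x):
        z = self.trunk(x)                        # features shared by all tasks
        return [head(z) for head in self.heads]  # one prediction per task
```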
Bridgeout: stochastic bridge regularization for deep neural networks
A major challenge in training deep neural networks is overfitting, i.e.
inferior performance on unseen test examples compared to performance on
training examples. To reduce overfitting, stochastic regularization methods
have shown superior performance compared to deterministic weight penalties on a
number of image recognition tasks. Stochastic methods such as Dropout and
Shakeout, in expectation, are equivalent to imposing a ridge and elastic-net
penalty on the model parameters, respectively. However, the choice of the norm
of the weight penalty is problem dependent and is not restricted to $L_1$ or $L_2$.
Therefore, in this paper we propose the Bridgeout stochastic regularization
technique and prove that it is equivalent to an $L_q$ penalty on the weights,
where the norm $q$ can be learned as a hyperparameter from data. Experimental
results show that Bridgeout results in sparse model weights, improved gradients
and superior classification performance compared to Dropout and Shakeout on
synthetic and real datasets.
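A minimal sketch of the underlying idea, with the noise form deliberately simplified from the paper: perturbing each weight with zero-mean noise whose standard deviation scales as |w|^(q/2) gives noise variance |w|^q, which on a linear model induces an expected penalty proportional to sum_i |w_i|^q:

```python
import torch

def bridgeout_style_perturb(w, p=0.5, q=1.5, training=True):
    """Illustrative stochastic-bridge perturbation (simplified): add zero-mean,
    unit-variance dropout-style noise scaled by |w|^(q/2), so the expected
    effect resembles an L_q penalty; q can be treated as a hyperparameter."""
    if not training:
        return w
    mask = (torch.rand_like(w) < p).float()       # Bernoulli(p) events
    noise = (mask - p) / (p * (1.0 - p)) ** 0.5   # zero mean, unit variance
    return w + w.abs().pow(q / 2.0) * noise
```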
Fraternal Dropout
Recurrent neural networks (RNNs) are an important class of neural network
architectures for language modeling and sequential prediction. However,
optimizing RNNs is known to be harder than optimizing feed-forward neural
networks. A number of techniques have been proposed in the literature to address
this problem. In this paper we propose a simple technique called fraternal
dropout that takes advantage of dropout to achieve this goal. Specifically, we
propose to train two identical copies of an RNN (that share parameters) with
different dropout masks while minimizing the difference between their
(pre-softmax) predictions. In this way our regularization encourages the
representations of RNNs to be invariant to the dropout mask, and thus robust. We
show that our regularization term is upper bounded by the expectation-linear
dropout objective which has been shown to address the gap due to the difference
between the train and inference phases of dropout. We evaluate our model and
achieve state-of-the-art results in sequence modeling tasks on two benchmark
datasets - Penn Treebank and Wikitext-2. We also show that our approach leads
to performance improvement by a significant margin in image captioning
(Microsoft COCO) and semi-supervised (CIFAR-10) tasks.
Comment: Accepted to ICLR 2018. Extended appendix. Added official GitHub code
for replication: https://github.com/kondiz/fraternal-dropout. Added
references. Corrected typos.
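The training objective is compact enough to sketch directly. A minimal version, assuming a classification criterion and a model whose dropout masks are resampled on every forward pass (the paper's exact weighting may differ):

```python
import torch

def fraternal_dropout_loss(model, x, y, criterion, kappa=0.1):
    """Fraternal dropout: two forward passes of the same weight-shared model
    draw two independent dropout masks; train on the average prediction loss
    plus an L2 penalty on the difference of the pre-softmax outputs, which
    pushes the representation to be invariant to the mask."""
    logits_a = model(x)  # first stochastic pass (mask A)
    logits_b = model(x)  # second stochastic pass (mask B)
    pred_loss = 0.5 * (criterion(logits_a, y) + criterion(logits_b, y))
    invariance = kappa * (logits_a - logits_b).pow(2).mean()
    return pred_loss + invariance
```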
Training for Faster Adversarial Robustness Verification via Inducing ReLU Stability
We explore the concept of co-design in the context of neural network
verification. Specifically, we aim to train deep neural networks that not only
are robust to adversarial perturbations but also whose robustness can be
verified more easily. To this end, we identify two properties of network models
- weight sparsity and so-called ReLU stability - that turn out to significantly
impact the complexity of the corresponding verification task. We demonstrate
that improving weight sparsity alone already enables us to turn computationally
intractable verification problems into tractable ones. Then, improving ReLU
stability leads to an additional 4-13x speedup in verification times. An
important feature of our methodology is its "universality," in the sense that
it can be used with a broad range of training procedures and verification
approaches.
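For concreteness, a ReLU-stability objective can be sketched as follows, assuming interval bounds on each pre-activation are available (e.g., from interval bound propagation); a unit is provably stable when its bounds share a sign, and rewarding that shrinks the number of unstable ReLUs a verifier must branch on:

```python
import torch

def relu_stability_loss(lower, upper):
    """RS-style regularizer: for pre-activation bounds [lower, upper], the
    product lower * upper is positive exactly when the ReLU is stable, so
    -tanh(1 + lower * upper) decreases as more units become provably stable."""
    return -torch.tanh(1.0 + lower * upper).sum()
```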