68,231 research outputs found

    Degrees of Freedom in Deep Neural Networks

    Full text link
    In this paper, we explore degrees of freedom in deep sigmoidal neural networks. We show that the degrees of freedom in these models is related to the expected optimism, which is the expected difference between test error and training error. We provide an efficient Monte-Carlo method to estimate the degrees of freedom for multi-class classification methods. We show degrees of freedom are lower than the parameter count in a simple XOR network. We extend these results to neural nets trained on synthetic and real data, and investigate impact of network's architecture and different regularization choices. The degrees of freedom in deep networks are dramatically smaller than the number of parameters, in some real datasets several orders of magnitude. Further, we observe that for fixed number of parameters, deeper networks have less degrees of freedom exhibiting a regularization-by-depth

    Neural Network Regularization via Robust Weight Factorization

    Full text link
    Regularization is essential when training large neural networks. As deep neural networks can be mathematically interpreted as universal function approximators, they are effective at memorizing sampling noise in the training data. This results in poor generalization to unseen data. Therefore, it is no surprise that a new regularization technique, Dropout, was partially responsible for the now-ubiquitous winning entry to ImageNet 2012 by the University of Toronto. Currently, Dropout (and related methods such as DropConnect) are the most effective means of regularizing large neural networks. These amount to efficiently visiting a large number of related models at training time, while aggregating them to a single predictor at test time. The proposed FaMe model aims to apply a similar strategy, yet learns a factorization of each weight matrix such that the factors are robust to noise

    Regularization Methods for Generative Adversarial Networks: An Overview of Recent Studies

    Full text link
    Despite its short history, Generative Adversarial Network (GAN) has been extensively studied and used for various tasks, including its original purpose, i.e., synthetic sample generation. However, applying GAN to different data types with diverse neural network architectures has been hindered by its limitation in training, where the model easily diverges. Such a notorious training of GANs is well known and has been addressed in numerous studies. Consequently, in order to make the training of GAN stable, numerous regularization methods have been proposed in recent years. This paper reviews the regularization methods that have been recently introduced, most of which have been published in the last three years. Specifically, we focus on general methods that can be commonly used regardless of neural network architectures. To explore the latest research trends in the regularization for GANs, the methods are classified into several groups by their operation principles, and the differences between the methods are analyzed. Furthermore, to provide practical knowledge of using these methods, we investigate popular methods that have been frequently employed in state-of-the-art GANs. In addition, we discuss the limitations in existing methods and propose future research directions

    DropBlock: A regularization method for convolutional networks

    Full text link
    Deep neural networks often work well when they are over-parameterized and trained with a massive amount of noise and regularization, such as weight decay and dropout. Although dropout is widely used as a regularization technique for fully connected layers, it is often less effective for convolutional layers. This lack of success of dropout for convolutional layers is perhaps due to the fact that activation units in convolutional layers are spatially correlated so information can still flow through convolutional networks despite dropout. Thus a structured form of dropout is needed to regularize convolutional networks. In this paper, we introduce DropBlock, a form of structured dropout, where units in a contiguous region of a feature map are dropped together. We found that applying DropbBlock in skip connections in addition to the convolution layers increases the accuracy. Also, gradually increasing number of dropped units during training leads to better accuracy and more robust to hyperparameter choices. Extensive experiments show that DropBlock works better than dropout in regularizing convolutional networks. On ImageNet classification, ResNet-50 architecture with DropBlock achieves 78.13%78.13\% accuracy, which is more than 1.6%1.6\% improvement on the baseline. On COCO detection, DropBlock improves Average Precision of RetinaNet from 36.8%36.8\% to 38.4%38.4\%.Comment: Accepted at NIPS 201

    Improving Deep Learning Models via Constraint-Based Domain Knowledge: a Brief Survey

    Full text link
    Deep Learning (DL) models proved themselves to perform extremely well on a wide variety of learning tasks, as they can learn useful patterns from large data sets. However, purely data-driven models might struggle when very difficult functions need to be learned or when there is not enough available training data. Fortunately, in many domains prior information can be retrieved and used to boost the performance of DL models. This paper presents a first survey of the approaches devised to integrate domain knowledge, expressed in the form of constraints, in DL learning models to improve their performance, in particular targeting deep neural networks. We identify five (non-mutually exclusive) categories that encompass the main approaches to inject domain knowledge: 1) acting on the features space, 2) modifications to the hypothesis space, 3) data augmentation, 4) regularization schemes, 5) constrained learning

    A continual learning survey: Defying forgetting in classification tasks

    Full text link
    Artificial neural networks thrive in solving the classification problem for a particular rigid task, acquiring knowledge through generalized learning behaviour from a distinct training phase. The resulting network resembles a static entity of knowledge, with endeavours to extend this knowledge without targeting the original task resulting in a catastrophic forgetting. Continual learning shifts this paradigm towards networks that can continually accumulate knowledge over different tasks without the need to retrain from scratch. We focus on task incremental classification, where tasks arrive sequentially and are delineated by clear boundaries. Our main contributions concern 1) a taxonomy and extensive overview of the state-of-the-art, 2) a novel framework to continually determine the stability-plasticity trade-off of the continual learner, 3) a comprehensive experimental comparison of 11 state-of-the-art continual learning methods and 4 baselines. We empirically scrutinize method strengths and weaknesses on three benchmarks, considering Tiny Imagenet and large-scale unbalanced iNaturalist and a sequence of recognition datasets. We study the influence of model capacity, weight decay and dropout regularization, and the order in which the tasks are presented, and qualitatively compare methods in terms of required memory, computation time, and storage

    An Overview of Multi-Task Learning in Deep Neural Networks

    Full text link
    Multi-task learning (MTL) has led to successes in many applications of machine learning, from natural language processing and speech recognition to computer vision and drug discovery. This article aims to give a general overview of MTL, particularly in deep neural networks. It introduces the two most common methods for MTL in Deep Learning, gives an overview of the literature, and discusses recent advances. In particular, it seeks to help ML practitioners apply MTL by shedding light on how MTL works and providing guidelines for choosing appropriate auxiliary tasks.Comment: 14 pages, 8 figure

    Bridgeout: stochastic bridge regularization for deep neural networks

    Full text link
    A major challenge in training deep neural networks is overfitting, i.e. inferior performance on unseen test examples compared to performance on training examples. To reduce overfitting, stochastic regularization methods have shown superior performance compared to deterministic weight penalties on a number of image recognition tasks. Stochastic methods such as Dropout and Shakeout, in expectation, are equivalent to imposing a ridge and elastic-net penalty on the model parameters, respectively. However, the choice of the norm of weight penalty is problem dependent and is not restricted to {L1,L2}\{L_1,L_2\}. Therefore, in this paper we propose the Bridgeout stochastic regularization technique and prove that it is equivalent to an LqL_q penalty on the weights, where the norm qq can be learned as a hyperparameter from data. Experimental results show that Bridgeout results in sparse model weights, improved gradients and superior classification performance compared to Dropout and Shakeout on synthetic and real datasets

    Fraternal Dropout

    Full text link
    Recurrent neural networks (RNNs) are important class of architectures among neural networks useful for language modeling and sequential prediction. However, optimizing RNNs is known to be harder compared to feed-forward neural networks. A number of techniques have been proposed in literature to address this problem. In this paper we propose a simple technique called fraternal dropout that takes advantage of dropout to achieve this goal. Specifically, we propose to train two identical copies of an RNN (that share parameters) with different dropout masks while minimizing the difference between their (pre-softmax) predictions. In this way our regularization encourages the representations of RNNs to be invariant to dropout mask, thus being robust. We show that our regularization term is upper bounded by the expectation-linear dropout objective which has been shown to address the gap due to the difference between the train and inference phases of dropout. We evaluate our model and achieve state-of-the-art results in sequence modeling tasks on two benchmark datasets - Penn Treebank and Wikitext-2. We also show that our approach leads to performance improvement by a significant margin in image captioning (Microsoft COCO) and semi-supervised (CIFAR-10) tasks.Comment: Accepted to ICLR 2018. Extended appendix. Added official GitHub code for replication: https://github.com/kondiz/fraternal-dropout . Added references. Corrected typo

    Training for Faster Adversarial Robustness Verification via Inducing ReLU Stability

    Full text link
    We explore the concept of co-design in the context of neural network verification. Specifically, we aim to train deep neural networks that not only are robust to adversarial perturbations but also whose robustness can be verified more easily. To this end, we identify two properties of network models - weight sparsity and so-called ReLU stability - that turn out to significantly impact the complexity of the corresponding verification task. We demonstrate that improving weight sparsity alone already enables us to turn computationally intractable verification problems into tractable ones. Then, improving ReLU stability leads to an additional 4-13x speedup in verification times. An important feature of our methodology is its "universality," in the sense that it can be used with a broad range of training procedures and verification approaches
    • …
    corecore