99 research outputs found

    Data Dropout in Arbitrary Basis for Deep Network Regularization

    An important problem in training deep networks with high capacity is to ensure that the trained network works well when presented with new inputs outside the training dataset. Dropout is an effective regularization technique to boost the network generalization in which a random subset of the elements of the given data and the extracted features are set to zero during the training process. In this paper, we propose a new randomized regularization technique in which we withhold a random part of the data without necessarily turning off the neurons/data-elements. In the proposed method, of which conventional dropout is shown to be a special case, random data dropout is performed in an arbitrary basis, hence the designation Generalized Dropout. We also present a framework whereby the proposed technique can be applied efficiently to convolutional neural networks. The presented numerical experiments demonstrate that the proposed technique yields a notable performance gain. Generalized Dropout provides new insight into the idea of dropout, shows that different basis matrices can achieve different performance gains, and opens up a new research question of how to choose optimal basis matrices that achieve maximal performance gain.
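    The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way "dropout in an arbitrary basis" could look. It assumes an orthonormal basis matrix and inverted-dropout rescaling, neither of which is stated in the abstract, and the names `basis_dropout`, `matvec`, and `transpose` are hypothetical. With the identity basis it reduces to conventional element-wise dropout.

```python
import math
import random

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def basis_dropout(x, basis, rate, rng):
    """Randomly drop coefficients of x in the given basis.

    Assumptions (not from the abstract): the columns of `basis` are
    orthonormal, and kept coefficients are rescaled by 1 / (1 - rate).
    """
    keep = 1.0 - rate
    coeffs = matvec(transpose(basis), x)              # coordinates of x in the basis
    masked = [c / keep if rng.random() < keep else 0.0 for c in coeffs]
    return matvec(basis, masked)                      # map back to the original space

rng = random.Random(0)
x = [1.0, 3.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
s = 1.0 / math.sqrt(2.0)
rotated = [[s, s], [s, -s]]                           # an orthonormal 2x2 basis

print(basis_dropout(x, identity, 0.5, rng))           # entries are zeroed or rescaled
print(basis_dropout(x, rotated, 0.5, rng))            # information is withheld, entries need not be zero
```

    In the rotated basis, a dropped coefficient removes a direction from the input rather than an individual element, which matches the abstract's description of withholding part of the data without necessarily turning off data elements.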

    Gaussian Error Linear Units (GELUs)

    We propose the Gaussian Error Linear Unit (GELU), a high-performing neural network activation function. The GELU activation function is $x\Phi(x)$, where $\Phi(x)$ is the standard Gaussian cumulative distribution function. The GELU nonlinearity weights inputs by their value, rather than gating inputs by their sign as in ReLUs ($x\mathbf{1}_{x>0}$). We perform an empirical evaluation of the GELU nonlinearity against the ReLU and ELU activations and find performance improvements across all considered computer vision, natural language processing, and speech tasks. Comment: Trimmed version of 2016 draft; add exact formul
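    As an illustration of the definition in this abstract, the sketch below evaluates the GELU as x times the standard Gaussian CDF, using the identity Φ(x) = ½(1 + erf(x/√2)), and prints it next to the ReLU for comparison. The function names `gelu` and `relu` are chosen here for illustration only.

```python
import math

def gelu(x: float) -> float:
    """GELU(x) = x * Phi(x), where Phi is the standard Gaussian CDF."""
    phi = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return x * phi

def relu(x: float) -> float:
    """ReLU(x) = x * 1[x > 0]: gates the input by its sign."""
    return x if x > 0 else 0.0

for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}  gelu={gelu(x):+.4f}  relu={relu(x):+.4f}")
```

    Unlike the ReLU, the GELU returns small negative values for moderately negative inputs, because the input is weighted by Φ(x) rather than cut off at zero.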

    Excitation Dropout: Encouraging Plasticity in Deep Neural Networks

    We propose a guided dropout regularizer for deep networks based on the evidence of a network prediction, defined as the firing of neurons in specific paths. In this work, we utilize the evidence at each neuron to determine the probability of dropout, rather than dropping out neurons uniformly at random as in standard dropout. In essence, we drop out with higher probability those neurons which contribute more to decision making at training time. This approach penalizes high-saliency neurons that are most relevant for model prediction, i.e., those having stronger evidence. By dropping such high-saliency neurons, the network is forced to learn alternative paths in order to maintain loss minimization, resulting in a plasticity-like behavior, a characteristic of human brains too. We demonstrate better generalization ability, an increased utilization of network neurons, and a higher resilience to network compression using several metrics over four image/video recognition benchmarks.
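    The abstract does not give the formula that maps evidence to dropout probability, so the sketch below uses a simple stand-in: each neuron's drop probability is proportional to its share of the total evidence, scaled so that uniform evidence recovers a fixed base rate. This proportional rule, the `max_rate` cap, and the names `evidence_dropout_probs` and `apply_dropout` are all assumptions made for illustration and should not be read as the paper's method.

```python
import random

def evidence_dropout_probs(evidence, base_rate, max_rate=0.95):
    """Per-neuron drop probabilities that grow with (non-negative) evidence.

    Hypothetical rule: p_i = base_rate * n * e_i / sum(e), capped at max_rate.
    Uniform evidence gives p_i = base_rate for every neuron.
    """
    n = len(evidence)
    total = sum(evidence)
    if total <= 0:
        return [base_rate] * n          # no evidence: fall back to uniform dropout
    return [min(max_rate, base_rate * n * e / total) for e in evidence]

def apply_dropout(activations, probs, rng):
    """Zero each activation independently with its own drop probability."""
    return [0.0 if rng.random() < p else a for a, p in zip(activations, probs)]

rng = random.Random(0)
activations = [0.2, 0.1, 0.3, 1.5]
evidence = [0.1, 0.1, 0.1, 0.7]         # the last neuron dominates the prediction
probs = evidence_dropout_probs(evidence, base_rate=0.25)
print(probs)                            # the dominant neuron gets the highest drop probability
print(apply_dropout(activations, probs, rng))
```

    Under this rule the most salient neuron is dropped most often, so the remaining neurons must carry the prediction more frequently during training, which is the behavior the abstract describes.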