6 research outputs found

Dither is Better than Dropout for Regularising Deep Neural Networks

Author: Simpson Andrew J. R.
Publication date: 26/08/2015

Regularisation of deep neural networks (DNNs) during training is critical to their performance. By far the most popular method is dropout. Here, cast through the prism of signal processing theory, we compare and contrast the regularisation effects of dropout with those of dither. We illustrate some serious inherent limitations of dropout and demonstrate that dither provides a more effective regulariser.

arXiv.org e-Print Archive
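The comparison in this abstract is between a multiplicative noise source (dropout) and an additive one (dither). The Python sketch below assumes, without taking it from the paper, inverted dropout with a keep probability and dither as additive uniform noise of a chosen amplitude; the function names `dropout` and `dither` are hypothetical.

```python
import random

def dropout(x, keep_prob, rng):
    """Inverted dropout: zero each element with probability 1 - keep_prob and
    scale the survivors by 1 / keep_prob so the expected value is unchanged."""
    return [xi / keep_prob if rng.random() < keep_prob else 0.0 for xi in x]

def dither(x, amplitude, rng):
    """Dither (assumed form): add independent uniform noise in
    [-amplitude, amplitude] to every element; no element is zeroed."""
    return [xi + rng.uniform(-amplitude, amplitude) for xi in x]

rng = random.Random(0)
x = [1.0, 2.0, -3.0]
print(dropout(x, keep_prob=0.5, rng=rng))  # each entry is either 0.0 or 2 * x[i]
print(dither(x, amplitude=0.1, rng=rng))   # each entry lies within 0.1 of x[i]
```

The difference is visible in the outputs: dropout removes whole entries on each pass, whereas dither keeps every entry and only perturbs it.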

Parallel Dither and Dropout for Regularising Deep Neural Networks

Author: Simpson Andrew J. R.
Publication date: 28/08/2015

Effective regularisation during training can mean the difference between success and failure for deep neural networks. Recently, dither has been suggested as an alternative to dropout for regularisation during batch-averaged stochastic gradient descent (SGD). In this article, we show that these methods fail without batch averaging, and we introduce a new, parallel regularisation method that may be used without batch averaging. Our results for parallel-regularised non-batch SGD are substantially better than what is possible with batch SGD. Furthermore, our results demonstrate that dither and dropout are complementary.

arXiv.org e-Print Archive
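One reading of "parallel regularisation without batch averaging" is that a single training example is presented as many independently dithered copies at once and their gradients are averaged, so the noise is averaged out per example rather than across a batch. The sketch below assumes exactly that, for a linear model with squared-error loss and Gaussian dither; `parallel_dither_grad`, the copy count and the noise level are hypothetical choices, not the paper's settings. For this model the averaged gradient approaches the clean gradient plus sigma**2 * w, a weight-decay-like term.

```python
import random

def squared_error_grad(w, x, y):
    """Gradient of 0.5 * (w.x - y)**2 with respect to w."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [err * xi for xi in x]

def parallel_dither_grad(w, x, y, copies, sigma, rng):
    """Average the gradient over `copies` independently dithered versions of
    one example (Gaussian dither with standard deviation sigma)."""
    total = [0.0] * len(w)
    for _ in range(copies):
        x_dithered = [xi + rng.gauss(0.0, sigma) for xi in x]
        g = squared_error_grad(w, x_dithered, y)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / copies for t in total]

def sgd_step(w, x, y, lr, copies, sigma, rng):
    """One non-batch SGD update driven by a single example."""
    g = parallel_dither_grad(w, x, y, copies, sigma, rng)
    return [wi - lr * gi for wi, gi in zip(w, g)]

rng = random.Random(0)
w, x, y, sigma = [0.5, 0.25], [1.0, -2.0], 1.0, 0.5
clean = squared_error_grad(w, x, y)  # [-1.0, 2.0]
averaged = parallel_dither_grad(w, x, y, copies=20000, sigma=sigma, rng=rng)
# For this model, averaged is approximately clean + sigma**2 * w = [-0.875, 2.0625]
```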

Taming the ReLU with Parallel Dither in a Deep Neural Network

Author: Simpson Andrew J. R.
Publication date: 17/09/2015

Rectified Linear Units (ReLUs) seem to have displaced traditional 'smooth' nonlinearities as the activation function du jour in many, but not all, deep neural network (DNN) applications. However, nobody seems to know why. In this article, we argue that ReLUs are useful because they are ideal demodulators; this helps them perform fast abstract learning. However, this fast learning comes at the expense of serious nonlinear distortion products (decoy features). We show that Parallel Dither acts to suppress the decoy features, preventing overfitting and leaving the true features cleanly demodulated for rapid, reliable learning.

arXiv.org e-Print Archive
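The "ideal demodulator" argument has a classic signal-processing counterpart: half-wave rectification followed by low-pass filtering recovers the envelope of an amplitude-modulated carrier, whereas a linear unit followed by the same filter recovers nothing. The sketch below illustrates only that textbook effect, not the paper's network; the carrier period, the envelope and the one-period averaging filter are arbitrary illustrative choices. Averaging max(0, A sin t) over a full period gives A/π, so the rectified output is rescaled by π.

```python
import math

def relu(v):
    return max(0.0, v)

def am_signal(envelope, period):
    """Amplitude-modulate a sine carrier; each envelope value is held for one
    full carrier period of `period` samples."""
    return [e * math.sin(2 * math.pi * k / period)
            for e in envelope for k in range(period)]

def block_mean(signal, period):
    """Crude low-pass filter: average over each carrier period."""
    return [sum(signal[i:i + period]) / period
            for i in range(0, len(signal), period)]

period = 50
envelope = [1.0 + 0.5 * math.sin(2 * math.pi * j / 40) for j in range(40)]
x = am_signal(envelope, period)

linear_out = block_mean(x, period)                   # linear unit: averages to ~0
relu_out = block_mean([relu(v) for v in x], period)  # rectified: tracks envelope / pi
recovered = [math.pi * r for r in relu_out]
```

Rectification also generates harmonics of the carrier (nonlinear distortion products); averaging over whole periods cancels them here, which is why they do not appear in `recovered`.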

Use it or Lose it: Selective Memory and Forgetting in a Perpetual Learning Machine

Author: Simpson Andrew J. R.
Publication date: 10/09/2015

In a recent article we described a new type of deep neural network, a Perpetual Learning Machine (PLM), which is capable of learning 'on the fly' like a brain by existing in a state of Perpetual Stochastic Gradient Descent (PSGD). Here, by simulating the process of practice, we demonstrate both selective memory and selective forgetting when we introduce statistical recall biases during PSGD. Frequently recalled memories are remembered, whilst rarely recalled memories are forgotten. This results in a 'use it or lose it' stimulus-driven memory process that is similar to human memory.

Comment: arXiv admin note: substantial text overlap with arXiv:1509.0091

arXiv.org e-Print Archive
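The effect of a recall bias can be reproduced in a much smaller setting. The sketch below assumes a toy linear associative memory (not the Perpetual Learning Machine itself) trained by single-example SGD, where the item to practise at each step is drawn with biased probabilities: ten "frequently recalled" items share 90% of the practice and thirty "rare" items share 10%. Because the memory has fewer dimensions than items, practising some items overwrites others. All names, sizes and rates are hypothetical.

```python
import random

def unit_vector(dim, rng):
    """Random direction used as a memory key."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = sum(c * c for c in v) ** 0.5
    return [c / norm for c in v]

def recall(w, key):
    """Linear read-out of the memory for one key."""
    return sum(wj * kj for wj, kj in zip(w, key))

def practise(keys, targets, recall_weights, steps, lr, rng):
    """Single-example SGD; which memory is practised at each step is drawn
    with the given recall weights (the statistical recall bias)."""
    w = [0.0] * len(keys[0])
    schedule = rng.choices(range(len(keys)), weights=recall_weights, k=steps)
    for i in schedule:
        err = recall(w, keys[i]) - targets[i]
        # Gradient of 0.5 * err**2 with respect to w is err * key.
        w = [wj - lr * err * kj for wj, kj in zip(w, keys[i])]
    return w

rng = random.Random(0)
dim, n_frequent, n_rare = 16, 10, 30
keys = [unit_vector(dim, rng) for _ in range(n_frequent + n_rare)]
targets = [rng.gauss(0.0, 1.0) for _ in keys]
# Frequently recalled items share 90% of practice; rare items share 10%.
weights = [0.9 / n_frequent] * n_frequent + [0.1 / n_rare] * n_rare
w = practise(keys, targets, weights, steps=20000, lr=0.1, rng=rng)

sq_err = [(recall(w, k) - t) ** 2 for k, t in zip(keys, targets)]
frequent_mse = sum(sq_err[:n_frequent]) / n_frequent
rare_mse = sum(sq_err[n_frequent:]) / n_rare
print(f"frequent: {frequent_mse:.4f}  rare: {rare_mse:.4f}")
```

Giving the rare items a larger share of practice should narrow the gap between the two error figures, which is the 'use it or lose it' pattern in miniature.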

Uniform Learning in a Deep Neural Network via "Oddball" Stochastic Gradient Descent

Author: Simpson Andrew J. R.
Publication date: 08/10/2015

When training deep neural networks, it is typically assumed that the training examples are uniformly difficult to learn. Or, to restate, it is assumed that the training error will be uniformly distributed across the training examples. Based on these assumptions, each training example is used an equal number of times. However, these assumptions may not be valid in many cases. "Oddball SGD" (novelty-driven stochastic gradient descent) was recently introduced to drive training probabilistically according to the error distribution: training frequency is proportional to training error magnitude. In this article, using a deep neural network to encode a video, we show that oddball SGD can be used to enforce uniform error across the training set.

arXiv.org e-Print Archive
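The sampling rule stated in this abstract (training frequency proportional to training error magnitude) can be written as a weighted draw over example indices. The Python sketch below assumes the per-example errors have already been computed; `oddball_schedule` is a hypothetical helper, and in a real training loop the errors would be refreshed as the network learns, which this sketch does not show.

```python
import random

def oddball_schedule(errors, steps, rng):
    """Draw `steps` training-example indices, each with probability
    proportional to that example's current error magnitude."""
    weights = [abs(e) for e in errors]
    if sum(weights) == 0:
        weights = [1.0] * len(errors)  # all errors zero: fall back to uniform
    return rng.choices(range(len(errors)), weights=weights, k=steps)

rng = random.Random(1)
errors = [0.1, -0.3, 0.6]  # hypothetical per-example errors
schedule = oddball_schedule(errors, 60000, rng)
freq = [schedule.count(i) / len(schedule) for i in range(len(errors))]
print(freq)  # close to [0.1, 0.3, 0.6]
```

Taking the absolute value means an example is trained more often whether its prediction is too high or too low; only the size of the error matters.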

Qualitative Projection Using Deep Neural Networks

Author: Simpson Andrew J. R.
Publication date: 28/10/2015

Deep neural networks (DNNs) abstract by demodulating the output of linear filters. In this article, we refine this definition of abstraction to show that the inputs of a DNN are abstracted with respect to the filters. Or, to restate, the abstraction is qualified by the filters. This leads us to introduce the notion of qualitative projection. We use qualitative projection to abstract MNIST handwritten digits with respect to the various dogs, horses, planes and cars of the CIFAR dataset. We then classify the MNIST digits according to the magnitude of their dogness, horseness, planeness and carness qualities, illustrating the generality of qualitative projection.

arXiv.org e-Print Archive
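A minimal reading of qualitative projection is: pass an input from one dataset through filters learned on another, rectify, and treat each rectified output as the magnitude of a named quality. The sketch below assumes that reading. The two filters (a "dogness" and a "carness" filter) are hypothetical stand-ins for a network trained on CIFAR, the three-element inputs stand in for MNIST images, and the nearest-centroid classifier is a simple illustrative choice rather than the classifier used in the paper.

```python
# Hypothetical stand-ins for filters learned on another dataset:
# row 0 scores "dogness", row 1 scores "carness".
FILTERS = [[1.0, 0.2, 0.0],
           [0.0, 1.0, 0.5]]

def qualitative_projection(x, filters=FILTERS):
    """Linear filtering followed by rectification; one quality per filter."""
    return [max(0.0, sum(f_i * x_i for f_i, x_i in zip(f, x))) for f in filters]

def fit_centroids(inputs, labels, filters=FILTERS):
    """Mean quality vector for each label."""
    sums, counts = {}, {}
    for x, y in zip(inputs, labels):
        q = qualitative_projection(x, filters)
        acc = sums.setdefault(y, [0.0] * len(q))
        for j, qj in enumerate(q):
            acc[j] += qj
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def classify(x, centroids, filters=FILTERS):
    """Assign the label whose centroid is nearest in quality space."""
    q = qualitative_projection(x, filters)
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(q, centroids[y])))

# Toy "digits": label "0" leans on the first input, label "1" on the second.
train_x = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.1, 0.9, 0.1]]
train_y = ["0", "0", "1", "1"]
centroids = fit_centroids(train_x, train_y)
print(classify([0.8, 0.0, 0.1], centroids))  # "0"
```

The classifier never sees the raw inputs, only their quality magnitudes, which is the sense in which the digits are described "with respect to" the filters.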