Dither is Better than Dropout for Regularising Deep Neural Networks
Regularisation of deep neural networks (DNN) during training is critical to
performance. By far the most popular method is known as dropout. Here, cast
through the prism of signal processing theory, we compare and contrast the
regularisation effects of dropout with those of dither. We illustrate some
serious inherent limitations of dropout and demonstrate that dither provides a
more effective regulariser.
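The sketch below makes the contrast concrete by applying each perturbation to a vector of hidden activations. It is our own plain-Python illustration, not code from the paper. It assumes the common "inverted" form of dropout, where each unit is dropped with probability `p` and survivors are rescaled by `1/(1-p)`. It models dither as zero-mean uniform noise of half-width `amplitude`. The noise distribution and the function names are assumptions.

```python
import random

def dropout(activations, p, rng):
    """Inverted dropout: zero each unit with probability p, rescale survivors by 1/(1-p)."""
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

def dither(activations, amplitude, rng):
    """Additive dither (assumed uniform): add zero-mean noise in [-amplitude, amplitude] to every unit."""
    return [a + rng.uniform(-amplitude, amplitude) for a in activations]

rng = random.Random(0)
h = [0.5, -1.2, 2.0, 0.0, 3.3]  # hypothetical hidden-layer activations
print(dropout(h, p=0.5, rng=rng))
print(dither(h, amplitude=0.1, rng=rng))
```

Dropout either silences a unit or amplifies it, so information is discarded on every pass. Dither keeps every unit present and only jitters its value by a bounded amount.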
Parallel Dither and Dropout for Regularising Deep Neural Networks
Effective regularisation during training can mean the difference between
success and failure for deep neural networks. Recently, dither has been
suggested as an alternative to dropout for regularisation during batch-averaged
stochastic gradient descent (SGD). In this article, we show that these methods
fail without batch averaging and we introduce a new, parallel regularisation
method that may be used without batch averaging. Our results for
parallel-regularised non-batch-SGD are substantially better than what is
possible with batch-SGD. Furthermore, our results demonstrate that dither and
dropout are complementary.
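The abstract does not spell out the parallel method, so the following hypothetical sketch shows only one plausible reading. A single training example is replicated `copies` times, and each replica receives independent dither. The gradients of the replicas are then averaged before one non-batch SGD step. The linear model, the squared-error loss, the uniform noise, and every name here (`parallel_dither_step`, `copies`, `amplitude`) are assumptions made for illustration.

```python
import random

def grad_squared_error(w, b, x, y):
    """Gradient of 0.5 * (w.x + b - y)**2 w.r.t. w, plus the residual (the gradient w.r.t. b)."""
    err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
    return [err * xi for xi in x], err

def parallel_dither_step(w, b, x, y, copies, amplitude, lr, rng):
    """One non-batch SGD step on a single example (x, y), with the gradient
    averaged over `copies` independently dithered replicas of x (hypothetical scheme)."""
    gw, gb = [0.0] * len(w), 0.0
    for _ in range(copies):
        x_noisy = [xi + rng.uniform(-amplitude, amplitude) for xi in x]
        g, e = grad_squared_error(w, b, x_noisy, y)
        gw = [acc + gi for acc, gi in zip(gw, g)]
        gb += e
    w = [wi - lr * gi / copies for wi, gi in zip(w, gw)]
    b = b - lr * gb / copies
    return w, b

# Toy usage: recover y = 2*x0 - x1 + 0.5, presenting one example at a time (no batches).
rng = random.Random(1)
data = []
for _ in range(200):
    x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    data.append((x, 2.0 * x[0] - 1.0 * x[1] + 0.5))
w, b = [0.0, 0.0], 0.0
for epoch in range(20):
    for x, y in data:
        w, b = parallel_dither_step(w, b, x, y, copies=8, amplitude=0.05, lr=0.05, rng=rng)
print(w, b)
```

Input noise of this kind slightly shrinks the learned weights, much like a small L2 penalty. With the small `amplitude` used here, the weights still land close to the true values.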
Taming the ReLU with Parallel Dither in a Deep Neural Network
Rectified Linear Units (ReLU) seem to have displaced traditional 'smooth'
nonlinearities as activation-function-du-jour in many - but not all - deep
neural network (DNN) applications. However, nobody seems to know why. In this
article, we argue that ReLU are useful because they are ideal demodulators -
this helps them perform fast abstract learning. However, this fast learning
comes at the expense of serious nonlinear distortion products - decoy features.
We show that Parallel Dither acts to suppress the decoy features, preventing
overfitting and leaving the true features cleanly demodulated for rapid,
reliable learning.
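A classic envelope-detector calculation illustrates the demodulation claim. A ReLU behaves as a half-wave rectifier, and averaging its output over one carrier period gives a/π, a DC level proportional to the carrier amplitude a. This textbook signal-processing sketch is our own, not code from the article. The sinusoidal carrier and the use of a plain mean as the low-pass filter are assumptions.

```python
import math

def relu(v):
    return max(0.0, v)

def demodulated_level(amplitude, samples_per_period=1000):
    """Half-wave rectify one period of amplitude*sin(t) with a ReLU, then average
    (a crude low-pass filter). For amplitude > 0 this approaches amplitude / pi."""
    total = 0.0
    for k in range(samples_per_period):
        t = 2.0 * math.pi * k / samples_per_period
        total += relu(amplitude * math.sin(t))
    return total / samples_per_period

for a in (0.5, 1.0, 2.0):
    print(a, demodulated_level(a), a / math.pi)
```

The rectified-and-averaged output tracks the amplitude linearly, which is what lets a rectifier recover a slowly varying envelope from a modulated carrier.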
Use it or Lose it: Selective Memory and Forgetting in a Perpetual Learning Machine
In a recent article we described a new type of deep neural network - a
Perpetual Learning Machine (PLM) - which is capable of learning 'on the fly'
like a brain by existing in a state of Perpetual Stochastic Gradient Descent
(PSGD). Here, by simulating the process of practice, we demonstrate both
selective memory and selective forgetting when we introduce statistical recall
biases during PSGD. Frequently recalled memories are remembered, whilst
memories recalled rarely are forgotten. This results in a 'use it or lose it'
stimulus-driven memory process that is similar to human memory.
Comment: arXiv admin note: substantial text overlap with arXiv:1509.0091
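A deliberately simple toy model can illustrate the effect of recall bias. This is not PSGD and not the Perpetual Learning Machine. It hypothetically assumes that every memory's strength decays by a constant factor each step and that rehearsing a memory restores it fully. Memories are chosen for rehearsal with probability proportional to hypothetical recall weights, and these weights play the role of the statistical recall bias.

```python
import random

def practice(recall_weights, steps, decay, rng):
    """Toy 'use it or lose it' model (hypothetical, not PSGD): each step every
    memory strength decays by `decay`; one memory, drawn with probability
    proportional to its recall weight, is rehearsed and restored to 1.0."""
    strength = [1.0] * len(recall_weights)
    for _ in range(steps):
        strength = [s * decay for s in strength]
        i = rng.choices(range(len(recall_weights)), weights=recall_weights, k=1)[0]
        strength[i] = 1.0
    return strength

rng = random.Random(42)
# Hypothetical recall bias: memory 0 is recalled far more often than memory 3.
final = practice([50, 30, 19, 1], steps=500, decay=0.97, rng=rng)
print([round(s, 3) for s in final])
```

Averaged over many runs, frequently recalled memories stay near full strength, while rarely recalled ones spend most of their time decayed.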
Uniform Learning in a Deep Neural Network via "Oddball" Stochastic Gradient Descent
When training deep neural networks, it is typically assumed that the training
examples are uniformly difficult to learn. Or, to restate, it is assumed that
the training error will be uniformly distributed across the training examples.
Based on these assumptions, each training example is used an equal number of
times. However, this assumption may not be valid in many cases. "Oddball SGD"
(novelty-driven stochastic gradient descent) was recently introduced to drive
training probabilistically according to the error distribution - training
frequency is proportional to training error magnitude. In this article, using a
deep neural network to encode a video, we show that oddball SGD can be used to
enforce uniform error across the training set.
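The sampling rule stated above, with training frequency proportional to training error magnitude, can be sketched as follows. The function name and the fallback to uniform sampling when every error is zero are our own assumptions. How errors are measured and refreshed during training is left out.

```python
import random

def oddball_pick(errors, rng):
    """Pick the index of the next training example with probability proportional
    to its current |error|. Falls back to uniform sampling if all errors are zero
    (an assumption; random.choices rejects all-zero weights)."""
    weights = [abs(e) for e in errors]
    if sum(weights) == 0:
        return rng.randrange(len(errors))
    return rng.choices(range(len(errors)), weights=weights, k=1)[0]

rng = random.Random(0)
errors = [0.1, 0.1, 0.8]  # hypothetical errors: example 2 is currently hardest
counts = [0, 0, 0]
for _ in range(10000):
    counts[oddball_pick(errors, rng)] += 1
print(counts)  # roughly 10% / 10% / 80% of the picks
```

In a real training loop the errors would be updated after each step, so examples that have been learned are sampled less and the remaining hard examples are sampled more.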
Qualitative Projection Using Deep Neural Networks
Deep neural networks (DNN) abstract by demodulating the output of linear
filters. In this article, we refine this definition of abstraction to show that
the inputs of a DNN are abstracted with respect to the filters. Or, to restate,
the abstraction is qualified by the filters. This leads us to introduce the
notion of qualitative projection. We use qualitative projection to abstract
MNIST hand-written digits with respect to the various dogs, horses, planes and
cars of the CIFAR dataset. We then classify the MNIST digits according to the
magnitude of their dogness, horseness, planeness and carness qualities,
illustrating the generality of qualitative projection.
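The following minimal sketch rests on loudly hypothetical assumptions. Each class, standing in for a CIFAR category such as dog or car, owns a few linear filters. An input's "quality" for that class is the sum of the ReLU-rectified (demodulated) filter responses. Inputs are then classified by the nearest class-average quality vector. The toy 4-dimensional filters, the per-class sum, the labels "A" and "B", and the nearest-centroid rule are our own illustrations, not the article's procedure.

```python
import math

def relu(v):
    return max(0.0, v)

def qualities(x, filters_by_class):
    """Quality of x for each class: sum of ReLU-demodulated linear filter responses."""
    return {
        cls: sum(relu(sum(fi * xi for fi, xi in zip(f, x))) for f in filters)
        for cls, filters in filters_by_class.items()
    }

def centroids_from(examples, filters_by_class):
    """Average quality vector per label, computed from (x, label) pairs."""
    sums, counts = {}, {}
    for x, label in examples:
        q = qualities(x, filters_by_class)
        acc = sums.setdefault(label, {cls: 0.0 for cls in q})
        for cls, v in q.items():
            acc[cls] += v
        counts[label] = counts.get(label, 0) + 1
    return {lbl: {cls: v / counts[lbl] for cls, v in acc.items()} for lbl, acc in sums.items()}

def nearest_centroid(q, centroids):
    """Label whose centroid is closest to quality vector q (Euclidean distance)."""
    def dist(a, b):
        return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))
    return min(centroids, key=lambda label: dist(q, centroids[label]))

# Hypothetical filters: two per class, on 4-dimensional toy inputs.
filters = {
    "dog": [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]],
    "car": [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]],
}
# Hypothetical labelled inputs standing in for two digit classes.
train = [([1.0, 1.0, 0.0, 0.0], "A"), ([0.0, 0.0, 1.0, 1.0], "B")]
cents = centroids_from(train, filters)
query = [0.9, 1.1, 0.1, 0.0]
print(qualities(query, filters), nearest_centroid(qualities(query, filters), cents))
```

The classifier never looks at the raw input directly. It only sees how strongly each class's filters respond, which is the sense in which the input is described with respect to the filters.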