Some Approximation Bounds for Deep Networks
In this paper we introduce new bounds on the approximation of functions by
deep networks and, in doing so, propose some new deep network architectures
for function approximation. These results give theoretical insight into the
success of autoencoders and ResNets.
Deep Radial Kernel Networks: Approximating Radially Symmetric Functions with Deep Networks
We prove that a particular deep network architecture is more efficient at
approximating radially symmetric functions than the best known 2 or 3 layer
networks. We use this architecture to approximate Gaussian kernel SVMs, and
subsequently improve upon them with further training. The architecture and
initial weights of the Deep Radial Kernel Network are completely specified by
the SVM, which sidesteps the problem of empirically choosing an appropriate
deep network architecture.
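
The paper's specific deep architecture is not reproduced here, but the idea of a network whose structure and initial weights are read directly off a trained SVM can be illustrated with a single radial-basis layer. A minimal sketch, assuming scikit-learn's SVC and toy data; the deeper layers the paper adds on top are omitted:

    import numpy as np
    from sklearn.svm import SVC

    # Fit a Gaussian-kernel SVM; its support vectors and dual coefficients
    # fully specify the radial layer below.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = (np.linalg.norm(X, axis=1) < 1.0).astype(int)
    svm = SVC(kernel="rbf", gamma=0.5).fit(X, y)

    def rbf_layer(X, centres, gamma):
        """Gaussian radial-basis activations, one unit per support vector."""
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    # The SVM decision function, reproduced as a one-hidden-layer radial
    # network: hidden units sit at the support vectors, output weights are
    # the dual coefficients. Further training would start from these weights.
    H = rbf_layer(X, svm.support_vectors_, svm.gamma)
    scores = H @ svm.dual_coef_.ravel() + svm.intercept_
    assert np.allclose(scores, svm.decision_function(X))
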
PProCRC: Probabilistic Collaboration of Image Patches
We present a conditional probabilistic framework for collaborative
representation of image patches. It incorporates background compensation and
outlier patch suppression into the main formulation itself, removing the need
for separate pre-processing steps. A closed-form, non-iterative solution of
the cost function is derived. The proposed method (PProCRC) outperforms
earlier CRC formulations, both patch-based (PCRC, GP-CRC) and the
state-of-the-art probabilistic ones (ProCRC and EProCRC), on three
fine-grained species recognition datasets (Oxford Flowers, Oxford-IIIT Pets
and CUB Birds) using two CNN backbones (VGG-19 and ResNet-50).
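
PProCRC itself is probabilistic and patch-based, but the closed-form, non-iterative character of CRC solutions can be seen in the plain collaborative representation classifier it builds on. A minimal sketch; the dictionary layout and regulariser `lam` are assumptions here, not the paper's formulation:

    import numpy as np

    def crc_code(A, y, lam=1e-2):
        """Closed-form collaborative code: ridge-regularised least squares
        over the stacked training dictionary A (features x samples)."""
        n = A.shape[1]
        return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)

    def crc_classify(A, labels, y, lam=1e-2):
        """Assign y to the class whose columns best reconstruct it."""
        alpha = crc_code(A, y, lam)
        best, best_r = None, np.inf
        for c in np.unique(labels):
            mask = labels == c
            r = np.linalg.norm(y - A[:, mask] @ alpha[mask])
            if r < best_r:
                best, best_r = c, r
        return best
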
Pseudo-Rehearsal: Achieving Deep Reinforcement Learning without Catastrophic Forgetting
Neural networks can achieve excellent results in a wide variety of
applications. However, when they attempt to learn tasks sequentially, they
tend to learn the new task while catastrophically forgetting previous ones.
We propose a model that overcomes catastrophic forgetting in sequential
reinforcement learning by combining ideas from continual learning in both the
image classification domain and the reinforcement learning domain. This model
features a dual memory system, which separates continual learning from
reinforcement learning, and a pseudo-rehearsal system that "recalls" items
representative of previous tasks via a deep generative network. Our model
sequentially learns Atari 2600 games while continuing to perform above human
level and as well as independent models trained separately on each game. This
result is achieved without demanding additional storage as the number of
tasks increases, storing raw data, or revisiting past tasks. In comparison,
previous state-of-the-art solutions are substantially more vulnerable to
forgetting on these complex deep reinforcement learning tasks.
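
A minimal sketch of the pseudo-rehearsal mechanism, assuming distillation-style soft targets from the previous network; the dual memory system, the Atari environments and all module shapes below are placeholders, not the paper's models:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Stand-ins: `old_net` is the model after previous tasks, `generator`
    # produces items representative of those tasks.
    old_net = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 4))
    generator = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 8))
    net = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 4))
    net.load_state_dict(old_net.state_dict())  # continue from the old weights

    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for step in range(100):
        x_new = torch.randn(32, 8)              # new-task minibatch (placeholder)
        y_new = torch.randint(0, 4, (32,))
        with torch.no_grad():
            x_old = generator(torch.randn(32, 4))  # "recalled" pseudo-items
            y_old = old_net(x_old)                 # old network's responses
        # Learn the new task while matching the old outputs on pseudo-items.
        loss = F.cross_entropy(net(x_new), y_new) + F.mse_loss(net(x_old), y_old)
        opt.zero_grad()
        loss.backward()
        opt.step()
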
Effects of the optimisation of the margin distribution on generalisation in deep architectures
Despite being vital to the success of Support Vector Machines, the principle
of maximising the separating margin is not used in deep learning. We show
that minimising the margin variance, rather than maximising the margin, is
more suitable for improving generalisation in deep architectures. We propose
the Halfway loss function, which minimises the Normalised Margin Variance
(NMV) at the output of a deep learning model, and evaluate its performance
against the Softmax Cross-Entropy loss on the MNIST, smallNORB and CIFAR-10
datasets.
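
The exact form of the Halfway loss is defined in the paper; the sketch below is one hypothetical reading of a normalised-margin-variance objective at the output layer, with both the margin definition and the normalisation being assumptions:

    import torch

    def normalised_margin_variance(logits, targets, eps=1e-8):
        """Hypothetical NMV loss: variance of per-example margins after
        normalising by their batch mean (the paper's form may differ)."""
        correct = logits.gather(1, targets[:, None]).squeeze(1)
        # Mask out the target logit before taking the runner-up maximum.
        others = logits.scatter(1, targets[:, None], float("-inf"))
        margins = correct - others.max(dim=1).values
        return (margins / (margins.mean().abs() + eps)).var()

    # Usage: loss = normalised_margin_variance(model(x), y)
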
VASE: Variational Assorted Surprise Exploration for Reinforcement Learning
Exploration in environments with continuous control and sparse rewards
remains a key challenge in reinforcement learning (RL). Recently, surprise has
been used as an intrinsic reward that encourages systematic and efficient
exploration. We introduce a new definition of surprise and its RL
implementation named Variational Assorted Surprise Exploration (VASE). VASE
uses a Bayesian neural network as a model of the environment dynamics and is
trained using variational inference, alternately improving the agent's model
of the environment and updating its policy. Our experiments show that VASE
outperforms other surprise-based exploration techniques in continuous-control
environments with sparse rewards.
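
The paper's surprise definition and its variational BNN are specific to the method; as a rough stand-in, the sketch below uses dropout sampling to approximate the uncertainty of a dynamics model and turns it into an intrinsic bonus. All shapes and the bonus form are assumptions:

    import torch
    import torch.nn as nn

    # Dropout kept active at inference acts as a crude posterior over
    # weights (the actual method trains a variational Bayesian NN).
    dyn = nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Dropout(0.2),
                        nn.Linear(64, 4))
    dyn.train()  # keep dropout sampling on

    def surprise_bonus(state, action, samples=8):
        """Hypothetical intrinsic reward: variance of sampled next-state
        predictions, highest where the agent's model is most uncertain."""
        sa = torch.cat([state, action], dim=-1)
        with torch.no_grad():
            preds = torch.stack([dyn(sa) for _ in range(samples)])
        return preds.var(dim=0).mean(dim=-1)

    # Usage: reward = env_reward + beta * surprise_bonus(s, a)
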
Switched linear projections for neural network interpretability
We introduce switched linear projections for expressing the activity of a
neuron in a deep neural network in terms of a single linear projection in the
input space. The method works by isolating the active subnetwork, the series
of linear transformations that determines the entire computation of the
network for a given input instance. With these projections we can decompose
the activity in any hidden layer into patterns detected in a given input
instance. We also propose that in ReLU networks it is instructive and
meaningful to examine the patterns that deactivate the neurons in a hidden
layer, something that is implicitly ignored by existing interpretability
methods, which track only the active aspect of the network's computation.
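
For a ReLU network this is concrete: a given input fixes which units are active, and the surviving chain of linear maps can be multiplied into a single projection. A minimal sketch on a one-hidden-layer network (the toy sizes are ours; the method applies to deeper networks):

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    net = nn.Sequential(nn.Linear(5, 7), nn.ReLU(), nn.Linear(7, 3))
    x = torch.randn(5)

    with torch.no_grad():
        mask = (net[0](x) > 0).float()       # the input's "switch" pattern
        W1, b1 = net[0].weight, net[0].bias
        W2, b2 = net[2].weight, net[2].bias
        # Zero the rows of inactive units, then collapse the two layers
        # into one linear projection in the input space.
        W_eff = W2 @ (mask[:, None] * W1)
        b_eff = W2 @ (mask * b1) + b2
        assert torch.allclose(net(x), W_eff @ x + b_eff, atol=1e-6)
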
RocNet: Recursive Octree Network for Efficient 3D Deep Representation
We introduce a deep recursive octree network for the compression of 3D voxel
data. Our network compresses a voxel grid of any size down to a very small
latent space in an autoencoder-like network. We show results for compressing
32³, 64³ and 128³ voxel grids down to just 80 floats in the latent space. We
demonstrate the effectiveness and efficiency of the proposed method on
several publicly available datasets with three experiments: 3D shape
classification, 3D shape reconstruction, and shape generation. Experimental
results show that our algorithm maintains accuracy while consuming less
memory and training faster than existing methods, especially in 3D
reconstruction tasks.
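
The recursive network itself is more involved, but the octree representation it encodes can be sketched directly: uniform blocks of a power-of-two boolean grid become leaves, mixed blocks split into eight children. The splitter below is a hypothetical illustration, not the paper's encoder:

    import numpy as np

    def build_octree(vox):
        """Recursively split a cubic boolean grid into an octree."""
        if vox.all():
            return 1                      # completely occupied leaf
        if not vox.any():
            return 0                      # completely empty leaf
        h = vox.shape[0] // 2             # mixed block: split into octants
        return [build_octree(vox[dx:dx + h, dy:dy + h, dz:dz + h])
                for dx in (0, h) for dy in (0, h) for dz in (0, h)]

    grid = np.zeros((8, 8, 8), dtype=bool)
    grid[:4, :4, :4] = True
    tree = build_octree(grid)  # [1, 0, 0, 0, 0, 0, 0, 0]
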
Conceptual capacity and effective complexity of neural networks
We propose a complexity measure of a neural network mapping function based on
the diversity of the set of tangent spaces from different inputs. Treating each
tangent space as a linear PAC concept we use an entropy-based measure of the
bundle of concepts in order to estimate the conceptual capacity of the network.
The theoretical maximal capacity of a ReLU network is equivalent to the
number of its neurons. In practice, however, due to correlations between
neuron activities within the network, the actual capacity can be remarkably
small, even for very large networks. Empirical evaluations show that this new
measure is correlated with the complexity of the mapping function and thus
with the generalisation capabilities of the corresponding network. It
captures the effective, as opposed to the theoretical, complexity of the
network function. We also showcase some uses of the proposed measure for the
analysis and comparison of trained neural network models.
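
A rough stand-in for the idea, not the paper's estimator: collect the tangent maps (Jacobians) of a ReLU network at sample inputs and measure the entropy of the distinct patterns among them. The rounding-based grouping below is a crude assumption in place of the paper's concept bundle:

    import torch
    import torch.nn as nn
    from torch.autograd.functional import jacobian

    net = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 3))
    X = torch.randn(64, 4)

    # One tangent map per input; for a ReLU net this is the linear map
    # selected by that input's activation pattern.
    J = torch.stack([jacobian(net, x) for x in X]).flatten(1)

    # Entropy of hard assignments of tangent maps to distinct patterns:
    # low when many inputs share a map, high when the maps are diverse.
    _, inv = torch.unique(J.round(decimals=4), dim=0, return_inverse=True)
    p = torch.bincount(inv).float() / inv.numel()
    entropy = -(p * p.log()).sum()
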
Pseudo-Recursal: Solving the Catastrophic Forgetting Problem in Deep Neural Networks
In general, neural networks are not currently capable of learning tasks in a
sequential fashion. When a novel, unrelated task is learnt by a neural network,
it substantially forgets how to solve previously learnt tasks. One of the
original solutions to this problem is pseudo-rehearsal, which involves
learning the new task while rehearsing generated items representative of the
previous tasks. This is very effective for simple tasks. However,
pseudo-rehearsal has not yet been successfully applied to very complex tasks,
because for these it is difficult to generate representative items. We
accomplish pseudo-rehearsal by using a Generative Adversarial Network to
generate items, so that our deep network can learn to sequentially classify
the CIFAR-10, SVHN and MNIST datasets. After training on all tasks, our
network loses only 1.67% absolute accuracy on CIFAR-10 and gains 0.24%
absolute accuracy on SVHN. Our model's performance is a substantial
improvement over the current state-of-the-art solution.