62 research outputs found
Complex Unitary Recurrent Neural Networks using Scaled Cayley Transform
Recurrent neural networks (RNNs) have been successfully used on a wide range
of sequential data problems. A well known difficulty in using RNNs is the
vanishing or exploding gradient problem. Recently, there have been
several different RNN architectures that try to mitigate this issue by
maintaining an orthogonal or unitary recurrent weight matrix. One such
architecture is the scaled Cayley orthogonal recurrent neural network (scoRNN)
which parameterizes the orthogonal recurrent weight matrix through a scaled
Cayley transform. This parametrization contains a diagonal scaling matrix
consisting of entries equal to +1 or -1 that cannot be optimized by
gradient descent. Thus, the scaling matrix is fixed before training, and a
hyperparameter is introduced to tune the matrix for each particular task. In
this paper, we develop a unitary RNN architecture based on a complex scaled
Cayley transform. Unlike the real orthogonal case, the transformation uses a
diagonal scaling matrix consisting of entries on the complex unit circle which
can be optimized using gradient descent and no longer requires the tuning of a
hyperparameter. We also provide an analysis of a potential issue with the modReLU
activation function, which is used in our work and in several other unitary RNNs.
In the experiments conducted, the scaled Cayley unitary recurrent neural
network (scuRNN) achieves results comparable to or better than those of scoRNN
and other unitary RNNs, without fixing the scaling matrix.
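The scaled Cayley construction can be illustrated numerically. Below is a minimal sketch assuming NumPy (function names are illustrative, not the authors' code): the unitary recurrent matrix is W = (I + A)^{-1}(I - A)D, where A is skew-Hermitian and D is diagonal with unit-modulus entries exp(i*theta), so theta is trainable by gradient descent, unlike the fixed +/-1 scaling in scoRNN.

```python
import numpy as np

def scaled_cayley_unitary(A, theta):
    """W = (I + A)^{-1} (I - A) D for skew-Hermitian A.

    D = diag(exp(i * theta)) lies on the complex unit circle; unlike
    the fixed +/-1 scaling in scoRNN, theta can be optimized by
    gradient descent.
    """
    n = A.shape[0]
    I = np.eye(n)
    D = np.diag(np.exp(1j * theta))
    return np.linalg.solve(I + A, I - A) @ D  # (I+A)^{-1} (I-A) D

def modrelu(z, b):
    """modReLU: rescale |z| by ReLU(|z| + b) while keeping the phase
    of z; conventionally 0 where |z| + b < 0 (and at z = 0, where the
    phase is undefined -- the potential issue alluded to above)."""
    mag = np.abs(z)
    scale = np.where(mag + b >= 0, (mag + b) / np.maximum(mag, 1e-12), 0.0)
    return scale * z

# Demo: W is unitary for any skew-Hermitian A.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
A = (M - M.conj().T) / 2            # skew-Hermitian part of M
W = scaled_cayley_unitary(A, rng.standard_normal(4))
```

Because A is skew-Hermitian, I + A is always invertible and the Cayley transform is exactly unitary, so repeated multiplication by W neither shrinks nor amplifies hidden-state norms.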
Non-normal Recurrent Neural Network (nnRNN): learning long time dependencies while improving expressivity with transient dynamics
A recent strategy to circumvent the exploding and vanishing gradient problem
in RNNs, and to allow the stable propagation of signals over long time scales,
is to constrain recurrent connectivity matrices to be orthogonal or unitary.
This ensures eigenvalues with unit norm and thus stable dynamics and training.
However, this comes at the cost of reduced expressivity due to the limited
variety of orthogonal transformations. We propose a novel connectivity
structure based on the Schur decomposition and a splitting of the Schur form
into normal and non-normal parts. This allows parametrizing matrices with
unit-norm eigenspectra without orthogonality constraints on eigenbases. The
resulting architecture ensures access to a larger space of spectrally
constrained matrices, of which orthogonal matrices are a subset. This crucial
difference retains the stability advantages and training speed of orthogonal
RNNs while enhancing expressivity, especially on tasks that require
computations over ongoing input sequences.
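The Schur-based parametrization can be sketched as follows, assuming NumPy (the helper name and the 2x2-rotation choice of normal part are illustrative, not necessarily the authors' exact construction). The recurrent matrix is W = P(R + T)P^T, where P is orthogonal, R is block-diagonal with 2x2 rotations (eigenvalues on the unit circle), and T is strictly upper triangular with respect to the blocks, i.e. the non-normal, transient part:

```python
import numpy as np

def nnrnn_weight(thetas, T_free, P):
    """W = P (R + T) P^T: unit-modulus eigenvalues without orthogonality.

    thetas : rotation angles; each yields a 2x2 block of R with
             eigenvalues exp(+/- i*theta) on the unit circle.
    T_free : free parameters; only the strictly-upper-triangular part
             outside the 2x2 blocks is used (the non-normal part).
    P      : orthogonal change of basis.
    """
    n = 2 * len(thetas)
    R = np.zeros((n, n))
    for k, t in enumerate(thetas):
        c, s = np.cos(t), np.sin(t)
        R[2 * k:2 * k + 2, 2 * k:2 * k + 2] = [[c, -s], [s, c]]
    T = np.triu(T_free[:n, :n], k=1)
    for k in range(len(thetas)):
        T[2 * k, 2 * k + 1] = 0.0   # keep each rotation block intact
    return P @ (R + T) @ P.T
```

Since R + T is block upper triangular, its eigenvalues are those of the rotation blocks, so W has a unit-norm eigenspectrum even though it is generally not orthogonal.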
Householder-Absolute Neural Layers For High Variability and Deep Trainability
We propose a new architecture for artificial neural networks called
Householder-absolute neural layers, or Han-layers for short, that use
Householder reflectors as weight matrices and the absolute-value function for
activation. Han-layers, functioning as fully connected layers, are motivated by
recent results on neural-network variability and are designed to increase the
activation ratio and reduce the chance of Collapse to Constants. Neural
networks constructed chiefly from Han-layers are called HanNets. By
construction, HanNets enjoy a theoretical guarantee that vanishing or exploding
gradients never occur. We conduct several proof-of-concept experiments. Some
surprising results obtained on stylized test problems suggest that, under certain
conditions, HanNets exhibit an unusual ability to produce nearly perfect
solutions unattainable by fully connected networks. Experiments on regression
datasets show that HanNets can significantly reduce the number of model
parameters while maintaining or improving the level of generalization accuracy.
In addition, by adding a few Han-layers into the pre-classification FC-layer of
a convolutional neural network, we are able to quickly improve a
state-of-the-art result on the CIFAR10 dataset. These proof-of-concept results are
sufficient to necessitate further studies on HanNets to understand their
capacities and limits, and to exploit their potential in real-world
applications.
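A Han-layer can be sketched in a few lines, assuming NumPy (the function name and bias placement are illustrative, not the authors' implementation). A Householder reflector H = I - 2uu^T/||u||^2 is orthogonal yet needs only n parameters, and the absolute-value activation preserves vector norms, which is the mechanism behind the no-vanishing/exploding-gradient guarantee:

```python
import numpy as np

def han_layer(x, u, b=0.0):
    """y = |Hx + b| with Householder reflector H = I - 2 u u^T / ||u||^2.

    H is orthogonal, so the linear step preserves the norm of x; the
    reflection is applied in O(n) without forming H explicitly.
    """
    u = u / np.linalg.norm(u)
    Hx = x - 2.0 * u * (u @ x)   # reflect x across the hyperplane normal to u
    return np.abs(Hx + b)
```

With b = 0, the layer output has exactly the same norm as its input, so stacking many such layers can neither shrink nor blow up activations, consistent with the gradient guarantee stated in the abstract.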
- …