Non-normal Recurrent Neural Network (nnRNN): learning long time dependencies while improving expressivity with transient dynamics
A recent strategy to circumvent the exploding and vanishing gradient problem
in RNNs, and to allow the stable propagation of signals over long time scales,
is to constrain recurrent connectivity matrices to be orthogonal or unitary.
This ensures eigenvalues with unit norm and thus stable dynamics and training.
However, this comes at the cost of reduced expressivity due to the limited
variety of orthogonal transformations. We propose a novel connectivity
structure based on the Schur decomposition and a splitting of the Schur form
into normal and non-normal parts. This allows us to parametrize matrices with
unit-norm eigenspectra without orthogonality constraints on eigenbases. The
resulting architecture ensures access to a larger space of spectrally
constrained matrices, of which orthogonal matrices are a subset. This crucial
difference retains the stability advantages and training speed of orthogonal
RNNs while enhancing expressivity, especially on tasks that require
computations over ongoing input sequences.
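
As a rough illustration of the idea (a sketch under our own assumptions, not the authors' implementation; the function and variable names below are hypothetical): a recurrent matrix can be built as W = P (D + T) P^T, where P is orthogonal, D holds 2x2 rotation blocks that pin the eigenvalues to the unit circle, and T is the block strictly upper-triangular non-normal part.

import numpy as np

def unit_spectrum_matrix(thetas, upper, P):
    # Sketch: real Schur-style parametrization W = P (D + T) P^T. The 2x2
    # rotation blocks in D fix unit-modulus eigenvalue pairs e^{+-i theta};
    # the block-upper-triangular part T supplies the non-normal transients.
    n = 2 * len(thetas)
    D = np.zeros((n, n))
    for i, th in enumerate(thetas):
        c, s = np.cos(th), np.sin(th)
        D[2*i:2*i+2, 2*i:2*i+2] = [[c, -s], [s, c]]
    T = np.triu(upper, k=1)
    for i in range(len(thetas)):
        T[2*i, 2*i+1] = 0.0          # keep T outside the diagonal blocks
    return P @ (D + T) @ P.T

# Usage: the eigenvalues stay on the unit circle even though W is not orthogonal.
n = 4
P, _ = np.linalg.qr(np.random.randn(n, n))
W = unit_spectrum_matrix(np.random.rand(n // 2) * np.pi,
                         0.1 * np.random.randn(n, n), P)
print(np.abs(np.linalg.eigvals(W)))   # approximately [1., 1., 1., 1.]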
Learning Unitary Operators with Help From u(n)
A major challenge in the training of recurrent neural networks is the
so-called vanishing or exploding gradient problem. The use of a norm-preserving
transition operator can address this issue, but parametrization is challenging.
In this work we focus on unitary operators and describe a parametrization using
the Lie algebra associated with the Lie group of unitary matrices. The exponential map provides a correspondence
between these spaces, and allows us to define a unitary matrix using real
coefficients relative to a basis of the Lie algebra. The parametrization is
closed under additive updates of these coefficients, and thus provides a simple
space in which to do gradient descent. We demonstrate the effectiveness of this
parametrization on the problem of learning arbitrary unitary operators,
comparing to several baselines and outperforming a recently-proposed
lower-dimensional parametrization. We additionally use our parametrization to
generalize a recently-proposed unitary recurrent neural network to arbitrary
unitary matrices, using it to solve standard long-memory tasks.
Comment: 9 pages, 3 figures, 5 figures inc. subfigures, to appear at AAAI-1
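
A minimal sketch of this construction (assuming NumPy/SciPy; not the paper's code, and the basis ordering below is one arbitrary choice): real coefficients weight a basis of the skew-Hermitian Lie algebra u(n), and the matrix exponential maps the result to a unitary matrix.

import numpy as np
from scipy.linalg import expm

def unitary_from_coeffs(lam, n):
    # Sketch: map n^2 real coefficients onto a skew-Hermitian L in u(n),
    # then to a unitary matrix via the exponential map U = expm(L).
    L = np.zeros((n, n), dtype=complex)
    idx = 0
    for j in range(n):                    # purely imaginary diagonal generators
        L[j, j] = 1j * lam[idx]
        idx += 1
    for j in range(n):
        for k in range(j + 1, n):         # real antisymmetric and imaginary symmetric pairs
            L[j, k] += lam[idx];      L[k, j] -= lam[idx];      idx += 1
            L[j, k] += 1j * lam[idx]; L[k, j] += 1j * lam[idx]; idx += 1
    return expm(L)                        # exp of a skew-Hermitian matrix is unitary

# Usage: additive updates to `lam` always stay inside the parametrization.
n = 3
U = unitary_from_coeffs(np.random.randn(n * n), n)
print(np.allclose(U.conj().T @ U, np.eye(n)))   # True up to numerical error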
Using Regular Languages to Explore the Representational Capacity of Recurrent Neural Architectures
The presence of Long Distance Dependencies (LDDs) in sequential data poses
significant challenges for computational models. Various recurrent neural
architectures have been designed to mitigate this issue. In order to test these
state-of-the-art architectures, there is a growing need for rich benchmarking
datasets. However, one of the drawbacks of existing datasets is the lack of
experimental control with regard to the presence and/or degree of LDDs. This
lack of control limits the analysis of model performance in relation to the
specific challenge posed by LDDs. One way to address this is to use synthetic
data having the properties of subregular languages. The degree of LDDs within
the generated data can be controlled through the k parameter, the length of the
generated strings, and by choosing appropriate forbidden strings. In this
paper, we explore the capacity of different RNN extensions to model LDDs, by
evaluating these models on a sequence of SPk synthesized datasets, where each
subsequent dataset exhibits a greater degree of LDD. Even though SPk languages
are simple, the presence of LDDs has a significant impact on the performance
of recurrent neural architectures, thus making them prime candidates for
benchmarking tasks.
Comment: International Conference on Artificial Neural Networks (ICANN) 201
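
As an illustrative sketch of how such data can be generated (not the authors' dataset-generation code; it assumes the standard definition of strictly piecewise SPk languages as those avoiding a set of forbidden length-k subsequences, and all names are hypothetical): strings are labeled by whether they avoid every forbidden subsequence, and increasing the string length stretches the dependency over more timesteps.

import random

def contains_subsequence(string, pattern):
    # True if `pattern` occurs in `string` as a (not necessarily contiguous) subsequence.
    it = iter(string)
    return all(ch in it for ch in pattern)

def make_spk_example(alphabet, forbidden, length):
    # Sketch: sample one string and label it 1 if it avoids every forbidden
    # length-k subsequence, i.e. if it belongs to the SPk language.
    s = ''.join(random.choice(alphabet) for _ in range(length))
    label = int(not any(contains_subsequence(s, f) for f in forbidden))
    return s, label

# Usage: SP2 over {a, b, c, d} with the forbidden subsequence "ab";
# larger `length` lets the a...b dependency span more positions.
random.seed(0)
for s, y in (make_spk_example("abcd", ["ab"], length=20) for _ in range(5)):
    print(s, y)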
Improving speech recognition by revising gated recurrent units
Speech recognition benefits greatly from deep learning, and substantial gains
can be obtained with modern Recurrent Neural Networks
(RNNs). The most popular RNNs are Long Short-Term Memory (LSTMs), which
typically reach state-of-the-art performance in many tasks thanks to their
ability to learn long-term dependencies and robustness to vanishing gradients.
Nevertheless, LSTMs have a rather complex design with three multiplicative
gates, which might impair their efficient implementation. An attempt to simplify
LSTMs has recently led to Gated Recurrent Units (GRUs), which are based on just
two multiplicative gates.
This paper builds on these efforts by further revising GRUs and proposing a
simplified architecture potentially more suitable for speech recognition. The
contribution of this work is two-fold. First, we suggest removing the reset
gate from the GRU design, resulting in a more efficient single-gate architecture.
Second, we propose to replace tanh with ReLU activations in the state update
equations. Results show that, in our implementation, the revised architecture
reduces the per-epoch training time by more than 30% and consistently
improves recognition performance across different tasks, input features, and
noisy conditions when compared to a standard GRU.
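
A minimal sketch of the revised cell described above (single update gate, ReLU candidate state; a paraphrase under our own assumptions rather than the authors' released code, which also includes batch normalization), written with PyTorch:

import torch
import torch.nn as nn

class SingleGateReLURNNCell(nn.Module):
    # Sketch of a GRU-style cell with the reset gate removed and a ReLU
    # candidate state, in the spirit of the architecture described above.
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.wz = nn.Linear(input_size, hidden_size)
        self.uz = nn.Linear(hidden_size, hidden_size, bias=False)
        self.wh = nn.Linear(input_size, hidden_size)
        self.uh = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, x, h):
        z = torch.sigmoid(self.wz(x) + self.uz(h))      # single update gate
        h_cand = torch.relu(self.wh(x) + self.uh(h))     # ReLU instead of tanh
        return z * h + (1.0 - z) * h_cand                # no reset gate

# Usage: step the cell over a (time, batch, features) input sequence.
cell = SingleGateReLURNNCell(input_size=40, hidden_size=128)
h = torch.zeros(8, 128)
for x in torch.randn(50, 8, 40):
    h = cell(x, h)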