Cells in Multidimensional Recurrent Neural Networks
Transcribing handwritten text from images is a machine-learning task, and one way to solve it is with multi-dimensional recurrent neural networks (MDRNNs) combined with connectionist temporal classification (CTC). These RNNs can contain special units, the long short-term memory (LSTM) cells, which are able to learn long-term dependencies but become unstable when the dimension is chosen greater than one. We define useful and necessary properties for the one-dimensional LSTM cell and extend them to the multi-dimensional case, thereby introducing several new cells with better stability. We present a method for designing cells using the theory of linear shift-invariant systems. The new cells are compared to the LSTM cell on the IFN/ENIT and Rimes databases, where they improve the recognition rate over the LSTM cell. Any application that uses LSTM cells in MDRNNs could therefore be improved by substituting the newly developed cells for them.
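To make the stability issue concrete, here is a minimal NumPy sketch of a two-dimensional LSTM-style cell in which the two predecessor states are mixed by a convex combination, so the recurrent state stays bounded. This illustrates one stabilization strategy in the spirit of the paper; the gate names and the lambda-gate construction are illustrative assumptions, not the paper's exact cells.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def md_lstm_step(x, h_left, h_up, c_left, c_up, W):
    """One 2D LSTM-style step at grid position (i, j).

    In a naive 2D LSTM, c = f1*c_left + f2*c_up + i*g, and the two
    forget terms can sum to more than 1, letting |c| grow without
    bound. Here a convex combination (lam, 1-lam) of the two
    predecessor states keeps the state bounded. Illustrative only.
    """
    v = np.concatenate([x, h_left, h_up])          # joint input
    i   = sigmoid(W["i"] @ v)                      # input gate
    f   = sigmoid(W["f"] @ v)                      # single forget gate
    o   = sigmoid(W["o"] @ v)                      # output gate
    lam = sigmoid(W["l"] @ v)                      # mixes the two directions
    g   = np.tanh(W["g"] @ v)                      # candidate state
    c_prev = lam * c_left + (1.0 - lam) * c_up     # convex combination
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c
```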
Grid Long Short-Term Memory
This paper introduces Grid Long Short-Term Memory, a network of LSTM cells
arranged in a multidimensional grid that can be applied to vectors, sequences
or higher dimensional data such as images. The network differs from existing
deep LSTM architectures in that the cells are connected between network layers
as well as along the spatiotemporal dimensions of the data. The network
provides a unified way of using LSTM for both deep and sequential computation.
We apply the model to algorithmic tasks such as 15-digit integer addition and
sequence memorization, where it is able to significantly outperform the
standard LSTM. We then give results for two empirical tasks. We find that 2D
Grid LSTM achieves 1.47 bits per character on the Wikipedia character
prediction benchmark, which is state-of-the-art among neural approaches. In
addition, we use the Grid LSTM to define a novel two-dimensional translation
model, the Reencoder, and show that it outperforms a phrase-based reference
system on a Chinese-to-English translation task.
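The core construction can be sketched in a few lines: one shared concatenated hidden vector, but a separate LSTM transform and memory cell per grid dimension, including depth. A minimal 2D (time by depth) sketch under assumed shapes, not the paper's implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_transform(H, c, W):
    """One LSTM update driven by the shared concatenated hidden H."""
    i = sigmoid(W["i"] @ H)
    f = sigmoid(W["f"] @ H)
    o = sigmoid(W["o"] @ H)
    g = np.tanh(W["g"] @ H)
    c_new = f * c + i * g
    return o * np.tanh(c_new), c_new

def grid_lstm_block(h_time, c_time, h_depth, c_depth, W_time, W_depth):
    """A 2D Grid LSTM block: the concatenated hidden vector H is shared,
    but each dimension (time and depth) has its own LSTM transform and
    its own memory cell, so LSTM dynamics run along layers as well as
    along the data's temporal dimension."""
    H = np.concatenate([h_time, h_depth])
    h_time, c_time = lstm_transform(H, c_time, W_time)
    h_depth, c_depth = lstm_transform(H, c_depth, W_depth)
    return h_time, c_time, h_depth, c_depth
```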
Hessian-free Optimization for Learning Deep Multidimensional Recurrent Neural Networks
Multidimensional recurrent neural networks (MDRNNs) have shown remarkable performance in speech and handwriting recognition. The performance of an MDRNN improves with depth, and the difficulty of training the deeper network is overcome with Hessian-free (HF) optimization. When connectionist temporal classification (CTC) is used as the training objective of an MDRNN for sequence labeling, the non-convexity of CTC poses a problem for applying HF to the network. As a solution, a convex approximation of CTC is formulated, and its relationship with the EM algorithm and the Fisher information matrix is discussed. An MDRNN up to a depth of 15 layers is successfully trained using HF, resulting in improved performance for sequence labeling.
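For intuition, HF computes an update direction by running conjugate gradients against Gauss-Newton matrix-vector products rather than forming the curvature matrix; the positive semi-definiteness of that approximation is exactly why a convex surrogate of the CTC loss is needed. A minimal sketch, where `gauss_newton_vec` is an assumed callback:

```python
import numpy as np

def hessian_free_step(grad, gauss_newton_vec, dim, cg_iters=50, damping=1e-2):
    """One Hessian-free update direction via conjugate gradients.

    Solves (G + damping*I) d = -grad without forming G, using only
    Gauss-Newton matrix-vector products gauss_newton_vec(v). Sketch
    only; the callback and constants are illustrative assumptions.
    """
    d = np.zeros(dim)
    r = -grad.copy()              # residual of (G + dI) d = -grad at d = 0
    p = r.copy()
    rs = r @ r
    for _ in range(cg_iters):
        Gp = gauss_newton_vec(p) + damping * p
        alpha = rs / (p @ Gp)
        d += alpha * p
        r -= alpha * Gp
        rs_new = r @ r
        if np.sqrt(rs_new) < 1e-8:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return d
```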
Visual Reasoning of Feature Attribution with Deep Recurrent Neural Networks
Deep recurrent neural networks (RNNs) have gained popularity in many sequence classification tasks. Beyond predicting a correct class for each data instance, data scientists also want to understand which differentiating factors in the data contributed to the classification during the learning process. We present a visual analytics approach that facilitates this task by revealing the RNN attention for all data instances, their temporal positions in the sequences, and the attribution of variables at each value level. We demonstrate on real-world datasets that our approach can help data scientists understand such dynamics in deep RNNs from the training results, thereby guiding their modeling process.
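One view such a tool might expose can be sketched as follows: given per-timestep attention weights read out of a trained model, aggregate them per variable and per value bin. This is a generic illustration of "attribution of variables at each value level", not the paper's actual system; all names and shapes are assumptions.

```python
import numpy as np

def value_level_attribution(X, attn, bins=10):
    """Aggregate per-timestep attention into value-level attributions.

    X:    (n_instances, T, n_vars) input sequences.
    attn: (n_instances, T) attention weights per timestep, assumed to
          be read out of a trained attention-RNN.
    Returns (n_vars, bins): mean attention received when a variable's
    value falls into a given bin.
    """
    n, T, v = X.shape
    out = np.zeros((v, bins))
    counts = np.zeros((v, bins))
    for j in range(v):
        vals = X[:, :, j]
        lo, hi = vals.min(), vals.max()
        idx = np.clip(((vals - lo) / (hi - lo + 1e-12) * bins).astype(int),
                      0, bins - 1)
        for b in range(bins):
            mask = idx == b                     # (instance, time) cells in bin b
            counts[j, b] = mask.sum()
            out[j, b] = attn[mask].sum() if mask.any() else 0.0
    return out / np.maximum(counts, 1)
```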
Machine Learning Phase Transition: An Iterative Proposal
We propose an iterative scheme to estimate critical points of statistical models from configurations by combining machine-learning tools. First, phase scenarios and preliminary phase boundaries are obtained with dimensionality-reduction techniques; this step not only provides labelled samples for the subsequent step but is also necessary for applying the scheme to novel statistical models. Second, using these samples as a training set, neural networks are employed to assign labels to the samples between the phase boundaries in an iterative manner: newly labelled samples are added to the training set for subsequent training, and the phase boundaries are updated as well. The average of the phase boundaries is expected to converge to the critical temperature. As concrete examples, we apply the scheme to estimate the critical temperatures of two q-state Potts models, one with a continuous and one with a first-order phase transition. Linear and manifold dimensionality-reduction techniques are employed in the first step, and both a convolutional neural network and a bidirectional recurrent neural network with long short-term memory units perform well for the two Potts models in the second step. The convergent behavior of the estimates reflects the type of phase transition, and the results indicate that our scheme may be used to explore phase transitions in new statistical models.
Comment: We focus on the iterative strategy rather than the concrete tools, such as specific dimensionality-reduction techniques, CNN, and BLSTM; other machine-learning tools with similar functions may be applied to new statistical models with this proposal.
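The iterative core of the scheme can be sketched as a loop: train on the currently labelled extremes, label the samples between the boundaries, move the boundaries inward, and repeat until convergence. A minimal sketch with an assumed `fit` callback standing in for the CNN or BLSTM classifier; all names are illustrative.

```python
import numpy as np

def iterative_critical_point(configs, temps, t_low, t_high, fit,
                             max_rounds=20, tol=1e-3):
    """Iterative estimate of a critical temperature (sketch).

    configs: (n, ...) configurations; temps: (n,) their temperatures.
    t_low / t_high: preliminary phase boundaries from a prior
    dimensionality-reduction step. fit(X, y) trains a classifier and
    returns a predict function; everything here is illustrative.
    """
    for _ in range(max_rounds):
        labeled = (temps <= t_low) | (temps >= t_high)
        y = (temps[labeled] >= t_high).astype(int)   # 0: ordered, 1: disordered
        predict = fit(configs[labeled], y)
        # Label samples between the boundaries, then move the boundaries in.
        mid = ~labeled
        pred_mid = predict(configs[mid])
        t_mid = temps[mid]
        new_low = t_mid[pred_mid == 0].max() if (pred_mid == 0).any() else t_low
        new_high = t_mid[pred_mid == 1].min() if (pred_mid == 1).any() else t_high
        if abs(new_low - t_low) < tol and abs(new_high - t_high) < tol:
            break
        t_low, t_high = new_low, new_high
    return 0.5 * (t_low + t_high)   # boundary average ~ critical temperature
```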
Learning Deep Matrix Representations
We present a new distributed representation in deep neural nets wherein the
information is represented in native form as a matrix. This differs from
current neural architectures that rely on vector representations. We consider
matrices as central to the architecture and they compose the input, hidden and
output layers. The model representation is more compact and elegant -- the
number of parameters grows only with the largest dimension of the incoming
layer rather than the number of hidden units. We derive several new deep
networks: (i) feed-forward nets that map an input matrix into an output matrix,
(ii) recurrent nets which map a sequence of input matrices into a sequence of
output matrices. We also reinterpret existing models for (iii) memory-augmented
networks and (iv) graphs using matrix notations. For graphs we demonstrate how
the new notations lead to simple but effective extensions with multiple
attentions. Extensive experiments on handwritten digit recognition, face reconstruction, sequence-to-sequence learning, EEG classification, and graph-based node classification demonstrate the efficacy and compactness of the matrix architectures.
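The parameter-count argument is easy to see in code: a matrix-to-matrix layer Y = tanh(U X V + B) needs parameters proportional to the matrix side lengths, not to the product of flattened input and output sizes. A minimal sketch with illustrative shapes, not the paper's exact parameterization:

```python
import numpy as np

rng = np.random.default_rng(0)

def matrix_layer(X, U, V, B):
    """Matrix-in, matrix-out layer: Y = tanh(U @ X @ V + B).

    For X of shape (r, c) mapped to Y of shape (r2, c2), the
    parameters are U (r2, r), V (c, c2), and B (r2, c2): they grow
    with the matrix side lengths, not with the r*c units a flattened
    vector layer would need.
    """
    return np.tanh(U @ X @ V + B)

# Tiny usage example: map a 28x28 input matrix to a 16x16 hidden matrix.
X = rng.standard_normal((28, 28))
U = rng.standard_normal((16, 28)) * 0.1
V = rng.standard_normal((28, 16)) * 0.1
B = np.zeros((16, 16))
H = matrix_layer(X, U, V, B)   # shape (16, 16)
```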
Learning Over Long Time Lags
The advantage of recurrent neural networks (RNNs) in learning dependencies
between time-series data has distinguished RNNs from other deep learning
models. Recently, many advances have been proposed in this emerging field, but the literature lacks a comprehensive review of memory models in RNNs. This paper provides a fundamental review of RNNs and the long short-term memory (LSTM) model, then surveys recent advances in memory enhancements and learning techniques for capturing long-term dependencies in RNNs.
Comment: This is a draft article, in preparation for submission for peer review.
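For reference, the standard LSTM update that such a review centers on, in common notation (standard material, not specific to this paper):

```latex
\begin{aligned}
i_t &= \sigma(W_i [x_t; h_{t-1}] + b_i), & f_t &= \sigma(W_f [x_t; h_{t-1}] + b_f), \\
o_t &= \sigma(W_o [x_t; h_{t-1}] + b_o), & g_t &= \tanh(W_g [x_t; h_{t-1}] + b_g), \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t, & h_t &= o_t \odot \tanh(c_t).
\end{aligned}
```

The additive cell-state path $c_t = f_t \odot c_{t-1} + i_t \odot g_t$ is what allows gradients to persist over long time lags; it is the component that the surveyed memory enhancements refine.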
Towards Two-Dimensional Sequence to Sequence Model in Neural Machine Translation
This work investigates an alternative model for neural machine translation
(NMT) and proposes a novel architecture, where we employ a multi-dimensional
long short-term memory (MDLSTM) for translation modeling. In the
state-of-the-art methods, source and target sentences are treated as
one-dimensional sequences over time, while we view translation as a
two-dimensional (2D) mapping using an MDLSTM layer to define the correspondence
between source and target words. We extend the current sequence-to-sequence backbone of NMT models to a 2D structure in which the source and target sentences are aligned with each other in a 2D grid. Our proposed topology shows consistent improvements over an attention-based sequence-to-sequence model on two WMT 2017 tasks, German↔English.
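The 2D view can be sketched as a recurrence over the (target, source) grid, where each state sees its left (previous source position) and up (previous target position) neighbors, so every source-target word pair is modeled jointly. A rough sketch with an assumed MDLSTM-style `step` callback, not the paper's exact layer:

```python
import numpy as np

def translation_grid(src_emb, tgt_emb, step, d):
    """2D recurrence over the (target j, source i) grid (sketch).

    src_emb: (I, dim) source embeddings; tgt_emb: (J, dim) target-side
    embeddings; step(inp, h_left, h_up) is an assumed MDLSTM-style
    cell returning a new hidden state of size d.
    """
    I, J = len(src_emb), len(tgt_emb)
    H = np.zeros((J, I, d))
    for j in range(J):
        for i in range(I):
            inp = np.concatenate([src_emb[i], tgt_emb[j]])
            h_left = H[j, i - 1] if i > 0 else np.zeros(d)
            h_up   = H[j - 1, i] if j > 0 else np.zeros(d)
            H[j, i] = step(inp, h_left, h_up)
    return H   # H[j, -1] can feed the prediction of target word j+1
```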
On Recurrent Neural Networks for Sequence-based Processing in Communications
In this work, we analyze the capabilities and practical limitations of neural
networks (NNs) for sequence-based signal processing, an omnipresent task in almost any modern communication system. In particular,
we train multiple state-of-the-art recurrent neural network (RNN) structures to
learn how to decode convolutional codes, allowing a clear benchmark against the corresponding maximum likelihood (ML) Viterbi decoder. We examine the decoding
performance for various kinds of NN architectures, beginning with classical
types like feedforward layers and gated recurrent unit (GRU)-layers, up to more
recently introduced architectures such as temporal convolutional networks
(TCNs) and differentiable neural computers (DNCs) with external memory. As a
key limitation, it turns out that the training complexity increases
exponentially with the length of the encoding memory and, thus,
practically limits the achievable bit error rate (BER) performance. To overcome
this limitation, we introduce a new training method that gradually increases the number of ones within the training sequences, i.e., we constrain the set of possible training sequences at the beginning, until first convergence. By
consecutively adding more and more possible sequences to the training set, we
finally achieve training success in cases that did not converge before via
naive training. Further, we show that our network can learn to jointly detect
and decode a quadrature phase shift keying (QPSK) modulated code with
sub-optimal (anti-Gray) labeling in one shot, at a performance that would require iterations between demapper and decoder in classic detection schemes.
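The described training method amounts to a curriculum over the weight of the information sequences: cap the number of ones, train to first convergence, then raise the cap. A minimal sketch, where `encode` stands in for the convolutional encoder and `train`, `model`, and `conv_encode` in the usage note are hypothetical names:

```python
import numpy as np

rng = np.random.default_rng(1)

def curriculum_batches(seq_len, batch_size, max_ones, encode):
    """Yield training batches whose info sequences have at most
    `max_ones` ones, constraining the set of possible training
    sequences; raise `max_ones` step by step to widen the set."""
    while True:
        u = np.zeros((batch_size, seq_len), dtype=int)
        for b in range(batch_size):
            k = rng.integers(0, max_ones + 1)              # number of ones
            u[b, rng.choice(seq_len, size=k, replace=False)] = 1
        yield u, np.array([encode(row) for row in u])      # (labels, coded bits)

# Usage sketch: raise the cap after each convergence plateau.
# for max_ones in [2, 4, 8, 64]:
#     train(model, curriculum_batches(64, 128, max_ones, conv_encode))
```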
Sparse incomplete representations: A novel role for olfactory granule cells
Mitral cells of the olfactory bulb form sparse representations of the
odorants and transmit this information to the cortex. The olfactory code
carried by the mitral cells is sparser than the inputs that they receive. In
this study we analyze the mechanisms and functional significance of sparse
olfactory codes. We consider a model of the olfactory bulb containing populations of excitatory mitral and inhibitory granule cells. We argue that sparse codes may emerge as a result of self-organization in the network, leading to a precise balance between the mitral cells' excitatory inputs and the inhibition provided by the granule cells. We propose a novel role for the olfactory granule cells.
We show that these cells can build representations of odorant stimuli that are
not fully accurate. Due to the incompleteness of the granule cell representation, the exact excitation-inhibition balance is established only for some mitral cells, leading to sparse responses of the mitral cells. Our model suggests a functional significance for the dendrodendritic synapses that mediate interactions between mitral and granule cells. The model accounts for the sparse olfactory code in the steady state and predicts that transient dynamics may be less sparse.
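The proposed mechanism can be caricatured as a rate model in which granule activity is driven toward reconstructing the mitral input through shared dendrodendritic weights, silencing whatever the incomplete granule representation can explain. A minimal NumPy sketch with illustrative parameters, not the paper's model equations:

```python
import numpy as np

def bulb_steady_state(x, W, iters=200, lr=0.1):
    """Steady state of a simple olfactory-bulb rate model (sketch).

    x: (n_mitral,) excitatory receptor input to mitral cells.
    W: (n_granule, n_mitral) mitral-to-granule weights; the transposed
    weights carry granule-to-mitral inhibition, mimicking the
    dendrodendritic synapses. Where the excitation-inhibition balance
    is exact, the mitral response r is silenced; only the residual the
    granule representation cannot capture survives, giving a sparse
    mitral code. The threshold 0.1 is an illustrative assumption.
    """
    a = np.zeros(W.shape[0])                          # granule activities
    for _ in range(iters):
        r = np.maximum(0.0, x - W.T @ a)              # mitral: excitation - inhibition
        a = np.maximum(0.0, a + lr * (W @ r - 0.1))   # granule update with threshold
    return r, a
```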