9,817 research outputs found

    Cells in Multidimensional Recurrent Neural Networks

    The transcription of handwritten text from images is a machine learning task, and one way to solve it is with multi-dimensional recurrent neural networks (MDRNNs) and connectionist temporal classification (CTC). These RNNs can contain special units, long short-term memory (LSTM) cells, which are able to learn long-term dependencies but become unstable when the dimension is greater than one. We define useful and necessary properties for the one-dimensional LSTM cell and extend them to the multi-dimensional case, thereby introducing several new cells with better stability. We also present a method for designing cells using the theory of linear shift-invariant systems. The new cells are compared to the LSTM cell on the IFN/ENIT and Rimes databases, where they improve the recognition rate. Any application that uses LSTM cells in MDRNNs could therefore be improved by substituting the newly developed cells.
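
    As a rough illustration of the starting point of this line of work (the standard multi-dimensional LSTM formulation, not the new cells the paper proposes), the sketch below shows one step of a 2D LSTM cell that combines recurrent states from two axes; all function and parameter names and the gate layout are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mdlstm_cell_step(x, h_left, c_left, h_up, c_up, W, U_left, U_up, b):
    """One step of a standard 2D LSTM cell at grid position (i, j).

    x         : input vector at (i, j)
    h_*, c_*  : hidden / cell states of the left (i, j-1) and upper (i-1, j) neighbours
    W, U_*, b : parameters stacked for the 5 gates
                [input, forget_left, forget_up, output, candidate]
    """
    z = W @ x + U_left @ h_left + U_up @ h_up + b      # pre-activations, shape (5*n,)
    n = z.shape[0] // 5
    i_g = sigmoid(z[0 * n:1 * n])      # input gate
    f_l = sigmoid(z[1 * n:2 * n])      # forget gate for the left neighbour
    f_u = sigmoid(z[2 * n:3 * n])      # forget gate for the upper neighbour
    o_g = sigmoid(z[3 * n:4 * n])      # output gate
    g = np.tanh(z[4 * n:5 * n])        # candidate cell value
    # Two forget paths feed the cell state; this summation over dimensions is
    # exactly where a cell with dimension > 1 can become unstable.
    c = f_l * c_left + f_u * c_up + i_g * g
    h = o_g * np.tanh(c)
    return h, c

# Tiny usage with illustrative sizes:
n, d = 4, 3
rng = np.random.default_rng(0)
W, U_l, U_u = rng.normal(size=(5 * n, d)), rng.normal(size=(5 * n, n)), rng.normal(size=(5 * n, n))
h0 = c0 = np.zeros(n)
h, c = mdlstm_cell_step(rng.normal(size=d), h0, c0, h0, c0, W, U_l, U_u, np.zeros(5 * n))
```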

    Grid Long Short-Term Memory

    This paper introduces Grid Long Short-Term Memory, a network of LSTM cells arranged in a multidimensional grid that can be applied to vectors, sequences or higher dimensional data such as images. The network differs from existing deep LSTM architectures in that the cells are connected between network layers as well as along the spatiotemporal dimensions of the data. The network provides a unified way of using LSTM for both deep and sequential computation. We apply the model to algorithmic tasks such as 15-digit integer addition and sequence memorization, where it is able to significantly outperform the standard LSTM. We then give results for two empirical tasks. We find that 2D Grid LSTM achieves 1.47 bits per character on the Wikipedia character prediction benchmark, which is state-of-the-art among neural approaches. In addition, we use the Grid LSTM to define a novel two-dimensional translation model, the Reencoder, and show that it outperforms a phrase-based reference system on a Chinese-to-English translation task. Comment: 15 pages.
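
    A minimal sketch of the core idea, assuming a fused-gate parameterisation: a 2D Grid LSTM block carries an LSTM-style (hidden, memory) pair along the depth axis as well as the time axis, applying a separate LSTM transform per dimension to the concatenated incoming hidden vectors. The helper names and shapes below are assumptions for illustration, not the paper's code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_transform(h_in, c_in, params):
    """Generic LSTM transform applied along one dimension of the grid.
    params holds a fused gate matrix 'U' and bias 'b' (illustrative layout)."""
    z = params["U"] @ h_in + params["b"]
    n = c_in.shape[0]
    i, f = sigmoid(z[0:n]), sigmoid(z[n:2 * n])
    o, g = sigmoid(z[2 * n:3 * n]), np.tanh(z[3 * n:4 * n])
    c_out = f * c_in + i * g
    h_out = o * np.tanh(c_out)
    return h_out, c_out

def grid_lstm_block(h_time, c_time, h_depth, c_depth, params_time, params_depth):
    """One 2D Grid LSTM block: the concatenated incoming hidden vectors are fed to a
    separate LSTM transform along each dimension (time and depth), so an LSTM-style
    memory is carried between layers as well as along the sequence."""
    h_cat = np.concatenate([h_time, h_depth])
    h_time_new, c_time_new = lstm_transform(h_cat, c_time, params_time)
    h_depth_new, c_depth_new = lstm_transform(h_cat, c_depth, params_depth)
    return (h_time_new, c_time_new), (h_depth_new, c_depth_new)

# Tiny usage with illustrative sizes:
n = 8
rng = np.random.default_rng(0)
p = {"U": rng.normal(size=(4 * n, 2 * n)), "b": np.zeros(4 * n)}
h0 = c0 = np.zeros(n)
(h_t, c_t), (h_d, c_d) = grid_lstm_block(h0, c0, h0, c0, p, p)
```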

    Hessian-free Optimization for Learning Deep Multidimensional Recurrent Neural Networks

    Multidimensional recurrent neural networks (MDRNNs) have shown a remarkable performance in the area of speech and handwriting recognition. The performance of an MDRNN is improved by further increasing its depth, and the difficulty of learning the deeper network is overcome by using Hessian-free (HF) optimization. Given that connectionist temporal classification (CTC) is utilized as an objective of learning an MDRNN for sequence labeling, the non-convexity of CTC poses a problem when applying HF to the network. As a solution, a convex approximation of CTC is formulated and its relationship with the EM algorithm and the Fisher information matrix is discussed. An MDRNN up to a depth of 15 layers is successfully trained using HF, resulting in an improved performance for sequence labeling. Comment: to appear at NIPS 2015.
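
    The heart of Hessian-free optimization is a conjugate-gradient inner loop that only needs curvature-vector products. The sketch below shows that loop under the assumption that a Gauss-Newton-vector-product function (e.g., derived from a convex approximation of CTC, whose construction is not reproduced here) is supplied by the caller; `hf_step` and its arguments are illustrative names, not the paper's implementation.

```python
import numpy as np

def hf_step(grad, gvp, damping=1e-2, cg_iters=50, tol=1e-8):
    """Conjugate-gradient inner loop of Hessian-free optimization.

    grad   : flattened gradient of the loss at the current parameters
    gvp(v) : returns the Gauss-Newton (curvature) matrix-vector product G v;
             for CTC this would be built from a convex approximation of the loss
    Returns an approximate solution d of (G + damping * I) d = -grad.
    """
    d = np.zeros_like(grad)
    r = -grad.copy()                  # residual of the linear system
    p = r.copy()
    rs_old = r @ r
    for _ in range(cg_iters):
        Ap = gvp(p) + damping * p
        alpha = rs_old / (p @ Ap)
        d += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return d

# Toy usage with an explicit positive-definite curvature matrix:
G = np.array([[3.0, 1.0], [1.0, 2.0]])
step = hf_step(grad=np.array([1.0, -1.0]), gvp=lambda v: G @ v)
```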

    Visual Reasoning of Feature Attribution with Deep Recurrent Neural Networks

    Deep Recurrent Neural Networks (RNNs) have gained popularity in many sequence classification tasks. Beyond predicting a correct class for each data instance, data scientists also want to understand which differentiating factors in the data have contributed to the classification during the learning process. We present a visual analytics approach to facilitate this task by revealing the RNN attention for all data instances, their temporal positions in the sequences, and the attribution of variables at each value level. We demonstrate with real-world datasets that our approach can help data scientists understand such dynamics in deep RNNs from the training results, hence guiding their modeling process.

    Machine Learning Phase Transition: An Iterative Proposal

    We propose an iterative scheme for estimating the critical points of statistical models from raw configurations by combining machine-learning tools. First, phase scenarios and preliminary phase boundaries are obtained with dimensionality-reduction techniques; this step also provides labelled samples for the next stage and is what makes the scheme applicable to new statistical models. Second, using these samples as a training set, neural networks iteratively assign labels to the samples lying between the phase boundaries. Newly labelled samples are added to the training set for subsequent training, and the phase boundaries are updated accordingly. The average of the phase boundaries is expected to converge to the critical temperature. As concrete examples, we apply the scheme to estimate the critical temperatures of two q-state Potts models, one with a continuous and one with a first-order phase transition. Linear and manifold dimensionality-reduction techniques are employed in the first step, and both a convolutional neural network and a bidirectional recurrent neural network with long short-term memory units perform well in the second step. The convergence behavior of the estimates reflects the type of phase transition, and the results indicate that the proposal may be used to explore phase transitions of new statistical models. Comment: We focus on the iterative strategy rather than the concrete tools (specific dimensionality-reduction techniques, CNN and BLSTM); other machine-learning tools with similar functions may be applied to new statistical models with this proposal.
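
    A minimal sketch of the iterative labelling loop described above, using PCA as a stand-in for the dimensionality-reduction step and scikit-learn's MLPClassifier as a stand-in for the CNN/BLSTM; the quantile thresholds, confidence cut-off and boundary estimate are all illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier  # stand-in for the CNN / BLSTM

def iterative_critical_point(configs, temps, n_iters=10, conf=0.9):
    """configs: (n_samples, n_features) spin configurations; temps: their temperatures.
    Returns one critical-temperature estimate per iteration of the labelling loop."""
    # Step 1: dimensionality reduction gives preliminary labels for the samples
    # that clearly sit on either side of the transition.
    proj = PCA(n_components=1).fit_transform(configs).ravel()
    lo, hi = np.quantile(proj, 0.2), np.quantile(proj, 0.8)
    labelled = (proj <= lo) | (proj >= hi)
    labels = (proj >= hi).astype(int)              # 0 / 1 = the two phases
    estimates = []
    for _ in range(n_iters):
        # Step 2: train on the currently labelled samples, then label the
        # confident samples lying between the preliminary phase boundaries.
        clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500)
        clf.fit(configs[labelled], labels[labelled])
        probs = clf.predict_proba(configs)[:, 1]
        newly = (~labelled) & ((probs > conf) | (probs < 1.0 - conf))
        labels[newly] = (probs[newly] > 0.5).astype(int)
        labelled |= newly
        # Boundary estimate: mean temperature of the least confident predictions.
        boundary = np.argsort(np.abs(probs - 0.5))[:10]
        estimates.append(float(np.asarray(temps)[boundary].mean()))
    return estimates
```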

    Learning Deep Matrix Representations

    We present a new distributed representation in deep neural nets wherein the information is represented in native form as a matrix. This differs from current neural architectures that rely on vector representations. We consider matrices as central to the architecture, and they compose the input, hidden and output layers. The model representation is more compact and elegant -- the number of parameters grows only with the largest dimension of the incoming layer rather than with the number of hidden units. We derive several new deep networks: (i) feed-forward nets that map an input matrix into an output matrix, and (ii) recurrent nets that map a sequence of input matrices into a sequence of output matrices. We also reinterpret existing models for (iii) memory-augmented networks and (iv) graphs using matrix notation. For graphs we demonstrate how the new notation leads to simple but effective extensions with multiple attentions. Extensive experiments on handwritten digit recognition, face reconstruction, sequence-to-sequence learning, EEG classification, and graph-based node classification demonstrate the efficacy and compactness of the matrix architectures.
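
    One plausible form of such a matrix layer is a bilinear map that mixes rows and columns separately, which is what keeps the parameter count tied to the matrix dimensions rather than to the full number of hidden units. The sketch below is an assumption about the general shape of the idea, not the paper's exact parameterisation.

```python
import numpy as np

def matrix_layer(X, U, V, B):
    """Matrix-in / matrix-out layer: the hidden state is itself a matrix.

    X : (r_in, c_in) input matrix
    U : (r_out, r_in) mixes the rows, V : (c_in, c_out) mixes the columns
    B : (r_out, c_out) bias matrix
    Parameter count is r_out*r_in + c_in*c_out + r_out*c_out, i.e. it grows with
    the matrix dimensions rather than with the number of hidden units r_out*c_out
    times the input size.
    """
    return np.tanh(U @ X @ V + B)

# Tiny feed-forward stack of matrix layers (shapes are illustrative):
rng = np.random.default_rng(0)
X = rng.standard_normal((28, 28))                       # e.g. a digit image as a matrix
U1, V1, B1 = (rng.standard_normal(s) * 0.1 for s in [(20, 28), (28, 20), (20, 20)])
U2, V2, B2 = (rng.standard_normal(s) * 0.1 for s in [(10, 20), (20, 10), (10, 10)])
H = matrix_layer(X, U1, V1, B1)                         # (20, 20) hidden matrix
Y = matrix_layer(H, U2, V2, B2)                         # (10, 10) output matrix
```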

    Learning Over Long Time Lags

    The advantage of recurrent neural networks (RNNs) in learning dependencies between time-series data has distinguished RNNs from other deep learning models. Recently, many advances have been proposed in this emerging field, but the literature lacks a comprehensive review of memory models in RNNs. This paper provides a fundamental review of RNNs and the long short-term memory (LSTM) model, and then surveys recent advances in memory enhancements and learning techniques for capturing long-term dependencies in RNNs. Comment: This is a draft article, in preparation to submit for peer review.

    Towards Two-Dimensional Sequence to Sequence Model in Neural Machine Translation

    This work investigates an alternative model for neural machine translation (NMT) and proposes a novel architecture in which we employ a multi-dimensional long short-term memory (MDLSTM) for translation modeling. In state-of-the-art methods, source and target sentences are treated as one-dimensional sequences over time, whereas we view translation as a two-dimensional (2D) mapping and use an MDLSTM layer to define the correspondence between source and target words. We extend the current sequence-to-sequence backbone of NMT models to a 2D structure in which the source and target sentences are aligned with each other in a 2D grid. Our proposed topology shows consistent improvements over an attention-based sequence-to-sequence model on two WMT 2017 tasks, German↔English. Comment: 7 pages, EMNLP 2018.
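
    A minimal sketch of the 2D grid traversal, assuming some 2D LSTM-style cell is supplied: state (i, j) reads target word i, source word j and the states from (i-1, j) and (i, j-1), so the grid realises the source-target correspondence described above. All names and the dummy cell are illustrative, not the paper's code.

```python
import numpy as np

def translation_grid(src_embs, tgt_embs, cell, state_dim):
    """Run a 2D recurrent cell over the (target position, source position) grid.

    src_embs : (J, d) source word embeddings, tgt_embs : (I, d) target word embeddings
    cell(x, h_left, c_left, h_up, c_up) -> (h, c) is any 2D LSTM-style cell.
    State (i, j) sees target word i, source word j, the state from (i, j-1)
    (previous source position) and the state from (i-1, j) (previous target
    position), so every target position can relate to every source position.
    """
    I, J = tgt_embs.shape[0], src_embs.shape[0]
    h = np.zeros((I + 1, J + 1, state_dim))
    c = np.zeros((I + 1, J + 1, state_dim))
    for i in range(1, I + 1):
        for j in range(1, J + 1):
            x = np.concatenate([tgt_embs[i - 1], src_embs[j - 1]])
            h[i, j], c[i, j] = cell(x, h[i, j - 1], c[i, j - 1],
                                    h[i - 1, j], c[i - 1, j])
    # The last column summarises, for each target position, the full source sentence.
    return h[1:, J]

# Dummy cell just to show the call pattern (a real MDLSTM cell would go here):
dummy = lambda x, hl, cl, hu, cu: (np.tanh(hl + hu + x[:hl.shape[0]]), cl + cu)
states = translation_grid(np.zeros((5, 8)), np.zeros((4, 8)), dummy, state_dim=8)
```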

    On Recurrent Neural Networks for Sequence-based Processing in Communications

    In this work, we analyze the capabilities and practical limitations of neural networks (NNs) for sequence-based signal processing, which can be seen as an omnipresent property of almost any modern communication system. In particular, we train multiple state-of-the-art recurrent neural network (RNN) structures to learn how to decode convolutional codes, allowing a clear benchmark against the corresponding maximum likelihood (ML) Viterbi decoder. We examine the decoding performance for various kinds of NN architectures, beginning with classical types like feedforward layers and gated recurrent unit (GRU) layers, up to more recently introduced architectures such as temporal convolutional networks (TCNs) and differentiable neural computers (DNCs) with external memory. As a key limitation, it turns out that the training complexity increases exponentially with the length of the encoding memory ν and thus practically limits the achievable bit error rate (BER) performance. To overcome this limitation, we introduce a new training method in which we gradually increase the number of ones within the training sequences, i.e., we constrain the set of possible training sequences in the beginning until first convergence. By consecutively adding more and more possible sequences to the training set, we finally achieve training success in cases that did not converge before via naive training. Further, we show that our network can learn to jointly detect and decode a quadrature phase shift keying (QPSK) modulated code with sub-optimal (anti-Gray) labeling in one shot, at a performance that would require iterations between demapper and decoder in classic detection schemes. Comment: Presented at Asilomar Conf. 2019.
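
    A small sketch of the curriculum idea, under stated assumptions: information sequences are drawn with a capped number of ones, the cap is raised over epochs, and a simple rate-1/2 convolutional encoder (generators chosen here only for illustration) produces the coded training data. The `train_rnn_decoder` call is a hypothetical placeholder.

```python
import numpy as np

def conv_encode(bits, g1=0o7, g2=0o5, nu=2):
    """Rate-1/2 convolutional encoder with memory nu (generators are assumptions)."""
    state = 0
    out = []
    for b in bits:
        state = ((state << 1) | int(b)) & ((1 << (nu + 1)) - 1)
        out += [bin(state & g1).count("1") % 2, bin(state & g2).count("1") % 2]
    return np.array(out, dtype=np.float32)

def curriculum_batch(batch_size, seq_len, max_ones):
    """Draw info sequences whose number of ones is limited to max_ones.
    Gradually raising max_ones over training enlarges the set of possible
    sequences, which is the constrain-then-relax idea described above."""
    X, Y = [], []
    for _ in range(batch_size):
        bits = np.zeros(seq_len, dtype=int)
        ones = np.random.randint(0, max_ones + 1)
        bits[np.random.choice(seq_len, size=ones, replace=False)] = 1
        Y.append(bits)
        X.append(conv_encode(bits))       # noiseless here; add channel noise in practice
    return np.stack(X), np.stack(Y)

# Schedule sketch: start with almost-all-zero sequences, then relax the constraint.
for epoch in range(10):
    max_ones = min(64, 2 ** epoch)        # illustrative schedule
    X, Y = curriculum_batch(batch_size=32, seq_len=64, max_ones=max_ones)
    # train_rnn_decoder(X, Y)             # hypothetical training step for the RNN decoder
```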

    Sparse incomplete representations: A novel role for olfactory granule cells

    Mitral cells of the olfactory bulb form sparse representations of odorants and transmit this information to the cortex. The olfactory code carried by the mitral cells is sparser than the inputs that they receive. In this study we analyze the mechanisms and functional significance of sparse olfactory codes. We consider a model of the olfactory bulb containing populations of excitatory mitral and inhibitory granule cells. We argue that sparse codes may emerge as a result of self-organization in the network, leading to a precise balance between the mitral cells' excitatory inputs and the inhibition provided by the granule cells. We propose a novel role for the olfactory granule cells: these cells can build representations of odorant stimuli that are not fully accurate. Due to the incompleteness of the granule cell representation, the exact excitation-inhibition balance is established only for some mitral cells, leading to sparse responses of the mitral cells. Our model suggests a functional significance for the dendrodendritic synapses that mediate interactions between mitral and granule cells. The model accounts for the sparse olfactory code in the steady state and predicts that transient dynamics may be less sparse.
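
    A toy rate model, not the paper's, illustrating the balance argument: granule-cell inhibition builds up until it roughly cancels the excitatory drive to most mitral cells, and an incomplete granule representation leaves only a few mitral cells active. All parameters and the choice of reciprocal weights are assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def olfactory_bulb_response(odor_input, W_mg, W_gm, steps=5000, dt=0.01):
    """Toy rate model: excitatory mitral cells receive odor input and inhibition
    from granule cells through reciprocal (dendrodendritic-like) connections.

    odor_input : (n_mitral,) receptor drive to the mitral cells
    W_mg       : (n_granule, n_mitral) mitral -> granule excitation
    W_gm       : (n_mitral, n_granule) granule -> mitral inhibition
    """
    r_m = np.zeros(odor_input.shape[0])       # mitral firing rates
    r_g = np.zeros(W_mg.shape[0])             # granule firing rates
    for _ in range(steps):
        # Inhibition grows until it approximately cancels the excitatory input for
        # most mitral cells; where the balance stays imperfect the cell keeps
        # firing, which yields a sparse steady-state code.
        r_m += dt * (-r_m + relu(odor_input - W_gm @ r_g))
        r_g += dt * (-r_g + relu(W_mg @ r_m))
    return r_m, r_g

rng = np.random.default_rng(1)
n_m, n_g = 50, 40                             # fewer granule "features" than needed,
W_mg = relu(rng.standard_normal((n_g, n_m)))  # so the representation is incomplete
odor = relu(rng.standard_normal(n_m))
r_m, r_g = olfactory_bulb_response(odor, W_mg / np.sqrt(n_m), W_mg.T / np.sqrt(n_m))
print("fraction of active mitral cells:", float(np.mean(r_m > 1e-3)))
```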