Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) Network
Because of their effectiveness in broad practical applications, LSTM networks
have received a wealth of coverage in scientific journals, technical blogs, and
implementation guides. However, in most articles, the inference formulas for
the LSTM network and its parent, RNN, are stated axiomatically, while the
training formulas are omitted altogether. In addition, the technique of
"unrolling" an RNN is routinely presented without justification throughout the
literature. The goal of this paper is to explain the essential RNN and LSTM
fundamentals in a single document. Drawing from concepts in signal processing,
we formally derive the canonical RNN formulation from differential equations.
We then propose and prove a precise statement, which yields the RNN unrolling
technique. We also review the difficulties with training the standard RNN and
address them by transforming the RNN into the "Vanilla LSTM" network through a
series of logical arguments. We provide all equations pertaining to the LSTM
system together with detailed descriptions of its constituent entities. Albeit
unconventional, our choice of notation and the method for presenting the LSTM
system emphasizes ease of understanding. As part of the analysis, we identify
new opportunities to enrich the LSTM system and incorporate these extensions
into the Vanilla LSTM network, producing the most general LSTM variant to date.
The target reader has already been exposed to RNNs and LSTM networks through
numerous available resources and is open to an alternative pedagogical
approach. A Machine Learning practitioner seeking guidance for implementing our
new augmented LSTM model in software for experimentation and research will find
the insights and derivations in this tutorial valuable as well.
Comment: 43 pages, 10 figures, 78 references
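The Vanilla LSTM inference step the abstract refers to can be sketched, for a single scalar cell, as follows. All weight values and names here are illustrative placeholders, not taken from the paper; the loop at the end illustrates the "unrolling" of the recurrence along a sequence.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w):
    """One inference step of a scalar Vanilla LSTM cell.

    w maps each gate name to a (input weight, recurrent weight, bias)
    triple; all numeric values used below are illustrative.
    """
    f = sigmoid(w["f"][0] * x + w["f"][1] * h_prev + w["f"][2])    # forget gate
    i = sigmoid(w["i"][0] * x + w["i"][1] * h_prev + w["i"][2])    # input gate
    o = sigmoid(w["o"][0] * x + w["o"][1] * h_prev + w["o"][2])    # output gate
    g = math.tanh(w["g"][0] * x + w["g"][1] * h_prev + w["g"][2])  # candidate value
    c = f * c_prev + i * g        # new cell state
    h = o * math.tanh(c)          # new hidden state
    return h, c

# "Unrolling": the same cell, with the same weights, is applied at each
# time step, threading (h, c) through the sequence.
w = {k: (0.5, 0.4, 0.1) for k in ("f", "i", "o", "g")}
h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.25]:
    h, c = lstm_step(x, h, c, w)
```

Because the gates are sigmoids in (0, 1) and the candidate is a tanh in (-1, 1), the hidden state h is always bounded in (-1, 1), which is one of the stabilizing properties the paper's derivation highlights.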
Efficient machine learning: models and accelerations
One of the key enablers of the recent unprecedented success of machine learning is the adoption of very large models. Modern machine learning models typically consist of multiple cascaded layers, such as deep neural networks, with millions to hundreds of millions of parameters (i.e., weights) in the entire model. Larger-scale models tend to enable the extraction of more complex high-level features and therefore lead to a significant improvement in overall accuracy. On the other hand, the layered deep structure and large model sizes also increase computational and memory requirements. In order to achieve higher scalability, performance, and energy efficiency for deep learning systems, two orthogonal research and development trends have attracted enormous interest: acceleration and model compression. The underlying goal of both trends is to preserve high model quality, so that the models provide accurate predictions. In this thesis, we address these two problems and utilize different computing paradigms to solve real-life deep learning problems.
To explore these two domains, this thesis first presents the cogent confabulation network for the sentence completion problem. We use the Chinese language as a case study to describe our exploration of cogent-confabulation-based text recognition models. The exploration and optimization of these models have been conducted through various comparisons, and the optimized network offers better accuracy on sentence completion. To accelerate sentence completion on a multi-processing system, we propose a parallel framework for the confabulation recall algorithm. The parallel implementation reduces runtime, improves recall accuracy by breaking the fixed evaluation order and introducing more generalization, and maintains balanced progress in status updates among all neurons. A lexicon scheduling algorithm is presented to further improve the model's performance.
As deep neural networks have proven effective for many real-life applications and are increasingly deployed on low-power devices, we then investigate accelerating neural network inference using a hardware-friendly computing paradigm: stochastic computing. It is an approximate computing paradigm that requires a small hardware footprint and achieves high energy efficiency. Applying stochastic computing to deep convolutional neural networks, we design the functional hardware blocks and optimize them jointly to minimize the accuracy loss due to the approximation. Synthesis results show that the proposed design achieves remarkably low hardware cost and power/energy consumption.
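The core idea of stochastic computing can be sketched in software (a minimal illustration with my own function names and parameters, not the thesis's hardware design): a value p in [0, 1] is encoded as a random bitstream whose fraction of ones equals p, and multiplication then reduces to a bitwise AND of two independent streams — a single gate in hardware.

```python
import random

def to_bitstream(p, n, rng):
    """Encode probability p as a length-n unipolar stochastic bitstream."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def sc_multiply(p, q, n=4096, seed=0):
    """Approximate p * q by AND-ing two independent bitstreams.

    Longer streams (larger n) trade latency for accuracy, which is the
    central cost/precision trade-off of stochastic computing.
    """
    rng = random.Random(seed)
    a = to_bitstream(p, n, rng)
    b = to_bitstream(q, n, rng)
    ones = sum(x & y for x, y in zip(a, b))
    return ones / n

est = sc_multiply(0.3, 0.5)  # close to 0.15, up to sampling noise
```

The approximation error shrinks as the stream length grows, which is why the thesis must co-optimize the functional blocks to keep the accuracy loss of the network acceptable.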
Modern neural networks usually contain a huge number of parameters, which cannot fit into embedded devices. Compressing deep learning models, together with accelerating them, therefore attracts our attention. We introduce structured-matrix-based neural networks to address this problem. The circulant matrix is one such structured matrix: the entire matrix can be represented by a single vector, so the matrix is compressed. We further investigate a more flexible structure based on the circulant matrix, the block-circulant matrix. It partitions a matrix into several smaller blocks and makes each submatrix circulant, so the compression ratio is controllable. With the help of equivalent computation based on the Fourier transform, inference of the deep neural network can be accelerated energy-efficiently on FPGAs. We also optimize the training algorithm for block-circulant-matrix-based neural networks to obtain high accuracy after compression.
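The compression idea behind the circulant structure can be shown in a few lines (a pure-Python sketch of my own; the thesis uses an FFT-based equivalent computation that evaluates the same product in O(n log n) rather than O(n^2)):

```python
def circulant_matvec(c, x):
    """Multiply the n x n circulant matrix defined by its first column c
    with vector x, storing only the n-element vector c (n^2 -> n storage).
    Entry (i, j) of the implicit matrix is c[(i - j) % n]."""
    n = len(c)
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]

def dense_matvec(c, x):
    """The same product via the explicit (uncompressed) matrix, for comparison."""
    n = len(c)
    C = [[c[(i - j) % n] for j in range(n)] for i in range(n)]
    return [sum(C[i][j] * x[j] for j in range(n)) for i in range(n)]

y = circulant_matvec([1, 2, 3], [4, 5, 6])
```

A block-circulant layer applies this per block: the weight matrix is partitioned into small square blocks, each stored as a single vector, so the block size directly controls the compression ratio described above.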
Algorithms for Finding Inverse of Two Patterned Matrices over Z_p
Circulant matrix families have become an important tool in network engineering. In this paper, two new patterned matrices over Z_p, the row skew first-plus-last right circulant matrix and the row first-plus-last left circulant matrix, are presented, and their basic properties are discussed. Based on Newton-Hensel lifting and Chinese remaindering, two different algorithms are obtained. Moreover, the cost in terms of bit operations is given for each algorithm.
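The Chinese-remaindering step mentioned above can be sketched as follows. This is a generic CRT reconstruction, not the paper's full inversion algorithm: an entry known modulo several pairwise coprime moduli is recombined into its value modulo the product.

```python
from math import prod

def crt(residues, moduli):
    """Reconstruct x mod prod(moduli) from residues x mod m_i,
    for pairwise coprime moduli, via the standard CRT formula."""
    M = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        # pow(Mi, -1, m) is the modular inverse of Mi mod m (Python >= 3.8)
        x += r * Mi * pow(Mi, -1, m)
    return x % M

# x ≡ 2 (mod 3), x ≡ 3 (mod 5), x ≡ 2 (mod 7)  ->  x = 23
```

In an inversion algorithm of this kind, each residue would be an entry of the inverse computed modulo a small prime (cheap arithmetic), with CRT assembling the final answer.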
Information rate maximization over a resistive grid
The work presents the first results of the authors' research on adaptive cellular neural networks (CNNs) based on a global information-theoretic cost function. It considers the simplest case of optimizing a resistive grid such that the Shannon information rate across the input-output boundaries of the grid is maximized. Besides its importance in information theory, information rate has proven to be a useful concept for principal as well as independent component analysis (PCA, ICA). In contrast to linear fully connected neural networks, resistive grids, due to their local coupling, can resemble models of physical media and are feasible for VLSI implementation. Results for the spatially invariant as well as the spatially variant case are presented, and their relation to principal subspace analysis (PSA) is outlined. Simulation results show the validity of the proposed approach.