Recurrent Attention Unit
Recurrent Neural Networks (RNNs) have been successfully applied to many sequence learning problems, such as handwriting recognition, image description, natural language processing, and video motion analysis. After years of development, researchers have improved the internal structure of the RNN and introduced many variants. Among them, the Gated Recurrent Unit (GRU) is one of the most widely used RNN models. However, the GRU lacks the capability to adaptively pay attention to certain regions or locations, which may cause information redundancy or loss during learning. In this paper, we propose an RNN model,
called the Recurrent Attention Unit (RAU), which seamlessly integrates the attention mechanism into the interior of the GRU by adding an attention gate. The attention gate enhances the GRU's ability to retain long-term memory and helps memory cells quickly discard unimportant content. RAU is capable of extracting information from sequential data by adaptively selecting a sequence of regions or locations and paying more attention to the selected regions during learning. Extensive experiments on image classification, sentiment classification, and language modeling show that RAU consistently outperforms the GRU and other baseline methods.
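The abstract does not spell out the RAU update equations, so the following is only a minimal sketch of the idea: a GRU-style cell extended with an extra attention gate that rescales the candidate memory. The gate's parametrization here is an assumption, not the paper's formulation.

```python
import torch
import torch.nn as nn

class RecurrentAttentionUnitSketch(nn.Module):
    """A GRU-style cell with an extra attention gate (illustrative only)."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.update = nn.Linear(input_size + hidden_size, hidden_size)  # z_t
        self.reset = nn.Linear(input_size + hidden_size, hidden_size)   # r_t
        self.cand = nn.Linear(input_size + hidden_size, hidden_size)    # candidate
        # Hypothetical attention gate: decides, per hidden dimension,
        # how much of the new content deserves attention.
        self.attn = nn.Linear(input_size + hidden_size, hidden_size)    # a_t

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        xh = torch.cat([x, h], dim=-1)
        z = torch.sigmoid(self.update(xh))
        r = torch.sigmoid(self.reset(xh))
        h_tilde = torch.tanh(self.cand(torch.cat([x, r * h], dim=-1)))
        a = torch.sigmoid(self.attn(xh))          # attention gate in [0, 1]
        # Standard GRU interpolation, with the candidate modulated by a_t,
        # so unimportant content can be discarded quickly.
        return (1 - z) * h + z * (a * h_tilde)
```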
Efficiently applying attention to sequential data with the Recurrent Discounted Attention unit
Recurrent Neural Network architectures excel at processing sequences by
modelling dependencies over different timescales. The recently introduced
Recurrent Weighted Average (RWA) unit captures long term dependencies far
better than an LSTM on several challenging tasks. The RWA achieves this by
applying attention to each input and computing a weighted average over the full
history of its computations. Unfortunately, the RWA cannot change the attention
it has assigned to previous timesteps, and so struggles with carrying out
consecutive tasks or tasks with changing requirements. We present the Recurrent
Discounted Attention (RDA) unit that builds on the RWA by additionally allowing
the discounting of the past.
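As a rough illustration of the mechanism described above, the sketch below maintains a running attention-weighted average (a numerator and denominator) in the spirit of the RWA and adds a learned discount gate that rescales both accumulators, allowing attention assigned to earlier timesteps to be down-weighted. The exact parametrization is assumed, not taken from the paper.

```python
import torch
import torch.nn as nn

class RecurrentDiscountedAttentionSketch(nn.Module):
    """Running attention-weighted average with a learned discount gate."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.content = nn.Linear(input_size + hidden_size, hidden_size)    # z_t
        self.attention = nn.Linear(input_size + hidden_size, hidden_size)  # a_t
        self.discount = nn.Linear(input_size + hidden_size, hidden_size)   # g_t

    def forward(self, xs: torch.Tensor) -> torch.Tensor:
        batch, steps, _ = xs.shape
        hidden = self.content.out_features
        h = xs.new_zeros(batch, hidden)
        num = xs.new_zeros(batch, hidden)   # running weighted sum of content
        den = xs.new_zeros(batch, hidden)   # running sum of weights
        for t in range(steps):
            xh = torch.cat([xs[:, t], h], dim=-1)
            z = torch.tanh(self.content(xh))
            w = torch.exp(self.attention(xh))     # unnormalized attention weight
            g = torch.sigmoid(self.discount(xh))  # discount applied to the past
            num = g * num + z * w
            den = g * den + w
            h = torch.tanh(num / (den + 1e-8))
        return h
```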
We empirically compare our model to RWA, LSTM and GRU units on several
challenging tasks. On tasks with a single output the RWA, RDA and GRU units
learn much quicker than the LSTM and with better performance. On the multiple
sequence copy task our RDA unit learns the task three times as quickly as the
LSTM or GRU units while the RWA fails to learn at all. On the Wikipedia
character prediction task the LSTM performs best, but it is followed closely by our RDA unit. Overall, our RDA unit performs well and is sample efficient on a large variety of sequence tasks.
Comment: Updated results of the RDA-exp-tanh unit for the Wikipedia character prediction task.
DRAGNN: A Transition-based Framework for Dynamically Connected Neural Networks
In this work, we present a compact, modular framework for constructing novel
recurrent neural architectures. Our basic module is a new generic unit, the
Transition Based Recurrent Unit (TBRU). In addition to hidden layer
activations, TBRUs have discrete state dynamics that allow network connections
to be built dynamically as a function of intermediate activations. By
connecting multiple TBRUs, we can extend and combine commonly used
architectures such as sequence-to-sequence, attention mechanisms, and
recursive tree-structured models. A TBRU can also serve as both an encoder for
downstream tasks and as a decoder for its own task simultaneously, resulting in
more accurate multi-task learning. We call our approach Dynamic Recurrent
Acyclic Graphical Neural Networks, or DRAGNN. We show that DRAGNN is
significantly more accurate and efficient than seq2seq with attention for
syntactic dependency parsing and yields more accurate multi-task learning for
extractive summarization tasks.
Comment: 10 pages; submitted for review to ACL 2017.
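A hedged sketch of the TBRU idea follows: each step emits a discrete action in addition to a hidden activation, and a linkage function over the action history decides which earlier hidden state is wired into the current step, so connections are built dynamically. The linkage rule and the greedy, first-batch-element action selection are placeholders, not DRAGNN's actual implementation.

```python
import torch
import torch.nn as nn

class TBRUSketch(nn.Module):
    """Illustrative Transition-Based Recurrent Unit (not DRAGNN's code)."""

    def __init__(self, input_size: int, hidden_size: int, num_actions: int):
        super().__init__()
        self.cell = nn.GRUCell(input_size + hidden_size, hidden_size)
        self.actions = nn.Linear(hidden_size, num_actions)

    def forward(self, xs: torch.Tensor):
        batch, steps, _ = xs.shape
        hidden = self.cell.hidden_size
        history = [xs.new_zeros(batch, hidden)]
        taken = []
        for t in range(steps):
            # Hypothetical linkage: connect to the state k steps back, where
            # k is the previous discrete action (0 means the immediate past).
            k = taken[-1] if taken else 0
            linked = history[max(len(history) - 1 - k, 0)]
            h = self.cell(torch.cat([xs[:, t], linked], dim=-1), history[-1])
            # Greedy action from the first batch element, for simplicity.
            taken.append(int(self.actions(h).argmax(dim=-1)[0]))
            history.append(h)
        return history[-1], taken
```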
Benchmarking Deep Learning Architectures for Predicting Readmission to the ICU and Describing Patients-at-Risk
Objective: To compare different deep learning architectures for predicting
the risk of readmission within 30 days of discharge from the intensive care
unit (ICU). The interpretability of attention-based models is leveraged to
describe patients-at-risk. Methods: Several deep learning architectures making
use of attention mechanisms, recurrent layers, neural ordinary differential
equations (ODEs), and medical concept embeddings with time-aware attention were
trained using publicly available electronic medical record data (MIMIC-III)
associated with 45,298 ICU stays for 33,150 patients. Bayesian inference was
used to compute the posterior over weights of an attention-based model. Odds
ratios associated with an increased risk of readmission were computed for
static variables. Diagnoses, procedures, medications, and vital signs were
ranked according to the associated risk of readmission. Results: A recurrent
neural network, with time dynamics of code embeddings computed by neural ODEs,
achieved the highest average precision of 0.331 (AUROC: 0.739, F1-Score:
0.372). Predictive accuracy was comparable across neural network architectures.
Groups of patients at risk included those suffering from infectious complications, those with chronic or progressive conditions, and those for whom standard medical care was not suitable. Conclusions: Attention-based networks may be preferable to recurrent networks if an interpretable model is required, at only a marginal cost in predictive accuracy.
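For the odds-ratio analysis mentioned above, the standard recipe is to exponentiate the logistic-layer weight of a static variable; with a posterior over weights, percentiles of the exponentiated samples give a credible interval. A minimal sketch with placeholder samples, not MIMIC-III results:

```python
import numpy as np

# Posterior samples of one static variable's logistic-layer weight
# (e.g. drawn via variational inference); values here are placeholders.
posterior_weight_samples = np.random.normal(0.35, 0.08, size=5000)
odds_ratios = np.exp(posterior_weight_samples)   # OR = exp(weight)
median_or = np.median(odds_ratios)
ci_low, ci_high = np.percentile(odds_ratios, [2.5, 97.5])
print(f"OR = {median_or:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```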
Training Recurrent Answering Units with Joint Loss Minimization for VQA
We propose a novel algorithm for visual question answering based on a
recurrent deep neural network, where every module in the network corresponds to
a complete answering unit with attention mechanism by itself. The network is
optimized by minimizing loss aggregated from all the units, which share model
parameters while receiving different information to compute attention
probability. For training, our model attends to a region within image feature
map, updates its memory based on the question and attended image feature, and
answers the question based on its memory state. This procedure is performed to
compute loss in each step. The motivation of this approach is our observation
that multi-step inferences are often required to answer questions while each
problem may have a unique desirable number of steps, which is difficult to
identify in practice. Hence, we always make the first unit in the network solve problems, but allow it to learn knowledge from the rest of the units by backpropagation unless doing so degrades the model. To implement this idea, we early-stop training of each unit as soon as it starts to overfit. Note that, since more complex models tend to overfit on easier questions quickly, the last answering unit in the unfolded recurrent neural network is typically stopped first while the first one remains active until the end. We make a single-step prediction for a new question using the shared model. This strategy works better than the other options within our framework since the selected model is trained effectively from all units without overfitting. The proposed algorithm outperforms other multi-step attention-based approaches while using a single-step prediction on the VQA dataset.
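One way to picture the joint loss is below: a single shared answering unit is unfolded for several steps, the per-step answer losses are summed, and a step's loss is dropped once that unit has been early-stopped. The unit's interface (attend, update memory, answer head) is hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def joint_answering_loss(unit: nn.Module, image_feat, question, answer,
                         num_steps: int, active: list):
    """Sum the answer loss over every unfolded answering unit.

    `unit` is one shared attention+memory module applied recursively;
    `active[t]` is False once step t has been early-stopped because it
    began to overfit. All interfaces here are assumptions.
    """
    memory = question                                 # initial memory
    total = torch.zeros(())
    for t in range(num_steps):
        attended, memory = unit(image_feat, memory)   # attend, update memory
        logits = unit.answer(memory)                  # hypothetical answer head
        step_loss = F.cross_entropy(logits, answer)
        if active[t]:                                 # skip early-stopped units
            total = total + step_loss
    return total
```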
A GRU-based Encoder-Decoder Approach with Attention for Online Handwritten Mathematical Expression Recognition
In this study, we present a novel end-to-end approach based on the
encoder-decoder framework with the attention mechanism for online handwritten
mathematical expression recognition (OHMER). First, the input two-dimensional
ink trajectory information of the handwritten expression is encoded via a gated recurrent unit based recurrent neural network (GRU-RNN). Then the decoder is
also implemented by the GRU-RNN with a coverage-based attention model. The
proposed approach can simultaneously accomplish the symbol recognition and
structural analysis to output a character sequence in LaTeX format. Validated
on the CROHME 2014 competition task, our approach significantly outperforms the
state-of-the-art with an expression recognition accuracy of 52.43% by only
using the official training dataset. Furthermore, the alignments between the
input trajectories of handwritten expressions and the output LaTeX sequences
are visualized by the attention mechanism to show the effectiveness of the
proposed method.
Comment: Accepted by the ICDAR 2017 conference.
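A coverage-based attention model of the kind named above can be sketched as follows: the accumulated past attention weights are encoded (here by a small convolution, a common recipe) and added to the attention energy, discouraging the decoder from re-attending to already-parsed strokes. The dimensions and the conv-based coverage encoder are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class CoverageAttentionSketch(nn.Module):
    """Attention whose score also sees the accumulated past attention."""

    def __init__(self, enc_dim: int, dec_dim: int, attn_dim: int, k: int = 11):
        super().__init__()
        self.enc_proj = nn.Linear(enc_dim, attn_dim)
        self.dec_proj = nn.Linear(dec_dim, attn_dim)
        self.cov_conv = nn.Conv1d(1, attn_dim, kernel_size=k, padding=k // 2)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, enc, dec_state, cum_alpha):
        # enc: (B, T, enc_dim); dec_state: (B, dec_dim); cum_alpha: (B, T)
        cov = self.cov_conv(cum_alpha.unsqueeze(1)).transpose(1, 2)
        e = self.score(torch.tanh(
            self.enc_proj(enc) + self.dec_proj(dec_state).unsqueeze(1) + cov
        )).squeeze(-1)                                    # energies: (B, T)
        alpha = torch.softmax(e, dim=-1)
        context = (alpha.unsqueeze(-1) * enc).sum(dim=1)  # (B, enc_dim)
        return context, alpha, cum_alpha + alpha          # updated coverage
```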
Dense Recurrent Neural Networks for Scene Labeling
Recently, recurrent neural networks (RNNs) have demonstrated the ability to
improve scene labeling through capturing long-range dependencies among image
units. In this paper, we propose dense RNNs for scene labeling by exploring
various long-range semantic dependencies among image units. In comparison with
existing RNN based approaches, our dense RNNs are able to capture richer
contextual dependencies for each image unit via dense connections between each
pair of image units, which significantly enhances their discriminative power.
Besides, to select relevant and meanwhile restrain irrelevant dependencies for
each unit from dense connections, we introduce an attention model into dense
RNNs. The attention model automatically assigns more importance to helpful dependencies and less weight to irrelevant ones. Integrated with convolutional neural networks (CNNs), our method achieves state-of-the-art performance on the PASCAL Context, MIT ADE20K, and SiftFlow benchmarks.
Comment: Tech. Report.
On Extended Long Short-term Memory and Dependent Bidirectional Recurrent Neural Network
In this work, we first analyze the memory behavior of three recurrent neural network (RNN) cells, namely the simple RNN (SRN), the long short-term memory
(LSTM) and the gated recurrent unit (GRU), where the memory is defined as a
function that maps previous elements in a sequence to the current output. Our
study shows that all three of them suffer rapid memory decay. Then, to
alleviate this effect, we introduce trainable scaling factors that act like an
attention mechanism to adjust memory decay adaptively. The new design is called
the extended LSTM (ELSTM). Finally, to design a system that is robust to
previous erroneous predictions, we propose a dependent bidirectional recurrent
neural network (DBRNN). Extensive experiments are conducted on different
language tasks to demonstrate the superiority of the proposed ELSTM and DBRNN
solutions. The ELSTM achieves up to a 30% increase in the labeled attachment score (LAS) compared to the LSTM and GRU in the dependency parsing (DP) task. Our models also outperform other state-of-the-art models such as bi-attention and convolutional sequence-to-sequence (convseq2seq) by close to 10% in LAS. The code is released as open source (https://github.com/yuanhangsu/ELSTM-DBRNN).
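The abstract only says the scaling factors "act like an attention mechanism to adjust memory decay adaptively"; one plausible reading, assumed purely for illustration, is a trainable per-position scale on what gets written into an LSTM cell:

```python
import torch
import torch.nn as nn

class ELSTMSketch(nn.Module):
    """LSTM cell with trainable per-position scaling factors (illustrative)."""

    def __init__(self, input_size: int, hidden_size: int, max_len: int):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        # One trainable scaling vector per sequence position.
        self.scale = nn.Parameter(torch.ones(max_len, input_size))

    def forward(self, xs: torch.Tensor):
        batch, steps, _ = xs.shape
        h = xs.new_zeros(batch, self.cell.hidden_size)
        c = xs.new_zeros(batch, self.cell.hidden_size)
        for t in range(steps):
            # Scale the input before it is written into memory, so later
            # positions can compensate for the cell's natural decay.
            h, c = self.cell(xs[:, t] * self.scale[t], (h, c))
        return h
```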
Channel Recurrent Attention Networks for Video Pedestrian Retrieval
Full attention, which generates an attention value per element of the input
feature maps, has been successfully demonstrated to be beneficial in visual
tasks. In this work, we propose a fully attentional network, termed the channel recurrent attention network, for the task of video pedestrian retrieval. The main attention unit, channel recurrent attention, identifies attention maps at the frame level by jointly leveraging spatial and channel patterns via a recurrent neural network. This channel recurrent attention is designed to build a global receptive field by recurrently receiving and learning the spatial vectors. Then, a set aggregation cell is employed to generate a compact video representation. Experimental results demonstrate the superior performance of the proposed deep network, outperforming current state-of-the-art results across standard video person retrieval benchmarks, and a thorough ablation study shows the effectiveness of the proposed units.
Comment: To appear in ACCV 2020.
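A rough sketch of the channel recurrent attention idea: the feature map's spatial vectors are fed sequentially to an RNN, giving every output a global receptive field, and the outputs are squashed into a full, per-element attention map. The scan order and sigmoid gating are assumptions.

```python
import torch
import torch.nn as nn

class ChannelRecurrentAttentionSketch(nn.Module):
    """An RNN scans spatial vectors to produce a full attention map."""

    def __init__(self, channels: int):
        super().__init__()
        self.rnn = nn.GRU(channels, channels, batch_first=True)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) -> sequence of H*W channel vectors
        b, c, h, w = feat.shape
        seq = feat.flatten(2).transpose(1, 2)             # (B, H*W, C)
        out, _ = self.rnn(seq)                            # global receptive field
        attn = torch.sigmoid(out).transpose(1, 2).view(b, c, h, w)
        return feat * attn                                # per-element attention
```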
Neural Networks for Text Correction and Completion in Keyboard Decoding
Despite the ubiquity of mobile and wearable text messaging applications, the problem of keyboard text decoding has not been tackled sufficiently in light of the enormous success of deep learning Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) for natural language understanding. In particular, the fact that keyboard decoders must operate on devices with memory and processor resource constraints makes it challenging to deploy industrial-scale deep neural network (DNN) models. This paper proposes a
sequence-to-sequence neural attention network system for automatic text
correction and completion. Given an erroneous sequence, our model encodes
character level hidden representations and then decodes the revised sequence
thus enabling auto-correction and completion. We achieve this with a combination of a character-level CNN and gated recurrent unit (GRU) encoder along with a word-level gated recurrent unit (GRU) attention decoder. Unlike traditional language models that learn from billions of words, our corpus size is only 12 million words, orders of magnitude smaller. The memory footprint of our
learnt model for inference and prediction is also an order of magnitude smaller
than the conventional language model based text decoders. We report baseline
performance for neural keyboard decoders in such a limited domain. Our models achieve a word-level accuracy of and a character error rate (CER) of over the Twitter typo dataset. We present a novel dataset of noisy-to-corrected mappings by inducing the noise distribution from the Twitter data over the OpenSubtitles 2009 dataset, on which our model predicts with a word-level accuracy of and a sequence accuracy of . In our user study, our model achieved an average CER of , with the state-of-the-art non-neural touch-screen keyboard decoder at a CER of .
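The named components can be assembled roughly as below: a character-level CNN feeding a GRU encoder, and a word-level GRU decoder with dot-product attention over the encoder states. Vocabulary sizes, kernel width, and the attention form are placeholders; the abstract names the modules but not the hyperparameters.

```python
import torch
import torch.nn as nn

class KeyboardCorrectorSketch(nn.Module):
    """Char-level CNN + GRU encoder with a word-level GRU attention decoder."""

    def __init__(self, n_chars: int, n_words: int, dim: int = 128):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, dim)
        self.conv = nn.Conv1d(dim, dim, kernel_size=5, padding=2)
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.word_emb = nn.Embedding(n_words, dim)
        self.decoder = nn.GRUCell(2 * dim, dim)
        self.out = nn.Linear(dim, n_words)

    def forward(self, chars: torch.Tensor, words: torch.Tensor):
        # chars: (B, Tc) noisy character ids; words: (B, Tw) target word ids
        x = torch.relu(self.conv(self.char_emb(chars).transpose(1, 2)))
        enc, _ = self.encoder(x.transpose(1, 2))          # (B, Tc, dim)
        h = enc[:, -1]                                    # initial decoder state
        logits = []
        for t in range(words.size(1)):
            # Dot-product attention over the encoded character states.
            attn = torch.softmax((enc @ h.unsqueeze(-1)).squeeze(-1), dim=-1)
            ctx = (attn.unsqueeze(-1) * enc).sum(dim=1)
            h = self.decoder(torch.cat([self.word_emb(words[:, t]), ctx], -1), h)
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)                 # (B, Tw, n_words)
```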