Joint Learning of Correlated Sequence Labelling Tasks Using Bidirectional Recurrent Neural Networks
The stream of words produced by Automatic Speech Recognition (ASR) systems is
typically devoid of punctuation and formatting. Most natural language
processing applications expect segmented and well-formatted text as input,
which ASR output does not provide. This paper proposes a novel technique of
jointly modeling multiple correlated tasks such as punctuation and
capitalization using bidirectional recurrent neural networks, which leads to
improved performance for each of these tasks. This method could be extended to
joint modeling of any other correlated sequence labeling tasks.
Comment: Accepted in Interspeech 201
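To make the joint formulation concrete, the sketch below shows one way punctuation and capitalization can be cast as two label sequences aligned over the same unformatted tokens, which is the kind of target a shared bidirectional model could predict jointly. It is only an illustration: the function name `make_joint_labels` and the label names (`COMMA`, `PERIOD`, `QUESTION`, `NONE`, `CAP`, `LOWER`) are assumptions, not the paper's label set.

```python
import string

# Hypothetical label inventory, chosen only for this illustration.
PUNCT_LABELS = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}


def make_joint_labels(text):
    """Turn formatted text into ASR-like tokens plus two aligned label sequences."""
    tokens, punct, caps = [], [], []
    for raw in text.split():
        # The punctuation label describes the mark that follows the word.
        mark = raw[-1] if raw[-1] in PUNCT_LABELS else None
        word = raw.rstrip(string.punctuation)
        if not word:
            continue  # skip stand-alone punctuation
        tokens.append(word.lower())  # ASR-style input: lowercased, unpunctuated
        punct.append(PUNCT_LABELS[mark] if mark else "NONE")
        caps.append("CAP" if word[0].isupper() else "LOWER")
    return tokens, punct, caps


tokens, punct, caps = make_joint_labels("Hello, how are you? I am fine.")
```

Both label sequences have one entry per token, so a single encoder can feed two output layers, one per task.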
Label-Dependencies Aware Recurrent Neural Networks
In the last few years, Recurrent Neural Networks (RNNs) have proved effective
on several NLP tasks. Despite such great success, their ability to model
sequence labeling is still limited. This has led research toward solutions
where RNNs are combined with models which already proved effective in this
domain, such as CRFs. In this work we propose a solution far simpler but very
effective: an evolution of the simple Jordan RNN, where labels are re-injected
as input into the network, and converted into embeddings, in the same way as
words. We compare this RNN variant to all the other RNN models, Elman and
Jordan RNN, LSTM and GRU, on two well-known tasks of Spoken Language
Understanding (SLU). Thanks to label embeddings and their combination at the
hidden layer, the proposed variant, which uses more parameters than Elman and
Jordan RNNs but far fewer than LSTM and GRU, is not only more effective than
other RNNs but also outperforms sophisticated CRF models.
Comment: 22 pages, 3 figures. Accepted at CICling 2017 conference. Best
Verifiability, Reproducibility, and Working Description awar
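The label re-injection idea described in this abstract can be sketched as a greedy tagging loop in which the previously predicted label is looked up in an embedding table, just like a word, and fed back as input. This is a minimal, untrained illustration under stated assumptions: the class name `LabelFeedbackTagger`, the layer sizes, the random weights, the extra start-symbol row, and the use of concatenation to combine the two embeddings are all hypothetical, not the authors' architecture.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class LabelFeedbackTagger:
    """Jordan-style tagger: the previous label is re-injected as an embedding."""

    def __init__(self, n_words, n_labels, emb=4, hidden=5, seed=0):
        rng = random.Random(seed)
        self.word_emb = rand_matrix(rng, n_words, emb)
        # One extra row serves as a start symbol before any label is predicted.
        self.label_emb = rand_matrix(rng, n_labels + 1, emb)
        self.w_hidden = rand_matrix(rng, hidden, 2 * emb)
        self.w_out = rand_matrix(rng, n_labels, hidden)
        self.start = n_labels

    def tag(self, word_ids):
        prev, labels = self.start, []
        for w in word_ids:
            # Assumption: word and label embeddings are combined by concatenation.
            x = self.word_emb[w] + self.label_emb[prev]
            h = [math.tanh(a) for a in matvec(self.w_hidden, x)]
            scores = matvec(self.w_out, h)
            prev = max(range(len(scores)), key=scores.__getitem__)
            labels.append(prev)
        return labels


tagger = LabelFeedbackTagger(n_words=10, n_labels=3)
predicted = tagger.tag([1, 4, 2, 7])
```

Because the weights are random, the predicted labels carry no meaning; the point is only the data flow, in which each decision conditions the next one through the label embedding.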
Deep Cascade Multi-task Learning for Slot Filling in Online Shopping Assistant
Slot filling is a critical task in natural language understanding (NLU) for
dialog systems. State-of-the-art approaches treat it as a sequence labeling
problem and adopt such models as BiLSTM-CRF. While these models work relatively
well on standard benchmark datasets, they face challenges in the context of
E-commerce where the slot labels are more informative and carry richer
expressions. In this work, inspired by the unique structure of the E-commerce
knowledge base, we propose a novel multi-task model with cascade and residual
connections, which jointly learns segment tagging, named entity tagging and
slot filling. Experiments show the effectiveness of the proposed cascade and
residual structures. Our model has a 14.6% advantage in F1 score over the
strong baseline methods on a new Chinese E-commerce shopping assistant dataset,
while achieving competitive accuracies on a standard dataset. Furthermore, an
online test deployed on a dominant E-commerce platform shows a 130%
improvement in the accuracy of understanding user utterances. Our model has
already gone into production on the E-commerce platform.
Comment: AAAI 201
Twin Networks: Matching the Future for Sequence Generation
We propose a simple technique for encouraging generative RNNs to plan ahead.
We train a "backward" recurrent network to generate a given sequence in reverse
order, and we encourage states of the forward model to predict cotemporal
states of the backward model. The backward network is used only during
training, and plays no role during sampling or inference. We hypothesize that
our approach eases modeling of long-term dependencies by implicitly forcing the
forward states to hold information about the longer-term future (as contained
in the backward states). We show empirically that our approach achieves a 9%
relative improvement on a speech recognition task and a significant
improvement on a COCO caption generation task.
Comment: 12 pages, 3 figures, published at ICLR 201
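The matching term described in this abstract can be illustrated with a small penalty that compares each forward state with the backward state at the same time step. This is a sketch under assumptions, not the paper's exact loss: the function name `twin_penalty` is hypothetical, the forward states are compared directly (identity map) where a learned transformation could be used instead, and the backward states are assumed to arrive in the order the backward network produced them, that is, from the last time step to the first.

```python
def twin_penalty(forward_states, backward_states):
    """Mean squared L2 distance between cotemporal forward and backward states.

    forward_states:  list of state vectors, ordered by time step 0..T-1.
    backward_states: list of state vectors in the backward network's own
                     processing order (time step T-1 down to 0).
    """
    if not forward_states or len(forward_states) != len(backward_states):
        raise ValueError("need two non-empty state sequences of equal length")
    # Reverse the backward sequence so index t refers to the same time step.
    aligned = list(reversed(backward_states))
    total = 0.0
    for h_fwd, h_bwd in zip(forward_states, aligned):
        total += sum((a - b) ** 2 for a, b in zip(h_fwd, h_bwd))
    return total / len(forward_states)


penalty = twin_penalty([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In training, a term like this would be added to the usual generation loss; at sampling time only the forward network is run, so the penalty has no inference cost.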
Simple Recurrent Units for Highly Parallelizable Recurrence
Common recurrent neural architectures scale poorly due to the intrinsic
difficulty in parallelizing their state computations. In this work, we propose
the Simple Recurrent Unit (SRU), a light recurrent unit that balances model
capacity and scalability. SRU is designed to provide expressive recurrence
and enable highly parallelized implementation, and it comes with careful
initialization to facilitate training of deep models. We demonstrate the
effectiveness of SRU on multiple NLP tasks. SRU achieves a 5-9x speed-up over
cuDNN-optimized LSTM on classification and question answering datasets, and
delivers stronger results than LSTM and convolutional models. We also obtain an
average improvement of 0.7 BLEU over the Transformer model on translation by
incorporating SRU into the architecture.
Comment: EMNL