5,927 research outputs found
When Are Tree Structures Necessary for Deep Learning of Representations?
Recursive neural models, which use syntactic parse trees to recursively
generate representations bottom-up, are a popular architecture. But there have
not been rigorous evaluations showing for exactly which tasks this syntax-based
method is appropriate. In this paper we benchmark {\bf recursive} neural models
against sequential {\bf recurrent} neural models (simple recurrent and LSTM
models), enforcing apples-to-apples comparison as much as possible. We
investigate 4 tasks: (1) sentiment classification at the sentence level and
phrase level; (2) matching questions to answer-phrases; (3) discourse parsing;
(4) semantic relation extraction (e.g., {\em component-whole} between nouns).
Our goal is to understand better when, and why, recursive models can
outperform simpler models. We find that recursive models help mainly on tasks
(like semantic relation extraction) that require associating headwords across a
long distance, particularly on very long sequences. We then introduce a method
for allowing recurrent models to achieve similar performance: breaking long
sentences into clause-like units at punctuation and processing them separately
before combining. Our results thus help understand the limitations of both
classes of models, and suggest directions for improving recurrent models
Learning to Skim Text
Recurrent Neural Networks are showing much promise in many sub-areas of
natural language processing, ranging from document classification to machine
translation to automatic question answering. Despite their promise, many
recurrent models have to read the whole text word by word, making it slow to
handle long documents. For example, it is difficult to use a recurrent network
to read a book and answer questions about it. In this paper, we present an
approach of reading text while skipping irrelevant information if needed. The
underlying model is a recurrent network that learns how far to jump after
reading a few words of the input text. We employ a standard policy gradient
method to train the model to make discrete jumping decisions. In our benchmarks
on four different tasks, including number prediction, sentiment analysis, news
article classification and automatic Q\&A, our proposed model, a modified LSTM
with jumping, is up to 6 times faster than the standard sequential LSTM, while
maintaining the same or even better accuracy
Adversarial Multi-task Learning for Text Classification
Neural network models have shown their promising opportunities for multi-task
learning, which focus on learning the shared layers to extract the common and
task-invariant features. However, in most existing approaches, the extracted
shared features are prone to be contaminated by task-specific features or the
noise brought by other tasks. In this paper, we propose an adversarial
multi-task learning framework, alleviating the shared and private latent
feature spaces from interfering with each other. We conduct extensive
experiments on 16 different text classification tasks, which demonstrates the
benefits of our approach. Besides, we show that the shared knowledge learned by
our proposed model can be regarded as off-the-shelf knowledge and easily
transferred to new tasks. The datasets of all 16 tasks are publicly available
at \url{http://nlp.fudan.edu.cn/data/}Comment: Accepted by ACL201
- …