Deep Residual Output Layers for Neural Language Generation
Many tasks, including language generation, benefit from learning the
structure of the output space, particularly when the space of output labels is
large and the data is sparse. State-of-the-art neural language models
indirectly capture the output space structure in their classifier weights since
they lack parameter sharing across output labels. Learning shared output label
mappings helps, but existing methods have limited expressivity and are prone to
overfitting. In this paper, we investigate the usefulness of more powerful
shared mappings for output labels, and propose a deep residual output mapping
with dropout between layers to better capture the structure of the output space
and avoid overfitting. Evaluations on three language generation tasks show that
our output label mapping can match or improve upon state-of-the-art recurrent and
self-attention architectures, and suggest that the classifier does not
necessarily need to be high-rank to better model natural language if it is
better at capturing the structure of the output space.
Comment: To appear in ICML 2019
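As an illustration of the idea above, here is a minimal PyTorch sketch of a shared deep residual output mapping with dropout between layers, applied to the label embeddings before computing logits. All sizes, names, and the exact block layout are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class ResidualOutputMapping(nn.Module):
    """Shared deep residual mapping over output label embeddings
    (a sketch of the general technique, not the paper's exact model)."""
    def __init__(self, dim, num_layers=2, dropout=0.5):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Dropout(dropout))
            for _ in range(num_layers)
        )

    def forward(self, label_emb):            # (vocab, dim)
        h = label_emb
        for block in self.blocks:
            h = h + block(h)                 # residual connection, dropout inside the block
        return h

vocab, dim = 10000, 256
label_emb = torch.randn(vocab, dim)          # shared label embeddings
hidden = torch.randn(32, dim)                # a batch of decoder states
mapped = ResidualOutputMapping(dim)(label_emb)
logits = hidden @ mapped.t()                 # (32, vocab): classifier tied to label structure
```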
Long-Term Memory Networks for Question Answering
Question answering is an important and difficult task in the natural language
processing domain, because many basic natural language processing tasks can be
cast into a question answering task. Several deep neural network architectures
have been developed recently, which employ memory and inference components to
memorize and reason over text information, and generate answers to questions.
However, a major drawback of many such models is that they are capable of only
generating single-word answers. In addition, they require a large amount of
training data to generate accurate answers. In this paper, we introduce the
Long-Term Memory Network (LTMN), which incorporates both an external memory
module and a Long Short-Term Memory (LSTM) module to comprehend the input data
and generate multi-word answers. The LTMN model can be trained end-to-end using
back-propagation and requires minimal supervision. We test our model on two
synthetic data sets (based on Facebook's bAbI data set) and the real-world
Stanford question answering data set, and show that it can achieve
state-of-the-art performance.
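The following toy PyTorch sketch shows the general shape of the design: attend over sentence memories, then decode a multi-word answer with an LSTM. The bag-of-words memory encoding, dimensions, and wiring are assumptions for illustration, not the authors' exact LTMN:

```python
import torch
import torch.nn as nn

class MemoryLSTMSketch(nn.Module):
    """Attend over sentence memories, then decode a multi-word answer
    with an LSTM (illustrative sketch only)."""
    def __init__(self, vocab, dim):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.decoder = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab)

    def forward(self, memories, question, answer_in):
        m = self.embed(memories).mean(dim=2)                 # (B, n_sent, dim) bag-of-words memories
        q = self.embed(question).mean(dim=1)                 # (B, dim) question encoding
        attn = torch.softmax((m @ q.unsqueeze(2)).squeeze(2), dim=1)
        context = (attn.unsqueeze(2) * m).sum(dim=1)         # memory read
        h0 = (context + q).unsqueeze(0)                      # seed the decoder
        dec, _ = self.decoder(self.embed(answer_in), (h0, torch.zeros_like(h0)))
        return self.out(dec)                                 # per-step vocab logits

model = MemoryLSTMSketch(vocab=100, dim=32)
logits = model(torch.randint(0, 100, (2, 5, 7)),             # 2 stories, 5 sentences, 7 tokens
               torch.randint(0, 100, (2, 7)),                # questions
               torch.randint(0, 100, (2, 4)))                # teacher-forced answer prefix
print(logits.shape)                                          # torch.Size([2, 4, 100])
```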
Learning to Compose Task-Specific Tree Structures
For years, recursive neural networks (RvNNs) have been shown to be suitable
for representing text as fixed-length vectors and have achieved good performance
on several natural language processing tasks. However, the main drawback of
RvNNs is that they require structured input, which makes data preparation and
model implementation hard. In this paper, we propose Gumbel Tree-LSTM, a novel
tree-structured long short-term memory architecture that learns how to compose
task-specific tree structures only from plain text data efficiently. Our model
uses the Straight-Through Gumbel-Softmax estimator to decide the parent node among
candidates dynamically and to calculate gradients of the discrete decision. We
evaluate the proposed model on natural language inference and sentiment
analysis, and show that our model outperforms or is at least comparable to
previous models. We also find that our model converges significantly faster
than other models.
Comment: AAAI 2018
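PyTorch ships a Straight-Through Gumbel-Softmax estimator, so the core parent-selection trick can be sketched in a few lines; the scoring tensor below stands in for the model's learned composition query and is purely illustrative:

```python
import torch
import torch.nn.functional as F

# Hard one-hot choice in the forward pass, soft gradients in the backward pass.
scores = torch.randn(1, 4, requires_grad=True)          # scores for 4 candidate parents
choice = F.gumbel_softmax(scores, tau=1.0, hard=True)   # straight-through estimator
candidates = torch.randn(4, 8)                          # candidate parent representations
parent = choice @ candidates                            # discrete selection, still differentiable
parent.sum().backward()
print(scores.grad)                                      # gradients reach the discrete decision
```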
FRAGE: Frequency-Agnostic Word Representation
Continuous word representation (aka word embedding) is a basic building block
in many neural network-based models used in natural language processing tasks.
Although it is widely accepted that words with similar semantics should be
close to each other in the embedding space, we find that word embeddings
learned in several tasks are biased towards word frequency: the embeddings of
high-frequency and low-frequency words lie in different subregions of the
embedding space, and the embedding of a rare word and a popular word can be far
from each other even if they are semantically similar. This makes learned word
embeddings ineffective, especially for rare words, and consequently limits the
performance of these neural network models. In this paper, we develop a neat,
simple yet effective way to learn \emph{FRequency-AGnostic word Embedding}
(FRAGE) using adversarial training. We conduct comprehensive studies on ten
datasets across four natural language processing tasks, including word
similarity, language modeling, machine translation and text classification.
Results show that with FRAGE, we achieve higher performance than the baselines
in all tasks.
Comment: To appear in NIPS 2018
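A minimal sketch of the adversarial setup described above: a discriminator learns to tell high-frequency from low-frequency embeddings, while the embeddings receive a reversed signal so the two groups become indistinguishable. The frequency cutoff, sizes, and single combined backward pass are simplifying assumptions:

```python
import torch
import torch.nn as nn

emb = nn.Embedding(1000, 64)                     # word embeddings being trained
disc = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))
bce = nn.BCEWithLogitsLoss()

freq_ids = torch.randint(0, 200, (32,))          # "popular" words (assumed id cutoff)
rare_ids = torch.randint(200, 1000, (32,))       # "rare" words

# Discriminator step: classify the frequency group from the embeddings.
d_loss = bce(disc(emb(freq_ids).detach()), torch.ones(32, 1)) + \
         bce(disc(emb(rare_ids).detach()), torch.zeros(32, 1))

# Embedding step: make rare-word embeddings look frequent to the discriminator.
g_loss = bce(disc(emb(rare_ids)), torch.ones(32, 1))
(d_loss + g_loss).backward()   # in practice: alternate optimizers, add the task loss
```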
Effective LSTMs for Target-Dependent Sentiment Classification
Target-dependent sentiment classification remains a challenge: modeling the
semantic relatedness of a target with its context words in a sentence.
Different context words have different influences on determining the sentiment
polarity of a sentence towards the target. Therefore, it is desirable to
integrate the connections between the target word and its context words when building a
learning system. In this paper, we develop two target dependent long short-term
memory (LSTM) models, where target information is automatically taken into
account. We evaluate our methods on a benchmark dataset from Twitter. Empirical
results show that modeling sentence representation with standard LSTM does not
perform well. Incorporating target information into LSTM can significantly
boost the classification accuracy. The target-dependent LSTM models achieve
state-of-the-art performance without using a syntactic parser or external
sentiment lexicons.
Comment: 7 pages, 3 figures; published in COLING 2016
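The target-dependent idea can be sketched as two LSTMs meeting at the target: one reads the left context up to and including the target, the other reads the reversed right context down to it, and their final states are concatenated for classification. Dimensions and the use of pre-embedded inputs are assumptions:

```python
import torch
import torch.nn as nn

dim, classes = 50, 3
left_lstm = nn.LSTM(dim, dim, batch_first=True)      # runs left-to-right toward the target
right_lstm = nn.LSTM(dim, dim, batch_first=True)     # runs right-to-left toward the target
clf = nn.Linear(2 * dim, classes)

left_ctx = torch.randn(8, 6, dim)                    # left context + target (embedded)
right_ctx = torch.randn(8, 5, dim).flip(dims=[1])    # right context, reversed
_, (h_l, _) = left_lstm(left_ctx)
_, (h_r, _) = right_lstm(right_ctx)
logits = clf(torch.cat([h_l[-1], h_r[-1]], dim=1))   # (8, classes) sentiment scores
```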
Application of a Hybrid Bi-LSTM-CRF model to the task of Russian Named Entity Recognition
Named Entity Recognition (NER) is one of the most common tasks in natural
language processing. The purpose of NER is to find and classify tokens in text
documents into predefined categories called tags, such as person names,
quantity expressions, percentage expressions, names of locations,
organizations, as well as expressions of time, currency, and others. Although
a number of approaches have been proposed for this task for the Russian
language, there is still substantial room for better solutions. In this work,
we studied several deep neural network models, starting from a vanilla
Bi-directional Long Short-Term Memory (Bi-LSTM) network, then supplementing it
with Conditional Random Fields (CRF) as well as highway networks, and finally adding
external word embeddings. All models were evaluated across three datasets:
Gareev's dataset, Person-1000, and FactRuEval-2016. We found that extending the
Bi-LSTM model with CRF significantly increased the quality of predictions.
Encoding input tokens with external word embeddings reduced training time and
allowed us to achieve state of the art for the Russian NER task.
Comment: Artificial Intelligence and Natural Language Conference (AINL 2017)
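What the CRF layer adds at inference time is a structured decode over the Bi-LSTM's per-token scores. Here is a minimal Viterbi decoder over emission scores plus a learned transition matrix; shapes and the random scores are illustrative:

```python
import torch

def viterbi(emissions, transitions):
    """emissions: (seq_len, n_tags); transitions[i, j]: score of tag i -> tag j."""
    score = emissions[0]                    # best score ending in each tag so far
    backpointers = []
    for em in emissions[1:]:
        total = score.unsqueeze(1) + transitions + em.unsqueeze(0)
        score, idx = total.max(dim=0)       # best previous tag for each current tag
        backpointers.append(idx)
    best = [score.argmax().item()]
    for idx in reversed(backpointers):
        best.append(idx[best[-1]].item())
    return best[::-1]

n_tags = 5
emissions = torch.randn(7, n_tags)          # e.g. Bi-LSTM outputs for 7 tokens
transitions = torch.randn(n_tags, n_tags)   # learned tag-transition scores
print(viterbi(emissions, transitions))      # highest-scoring tag sequence
```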
An Attention-Gated Convolutional Neural Network for Sentence Classification
The classification of sentences is very challenging, since sentences contain
limited contextual information. In this paper, we propose an
Attention-Gated Convolutional Neural Network (AGCNN) for sentence
classification, which generates attention weights from feature context
windows of different sizes by using specialized convolution encoders. It makes
full use of limited contextual information to extract and enhance the influence
of important features in predicting the sentence's category. Experimental
results demonstrated that our model can achieve up to 3.1% higher accuracy than
standard CNN models, and gain competitive results over the baselines on four
out of the six tasks. In addition, we designed an activation function, namely,
Natural Logarithm rescaled Rectified Linear Unit (NLReLU). Experiments showed
that NLReLU can outperform ReLU and is comparable to other well-known
activation functions on AGCNN.
Comment: Accepted for publication in the Intelligent Data Analysis journal; 19
pages, 4 figures, and 5 tables
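Going by its name and description, NLReLU log-compresses the positive part of its input; a tiny PyTorch rendition under that assumption:

```python
import torch

def nlrelu(x, beta=1.0):
    """Natural Logarithm rescaled ReLU (assumed form: log(beta * relu(x) + 1))."""
    return torch.log(beta * torch.relu(x) + 1.0)

x = torch.linspace(-2, 4, steps=7)
print(nlrelu(x))   # zeros for x <= 0, log-scaled values for x > 0
```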
Semi-supervised emotion lexicon expansion with label propagation and specialized word embeddings
There are two main approaches to automatically extracting affective
orientation: lexicon-based and corpus-based. In this work, we argue that these
two methods are compatible and show that combining them can improve the
accuracy of emotion classifiers. In particular, we introduce a novel variant of
the Label Propagation algorithm that is tailored to distributed word
representations, we apply batch gradient descent to accelerate the optimization
of label propagation and to make the optimization feasible for large graphs,
and we propose a reproducible method for emotion lexicon expansion. We conclude
that label propagation can expand an emotion lexicon in a meaningful way and
that the expanded emotion lexicon can be leveraged to improve the accuracy of
an emotion classifier.
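A standard label propagation iteration on a word-similarity graph gives the flavor of the expansion step (this is the generic formulation, not the authors' tailored, gradient-based variant; the kNN graph and seed labels are synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 16))                  # word embeddings
sim = emb @ emb.T
np.fill_diagonal(sim, -np.inf)
W = sim >= np.sort(sim, axis=1)[:, -5:-4]         # 5-nearest-neighbor graph
S = (W | W.T).astype(float)
S /= S.sum(axis=1, keepdims=True)                 # row-normalized adjacency

Y = np.zeros((100, 2))
Y[:5, 0] = 1; Y[5:10, 1] = 1                      # 10 seed lexicon words, 2 emotions
F, alpha = Y.copy(), 0.9
for _ in range(50):
    F = alpha * (S @ F) + (1 - alpha) * Y         # propagate, keep seeds anchored
print(F.argmax(axis=1)[:12])                      # expanded emotion labels
```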
An Empirical Study of Discriminative Sequence Labeling Models for Vietnamese Text Processing
This paper presents an empirical study of two widely-used sequence prediction
models, Conditional Random Fields (CRFs) and Long Short-Term Memory Networks
(LSTMs), on two fundamental tasks for Vietnamese text processing:
part-of-speech tagging and named entity recognition. We show that a strong
lower bound for labeling accuracy can be obtained by relying only on simple
word-based features with minimal hand-crafted feature engineering, reaching
performance scores of 90.65\% and 86.03\% on the standard test sets for the two
tasks, respectively. In particular, we demonstrate empirically the surprising
efficiency of word embeddings in both tasks, with both models. We point out
that the state-of-the-art LSTM model does not always significantly outperform
the traditional CRF model, especially on
moderate-sized data sets. Finally, we give some suggestions and discussions for
efficient use of sequence labeling models in practical applications.
Comment: To appear in the Proceedings of the 9th International Conference on
Knowledge and Systems Engineering (KSE) 2017
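The "simple word-based features" in question are typically of the kind below; the exact feature set here is an illustrative assumption, written in the dict-per-token style used by common CRF toolkits such as sklearn-crfsuite:

```python
def word_features(sent, i):
    """Simple word-based features for token i (illustrative feature set)."""
    w = sent[i]
    return {
        "word.lower": w.lower(),
        "word.istitle": w.istitle(),
        "word.isdigit": w.isdigit(),
        "prev.lower": sent[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": sent[i + 1].lower() if i < len(sent) - 1 else "<EOS>",
    }

sent = "Hà Nội là thủ đô của Việt Nam".split()
print(word_features(sent, 1))
```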
Machine Translation Evaluation with Neural Networks
We present a framework for machine translation evaluation using neural
networks in a pairwise setting, where the goal is to select the better
translation from a pair of hypotheses, given the reference translation. In this
framework, lexical, syntactic and semantic information from the reference and
the two hypotheses is embedded into compact distributed vector representations,
and fed into a multi-layer neural network that models nonlinear interactions
between each of the hypotheses and the reference, as well as between the two
hypotheses. We experiment with the benchmark datasets from the WMT Metrics
shared task, on which we obtain the best results published so far, with the
basic network configuration. We also perform a series of experiments to analyze
and understand the contribution of the different components of the network. We
evaluate variants and extensions, including fine-tuning of the semantic
embeddings, and sentence-based representations modeled with convolutional and
recurrent neural networks. In summary, the proposed framework is flexible and
generalizable, allows for efficient learning and scoring, and provides an MT
evaluation metric that correlates with human judgments, and is on par with the
state of the art.
Comment: Machine Translation, Reference-based MT Evaluation, Deep Neural
Networks, Distributed Representation of Texts, Textual Similarity
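The pairwise setup reduces to a small network scoring which of two hypotheses better matches the reference; in this sketch the sentence vectors and MLP sizes are placeholders for the paper's richer lexical, syntactic, and semantic representations:

```python
import torch
import torch.nn as nn

dim = 64
mlp = nn.Sequential(nn.Linear(3 * dim, 64), nn.Tanh(), nn.Linear(64, 1))

# Stand-ins for embedded reference and hypothesis representations.
ref, hyp1, hyp2 = (torch.randn(16, dim) for _ in range(3))
p_h1_better = torch.sigmoid(mlp(torch.cat([ref, hyp1, hyp2], dim=1)))
labels = torch.ones(16, 1)                       # 1 if hyp1 was ranked better by humans
loss = nn.functional.binary_cross_entropy(p_h1_better, labels)
loss.backward()
```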