Trimming and Improving Skip-thought Vectors
The skip-thought model has been proven effective at learning sentence
representations and capturing sentence semantics. In this paper, we propose a
suite of techniques to trim and improve it. First, we validate the hypothesis
that, given a current sentence, inferring the previous sentence and inferring
the next sentence provide similar supervision power; our trimmed skip-thought
model therefore keeps only one decoder, which predicts the next sentence.
Second, we introduce a connection layer between the encoder and decoder to
help the model generalize better on semantic relatedness tasks. Third, we find
that a good word embedding initialization is also essential for learning
better sentence representations. We train our model unsupervised on a large
corpus of contiguous sentences, and then evaluate it on seven supervised
tasks, including semantic relatedness, paraphrase detection, and text
classification benchmarks. We empirically show that our proposed model is a
faster, lighter-weight, and equally powerful alternative to the original
skip-thought model.
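To make the trimmed design concrete, here is a minimal sketch in PyTorch
under our own assumptions (GRU cells and all layer names and sizes are
illustrative, not taken from the paper): one encoder, a connection layer, and
a single decoder that predicts only the next sentence.

    import torch
    import torch.nn as nn

    class TrimmedSkipThought(nn.Module):
        """Sketch: one encoder, a connection layer, one next-sentence decoder."""
        def __init__(self, vocab_size, emb_dim=300, hid_dim=600, pretrained_emb=None):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            if pretrained_emb is not None:  # good word-embedding initialization
                self.embed.weight.data.copy_(pretrained_emb)
            self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
            # Connection layer between encoder and decoder.
            self.connect = nn.Linear(hid_dim, hid_dim)
            # Single decoder: the previous-sentence decoder is trimmed away.
            self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
            self.out = nn.Linear(hid_dim, vocab_size)

        def forward(self, cur_sent, next_sent):
            _, h = self.encoder(self.embed(cur_sent))  # h: (1, B, H) sentence code
            h0 = torch.tanh(self.connect(h))           # transformed state seeds decoder
            dec_out, _ = self.decoder(self.embed(next_sent), h0)
            return self.out(dec_out)                   # per-position vocabulary logits

Training would simply maximize the likelihood of each sentence's actual
successor in the corpus under these logits.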
Is Attention Interpretable?
Attention mechanisms have recently boosted performance on a range of NLP
tasks. Because attention layers explicitly weight input components'
representations, it is also often assumed that attention can be used to
identify information that models found important (e.g., specific contextualized
word tokens). We test whether that assumption holds by manipulating attention
weights in already-trained text classification models and analyzing the
resulting differences in their predictions. While we observe some ways in which
higher attention weights correlate with greater impact on model predictions, we
also find many ways in which this does not hold, i.e., where gradient-based
rankings of attention weights better predict their effects than their
magnitudes. We conclude that while attention noisily predicts input components'
overall importance to a model, it is by no means a fail-safe indicator.
Comment: To appear at ACL 2019
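A hedged sketch of this kind of test (our own illustration; `classify` is a
hypothetical stand-in for the head of a trained model, applied to an
attention-weighted sum of component representations): zero the highest
attention weight, renormalize, and measure how far the output distribution
moves.

    import numpy as np

    def js_divergence(p, q):
        """Jensen-Shannon divergence between two strictly positive distributions."""
        m = 0.5 * (p + q)
        kl = lambda a, b: np.sum(a * np.log(a / b))
        return 0.5 * kl(p, m) + 0.5 * kl(q, m)

    def ablate_top_attention(weights, reps, classify):
        """Remove the most-attended component and measure the prediction shift."""
        base = classify(weights @ reps)      # original attention-weighted context
        w = weights.copy()
        w[np.argmax(w)] = 0.0                # zero the highest attention weight
        w /= w.sum()                         # renormalize to a distribution
        return js_divergence(base, classify(w @ reps))  # big shift => it mattered

If attention weights were reliable importance indicators, ablating the
top-weighted component should consistently produce the largest shift; the
paper's finding is that this often fails to hold.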
Learning Multimodal Word Representation via Dynamic Fusion Methods
Multimodal models have been shown to outperform text-based models at learning
semantic word representations. However, almost all previous multimodal models
treat the representations from different modalities equally, even though
information from different modalities clearly contributes differently to the
meaning of words. This motivates us to build a multimodal
model that can dynamically fuse the semantic representations from different
modalities according to different types of words. To that end, we propose three
novel dynamic fusion methods to assign importance weights to each modality, in
which weights are learned under the weak supervision of word association pairs.
Extensive experiments demonstrate that the proposed methods outperform strong
unimodal baselines and state-of-the-art multimodal models.
Comment: To appear in AAAI-18
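One simple way to realize such dynamic fusion is a learned gate that mixes
the modalities per word. The sketch below (PyTorch; the two-modality setup
and all names are our assumptions, and it simplifies the three proposed
methods to a single gate):

    import torch
    import torch.nn as nn

    class DynamicGatedFusion(nn.Module):
        """A gate assigns each word its own modality importance weight."""
        def __init__(self, text_dim, visual_dim, out_dim):
            super().__init__()
            self.text_proj = nn.Linear(text_dim, out_dim)
            self.vis_proj = nn.Linear(visual_dim, out_dim)
            self.gate = nn.Linear(text_dim + visual_dim, 1)  # per-word scalar

        def forward(self, t, v):
            # t: (..., text_dim) textual reps; v: (..., visual_dim) visual reps
            g = torch.sigmoid(self.gate(torch.cat([t, v], dim=-1)))  # in (0, 1)
            return g * self.text_proj(t) + (1 - g) * self.vis_proj(v)

The gate itself would be trained under the weak supervision of word
association pairs, as the abstract describes.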
Left-Center-Right Separated Neural Network for Aspect-based Sentiment Analysis with Rotatory Attention
Deep learning techniques have achieved success in aspect-based sentiment
analysis in recent years. However, two important issues still remain to be
studied further: 1) how to efficiently represent the target,
especially when the target contains multiple words; 2) how to utilize the
interaction between target and left/right contexts to capture the most
important words in them. In this paper, we propose an approach, called
left-center-right separated neural network with rotatory attention (LCR-Rot),
to better address the two problems. Our approach has two characteristics: 1) it
has three separate LSTMs (left, center, and right), corresponding to
three parts of a review (left context, target phrase and right context); 2) it
has a rotatory attention mechanism which models the relation between target and
left/right contexts. The target2context attention is used to capture the most
indicative sentiment words in left/right contexts. Subsequently, the
context2target attention is used to capture the most important word in the
target. This leads to a two-side representation of the target: left-aware
target and right-aware target. We compare our approach with ten recently
proposed methods on three benchmark datasets. The results show that our
approach significantly outperforms state-of-the-art techniques.
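The rotatory attention can be sketched with plain dot-product attention (a
simplification on our part; the hidden states below are random placeholders
for the outputs of the three LSTMs):

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def attend(query, keys):
        """Dot-product attention pooling: weight key vectors by the query."""
        return softmax(keys @ query) @ keys

    # Placeholder hidden states from the three separate LSTMs.
    left, target, right = np.random.randn(5, 8), np.random.randn(3, 8), np.random.randn(4, 8)

    t_pool = target.mean(axis=0)
    left_rep = attend(t_pool, left)          # target2context: indicative left words
    right_rep = attend(t_pool, right)        # target2context: indicative right words
    left_aware = attend(left_rep, target)    # context2target: key target word
    right_aware = attend(right_rep, target)  # two-side target representation
    final = np.concatenate([left_rep, left_aware, right_aware, right_rep])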
Feature Weight Tuning for Recursive Neural Networks
This paper addresses how a recursive neural network model can automatically
leave out useless information and emphasize important evidence; in other
words, how to perform "weight tuning" for higher-level representation
acquisition. We
propose two models, Weighted Neural Network (WNN) and Binary-Expectation Neural
Network (BENN), which automatically control how much one specific unit
contributes to the higher-level representation. The proposed model can be
viewed as incorporating a more powerful compositional function for embedding
acquisition in recursive neural networks. Experimental results demonstrate
significant improvements over standard neural models.
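A minimal sketch of such gated composition (our own formulation; `W` and `G`
are illustrative parameters): per-unit gates decide how much each child
contributes before the usual tanh composition.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def gated_compose(c1, c2, W, G):
        """Compose two child vectors into a parent, down-weighting useless units."""
        x = np.concatenate([c1, c2])  # stacked children, shape (2d,)
        g = sigmoid(G @ x)            # per-unit importance weights in (0, 1)
        return np.tanh(W @ (g * x))   # gated higher-level representation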
Semantic Regularities in Document Representations
Recent work has shown that distributed word representations are good at
capturing linguistic regularities in language. This allows vector-oriented
reasoning based on simple linear algebra between words. Since many different
methods have been proposed for learning document representations, it is natural
to ask whether there is also linear structure in these learned representations
to allow similar reasoning at the document level. To answer this question, we
design a new document analogy task for testing the semantic regularities in
document representations, and conduct empirical evaluations over several
state-of-the-art document representation models. The results reveal that neural
embedding based document representations work better on this analogy task than
conventional methods, and we provide some preliminary explanations for these
observations.
Comment: 6 pages
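The analogy test itself reduces to vector arithmetic plus a cosine
nearest-neighbor search. A minimal sketch (our own; `doc_matrix` is a
hypothetical matrix of document embeddings, and in practice the three query
documents would be excluded from the search):

    import numpy as np

    def document_analogy(a, b, c, doc_matrix):
        """Find the document closest (cosine) to b - a + c: 'a is to b as c is to ?'."""
        query = b - a + c
        norms = np.linalg.norm(doc_matrix, axis=1) * np.linalg.norm(query)
        sims = doc_matrix @ query / np.maximum(norms, 1e-12)
        return int(np.argmax(sims))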
Learning to Refine Source Representations for Neural Machine Translation
Neural machine translation (NMT) models generally adopt an encoder-decoder
architecture for modeling the entire translation process. The encoder
summarizes the representation of the input sentence from scratch, which is
potentially a problem if the sentence is ambiguous. When translating a text,
humans often create an initial understanding of the source sentence and then
incrementally refine it as the translation proceeds on the target side. Starting from
this intuition, we propose a novel encoder-refiner-decoder framework, which
dynamically refines the source representations based on the generated
target-side information at each decoding step. Since the refining operations
are time-consuming, we propose a strategy, leveraging the power of
reinforcement learning models, to decide when to refine at specific decoding
steps. Experimental results on both Chinese-English and English-German
translation tasks show that the proposed approach significantly and
consistently improves translation performance over the standard encoder-decoder
framework. Furthermore, when the refining strategy is applied, the results
still show a reasonable improvement over the baseline without much decrease
in decoding speed.
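A sketch of one refining step (PyTorch; the gated residual update and all
names are our assumptions, and the reinforcement-learned decision of when to
refine is omitted): the decoder's current state is injected into every source
annotation.

    import torch
    import torch.nn as nn

    class Refiner(nn.Module):
        """Update source annotations with generated target-side information."""
        def __init__(self, dim):
            super().__init__()
            self.gate = nn.Linear(2 * dim, dim)
            self.update = nn.Linear(2 * dim, dim)

        def forward(self, src, dec_state):
            # src: (T, D) source annotations; dec_state: (D,) decoder state
            pair = torch.cat([src, dec_state.expand_as(src)], dim=-1)
            z = torch.sigmoid(self.gate(pair))    # how much to rewrite each position
            cand = torch.tanh(self.update(pair))  # candidate refined annotation
            return (1 - z) * src + z * cand       # refined source representations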
Semantic Word Clusters Using Signed Normalized Graph Cuts
Vector space representations of words capture many aspects of word
similarity, but such methods tend to produce vector spaces in which antonyms (as
well as synonyms) are close to each other. We present a new signed spectral
normalized graph cut algorithm, signed clustering, that overlays existing
thesauri upon distributionally derived vector representations of words, so that
antonym relationships between word pairs are represented by negative weights.
Our signed clustering algorithm produces clusters of words which simultaneously
capture distributional and synonym relations. We evaluate these clusters
against the SimLex-999 dataset (Hill et al., 2014) of human judgments of word
pair similarities, and also show the benefit of using our clusters to predict
the sentiment of a given text.
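A sketch of the signed spectral step in the standard formulation (our own
code; the signed degree uses absolute edge weights, and the paper's exact cut
objective may differ):

    import numpy as np
    from sklearn.cluster import KMeans

    def signed_spectral_clusters(W, k):
        """Cluster a signed graph: positive edges for similarity/synonymy,
        negative edges for antonymy (W is a symmetric signed adjacency matrix)."""
        d = np.abs(W).sum(axis=1)                        # signed degrees
        D_isqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
        L_sym = np.eye(len(W)) - D_isqrt @ W @ D_isqrt   # signed normalized Laplacian
        _, vecs = np.linalg.eigh(L_sym)                  # eigenvalues in ascending order
        return KMeans(n_clusters=k, n_init=10).fit_predict(vecs[:, :k])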
Deep Residual Output Layers for Neural Language Generation
Many tasks, including language generation, benefit from learning the
structure of the output space, particularly when the space of output labels is
large and the data is sparse. State-of-the-art neural language models
indirectly capture the output space structure in their classifier weights since
they lack parameter sharing across output labels. Learning shared output label
mappings helps, but existing methods have limited expressivity and are prone to
overfitting. In this paper, we investigate the usefulness of more powerful
shared mappings for output labels, and propose a deep residual output mapping
with dropout between layers to better capture the structure of the output space
and avoid overfitting. Evaluations on three language generation tasks show that
our output label mapping can match or improve state-of-the-art recurrent and
self-attention architectures, and suggest that the classifier does not
necessarily need to be high-rank to better model natural language if it is
better at capturing the structure of the output space.
Comment: To appear in ICML 2019
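A minimal sketch of such a deep residual output mapping (PyTorch; the depth,
sizes, and names are our assumptions): label embeddings pass through residual
blocks with dropout, and logits come from comparing the hidden state against
the transformed labels.

    import torch
    import torch.nn as nn

    class ResidualOutputMapping(nn.Module):
        """Share structure across output labels via a deep residual mapping."""
        def __init__(self, dim, vocab_size, depth=2, p_drop=0.3):
            super().__init__()
            self.label_emb = nn.Embedding(vocab_size, dim)
            self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(depth))
            self.drop = nn.Dropout(p_drop)

        def forward(self, h):
            e = self.label_emb.weight                    # (V, D) label embeddings
            for layer in self.layers:
                e = e + self.drop(torch.relu(layer(e)))  # residual block + dropout
            return h @ e.t()                             # (B, V) output logits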
NRPA: Neural Recommendation with Personalized Attention
Existing review-based recommendation methods usually use the same model to
learn the representations of all users/items from reviews posted by users
towards items. However, different users have different preferences and different
items have different characteristics. Thus, the same word or similar reviews
may have different informativeness for different users and items. In this paper,
we propose a neural recommendation approach with personalized attention to
learn personalized representations of users and items from reviews. We use a
review encoder to learn representations of reviews from words, and a user/item
encoder to learn representations of users or items from reviews. We propose a
personalized attention model, and apply it to both review and user/item
encoders to select different important words and reviews for different
users/items. Experiments on five datasets validate that our approach can
effectively improve the performance of neural recommendation.
Comment: 4 pages, 4 figures
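A sketch of the personalized attention idea (PyTorch; our own simplification
with hypothetical names): the attention query is derived from a user ID
embedding, so different users attend to different words or reviews.

    import torch
    import torch.nn as nn

    class PersonalizedAttention(nn.Module):
        """Pool a sequence of representations with a user-specific query."""
        def __init__(self, num_users, dim):
            super().__init__()
            self.user_emb = nn.Embedding(num_users, dim)
            self.proj = nn.Linear(dim, dim)

        def forward(self, user_ids, reps):
            # reps: (B, T, D) word or review representations
            q = self.proj(self.user_emb(user_ids)).unsqueeze(-1)  # (B, D, 1) query
            scores = torch.softmax(torch.bmm(torch.tanh(reps), q), dim=1)  # (B, T, 1)
            return (scores * reps).sum(dim=1)  # (B, D) personalized summary

The same module, with an item embedding table in place of the user table,
would personalize the item encoder.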