Trimming and Improving Skip-thought Vectors
The skip-thought model has been proven effective at learning sentence
representations and capturing sentence semantics. In this paper, we propose a
suite of techniques to trim and improve it. First, we validate the hypothesis
that, given a current sentence, inferring the previous sentence and inferring
the next sentence provide similar supervision power; our trimmed skip-thought
model therefore keeps only one decoder, which predicts the next sentence.
Second, we introduce a connection layer between the encoder and decoder to
help the model generalize better on semantic relatedness tasks. Third, we find
that a good word embedding initialization is also essential for learning
better sentence representations. We train our model unsupervised on a large
corpus of contiguous sentences, and then evaluate it on seven supervised
tasks, including semantic relatedness, paraphrase detection, and text
classification benchmarks. We empirically show that our proposed model is a
faster, lighter-weight, and equally powerful alternative to the original
skip-thought model.
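To make the trimmed design concrete, here is a minimal sketch in PyTorch
under our own assumptions (GRU cells and all layer names and sizes are
illustrative, not taken from the paper): one encoder, a connection layer, and
a single decoder that predicts only the next sentence.

    import torch
    import torch.nn as nn

    class TrimmedSkipThought(nn.Module):
        """Sketch: one encoder, a connection layer, one next-sentence decoder."""
        def __init__(self, vocab_size, emb_dim=300, hid_dim=600, pretrained_emb=None):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            if pretrained_emb is not None:  # good word-embedding initialization
                self.embed.weight.data.copy_(pretrained_emb)
            self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
            # Connection layer between encoder and decoder.
            self.connect = nn.Linear(hid_dim, hid_dim)
            # Single decoder: the previous-sentence decoder is trimmed away.
            self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
            self.out = nn.Linear(hid_dim, vocab_size)

        def forward(self, cur_sent, next_sent):
            _, h = self.encoder(self.embed(cur_sent))  # h: (1, B, H) sentence code
            h0 = torch.tanh(self.connect(h))           # transformed state seeds decoder
            dec_out, _ = self.decoder(self.embed(next_sent), h0)
            return self.out(dec_out)                   # per-position vocabulary logits

Training would simply maximize the likelihood of each sentence's actual
successor in the corpus under these logits.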
Is Attention Interpretable?
Attention mechanisms have recently boosted performance on a range of NLP
tasks. Because attention layers explicitly weight input components'
representations, it is also often assumed that attention can be used to
identify information that models found important (e.g., specific contextualized
word tokens). We test whether that assumption holds by manipulating attention
weights in already-trained text classification models and analyzing the
resulting differences in their predictions. While we observe some ways in which
higher attention weights correlate with greater impact on model predictions, we
also find many ways in which this does not hold, i.e., where gradient-based
rankings of attention weights better predict their effects than their
magnitudes. We conclude that while attention noisily predicts input components'
overall importance to a model, it is by no means a fail-safe indicator.
Comment: To appear at ACL 2019
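A hedged sketch of this kind of test (our own illustration; `classify` is a
hypothetical stand-in for the head of a trained model, applied to an
attention-weighted sum of component representations): zero the highest
attention weight, renormalize, and measure how far the output distribution
moves.

    import numpy as np

    def js_divergence(p, q):
        """Jensen-Shannon divergence between two strictly positive distributions."""
        m = 0.5 * (p + q)
        kl = lambda a, b: np.sum(a * np.log(a / b))
        return 0.5 * kl(p, m) + 0.5 * kl(q, m)

    def ablate_top_attention(weights, reps, classify):
        """Remove the most-attended component and measure the prediction shift."""
        base = classify(weights @ reps)      # original attention-weighted context
        w = weights.copy()
        w[np.argmax(w)] = 0.0                # zero the highest attention weight
        w /= w.sum()                         # renormalize to a distribution
        return js_divergence(base, classify(w @ reps))  # big shift => it mattered

If attention weights were reliable importance indicators, ablating the
top-weighted component should consistently produce the largest shift; the
paper's finding is that this often fails to hold.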
Learning Multimodal Word Representation via Dynamic Fusion Methods
Multimodal models have been shown to outperform text-based models at learning
semantic word representations. However, almost all previous multimodal models
treat the representations from different modalities equally, even though
information from different modalities clearly contributes differently to the
meaning of words. This motivates us to build a multimodal
model that can dynamically fuse the semantic representations from different
modalities according to different types of words. To that end, we propose three
novel dynamic fusion methods to assign importance weights to each modality, in
which weights are learned under the weak supervision of word association pairs.
Extensive experiments demonstrate that the proposed methods outperform strong
unimodal baselines and state-of-the-art multimodal models.
Comment: To appear in AAAI-18
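One simple way to realize such dynamic fusion is a learned gate that mixes
the modalities per word. The sketch below (PyTorch; the two-modality setup
and all names are our assumptions, and it simplifies the three proposed
methods to a single gate):

    import torch
    import torch.nn as nn

    class DynamicGatedFusion(nn.Module):
        """A gate assigns each word its own modality importance weight."""
        def __init__(self, text_dim, visual_dim, out_dim):
            super().__init__()
            self.text_proj = nn.Linear(text_dim, out_dim)
            self.vis_proj = nn.Linear(visual_dim, out_dim)
            self.gate = nn.Linear(text_dim + visual_dim, 1)  # per-word scalar

        def forward(self, t, v):
            # t: (..., text_dim) textual reps; v: (..., visual_dim) visual reps
            g = torch.sigmoid(self.gate(torch.cat([t, v], dim=-1)))  # in (0, 1)
            return g * self.text_proj(t) + (1 - g) * self.vis_proj(v)

The gate itself would be trained under the weak supervision of word
association pairs, as the abstract describes.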
Left-Center-Right Separated Neural Network for Aspect-based Sentiment Analysis with Rotatory Attention
Deep learning techniques have achieved success in aspect-based sentiment
analysis in recent years. However, two important issues still remain to be
studied further: 1) how to efficiently represent the target,
especially when the target contains multiple words; 2) how to utilize the
interaction between target and left/right contexts to capture the most
important words in them. In this paper, we propose an approach, called
left-center-right separated neural network with rotatory attention (LCR-Rot),
to better address the two problems. Our approach has two characteristics: 1) it
has three separate LSTMs (left, center, and right), corresponding to
three parts of a review (left context, target phrase and right context); 2) it
has a rotatory attention mechanism which models the relation between target and
left/right contexts. The target2context attention is used to capture the most
indicative sentiment words in left/right contexts. Subsequently, the
context2target attention is used to capture the most important word in the
target. This leads to a two-side representation of the target: left-aware
target and right-aware target. We compare our approach with ten recently
proposed methods on three benchmark datasets. The results show that our
approach significantly outperforms state-of-the-art techniques.
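The rotatory attention can be sketched with plain dot-product attention (a
simplification on our part; the hidden states below are random placeholders
for the outputs of the three LSTMs):

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    def attend(query, keys):
        """Dot-product attention pooling: weight key vectors by the query."""
        return softmax(keys @ query) @ keys

    # Placeholder hidden states from the three separate LSTMs.
    left, target, right = np.random.randn(5, 8), np.random.randn(3, 8), np.random.randn(4, 8)

    t_pool = target.mean(axis=0)
    left_rep = attend(t_pool, left)          # target2context: indicative left words
    right_rep = attend(t_pool, right)        # target2context: indicative right words
    left_aware = attend(left_rep, target)    # context2target: key target word
    right_aware = attend(right_rep, target)  # two-side target representation
    final = np.concatenate([left_rep, left_aware, right_aware, right_rep])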
Feature Weight Tuning for Recursive Neural Networks
This paper addresses how a recursive neural network model can automatically
leave out useless information and emphasize important evidence; in other
words, how to perform "weight tuning" for higher-level representation
acquisition. We
propose two models, Weighted Neural Network (WNN) and Binary-Expectation Neural
Network (BENN), which automatically control how much one specific unit
contributes to the higher-level representation. The proposed model can be
viewed as incorporating a more powerful compositional function for embedding
acquisition in recursive neural networks. Experimental results demonstrate
significant improvements over standard neural models.
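A minimal sketch of such gated composition (our own formulation; `W` and `G`
are illustrative parameters): per-unit gates decide how much each child
contributes before the usual tanh composition.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def gated_compose(c1, c2, W, G):
        """Compose two child vectors into a parent, down-weighting useless units."""
        x = np.concatenate([c1, c2])  # stacked children, shape (2d,)
        g = sigmoid(G @ x)            # per-unit importance weights in (0, 1)
        return np.tanh(W @ (g * x))   # gated higher-level representation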
Semantic Regularities in Document Representations
Recent work has shown that distributed word representations are good at
capturing linguistic regularities in language. This allows vector-oriented
reasoning based on simple linear algebra between words. Since many different
methods have been proposed for learning document representations, it is natural
to ask whether there is also linear structure in these learned representations
to allow similar reasoning at the document level. To answer this question, we
design a new document analogy task for testing the semantic regularities in
document representations, and conduct empirical evaluations over several
state-of-the-art document representation models. The results reveal that neural
embedding based document representations work better on this analogy task than
conventional methods, and we provide some preliminary explanations for these
observations.
Comment: 6 pages
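The analogy test itself reduces to vector arithmetic plus a cosine
nearest-neighbor search. A minimal sketch (our own; `doc_matrix` is a
hypothetical matrix of document embeddings, and in practice the three query
documents would be excluded from the search):

    import numpy as np

    def document_analogy(a, b, c, doc_matrix):
        """Find the document closest (cosine) to b - a + c: 'a is to b as c is to ?'."""
        query = b - a + c
        norms = np.linalg.norm(doc_matrix, axis=1) * np.linalg.norm(query)
        sims = doc_matrix @ query / np.maximum(norms, 1e-12)
        return int(np.argmax(sims))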
Learning to Refine Source Representations for Neural Machine Translation
Neural machine translation (NMT) models generally adopt an encoder-decoder
architecture for modeling the entire translation process. The encoder
summarizes the representation of the input sentence from scratch, which is
potentially a problem if the sentence is ambiguous. When translating a text,
humans often create an initial understanding of the source sentence and then
incrementally refine it as the translation proceeds on the target side. Starting from
this intuition, we propose a novel encoder-refiner-decoder framework, which
dynamically refines the source representations based on the generated
target-side information at each decoding step. Since the refining operations
are time-consuming, we propose a strategy, leveraging the power of
reinforcement learning models, to decide when to refine at specific decoding
steps. Experimental results on both Chinese-English and English-German
translation tasks show that the proposed approach significantly and
consistently improves translation performance over the standard encoder-decoder
framework. Furthermore, when the refining strategy is applied, the results
still show a reasonable improvement over the baseline without much decrease
in decoding speed.
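A sketch of one refining step (PyTorch; the gated residual update and all
names are our assumptions, and the reinforcement-learned decision of when to
refine is omitted): the decoder's current state is injected into every source
annotation.

    import torch
    import torch.nn as nn

    class Refiner(nn.Module):
        """Update source annotations with generated target-side information."""
        def __init__(self, dim):
            super().__init__()
            self.gate = nn.Linear(2 * dim, dim)
            self.update = nn.Linear(2 * dim, dim)

        def forward(self, src, dec_state):
            # src: (T, D) source annotations; dec_state: (D,) decoder state
            pair = torch.cat([src, dec_state.expand_as(src)], dim=-1)
            z = torch.sigmoid(self.gate(pair))    # how much to rewrite each position
            cand = torch.tanh(self.update(pair))  # candidate refined annotation
            return (1 - z) * src + z * cand       # refined source representations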
Semantic Word Clusters Using Signed Normalized Graph Cuts
Vector space representations of words capture many aspects of word
similarity, but such methods tend to produce vector spaces in which antonyms (as
well as synonyms) are close to each other. We present a new signed spectral
normalized graph cut algorithm, signed clustering, that overlays existing
thesauri upon distributionally derived vector representations of words, so that
antonym relationships between word pairs are represented by negative weights.
Our signed clustering algorithm produces clusters of words which simultaneously
capture distributional and synonym relations. We evaluate these clusters
against the SimLex-999 dataset (Hill et al., 2014) of human judgments of word
pair similarities, and also show the benefit of using our clusters to predict
the sentiment of a given text.
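A sketch of the signed spectral step in the standard formulation (our own
code; the signed degree uses absolute edge weights, and the paper's exact cut
objective may differ):

    import numpy as np
    from sklearn.cluster import KMeans

    def signed_spectral_clusters(W, k):
        """Cluster a signed graph: positive edges for similarity/synonymy,
        negative edges for antonymy (W is a symmetric signed adjacency matrix)."""
        d = np.abs(W).sum(axis=1)                        # signed degrees
        D_isqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
        L_sym = np.eye(len(W)) - D_isqrt @ W @ D_isqrt   # signed normalized Laplacian
        _, vecs = np.linalg.eigh(L_sym)                  # eigenvalues in ascending order
        return KMeans(n_clusters=k, n_init=10).fit_predict(vecs[:, :k])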
Deep Residual Output Layers for Neural Language Generation
Many tasks, including language generation, benefit from learning the
structure of the output space, particularly when the space of output labels is
large and the data is sparse. State-of-the-art neural language models
indirectly capture the output space structure in their classifier weights since
they lack parameter sharing across output labels. Learning shared output label
mappings helps, but existing methods have limited expressivity and are prone to
overfitting. In this paper, we investigate the usefulness of more powerful
shared mappings for output labels, and propose a deep residual output mapping
with dropout between layers to better capture the structure of the output space
and avoid overfitting. Evaluations on three language generation tasks show that
our output label mapping can match or improve state-of-the-art recurrent and
self-attention architectures, and suggest that the classifier does not
necessarily need to be high-rank to better model natural language if it is
better at capturing the structure of the output space.
Comment: To appear in ICML 2019
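A minimal sketch of such a deep residual output mapping (PyTorch; the depth,
sizes, and names are our assumptions): label embeddings pass through residual
blocks with dropout, and logits come from comparing the hidden state against
the transformed labels.

    import torch
    import torch.nn as nn

    class ResidualOutputMapping(nn.Module):
        """Share structure across output labels via a deep residual mapping."""
        def __init__(self, dim, vocab_size, depth=2, p_drop=0.3):
            super().__init__()
            self.label_emb = nn.Embedding(vocab_size, dim)
            self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(depth))
            self.drop = nn.Dropout(p_drop)

        def forward(self, h):
            e = self.label_emb.weight                    # (V, D) label embeddings
            for layer in self.layers:
                e = e + self.drop(torch.relu(layer(e)))  # residual block + dropout
            return h @ e.t()                             # (B, V) output logits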
NRPA: Neural Recommendation with Personalized Attention
Existing review-based recommendation methods usually use the same model to
learn the representations of all users/items from reviews posted by users
towards items. However, different users have different preferences and different
items have different characteristics. Thus, the same word or similar reviews
may have different informativeness for different users and items. In this paper,
we propose a neural recommendation approach with personalized attention to
learn personalized representations of users and items from reviews. We use a
review encoder to learn representations of reviews from words, and a user/item
encoder to learn representations of users or items from reviews. We propose a
personalized attention model, and apply it to both review and user/item
encoders to select different important words and reviews for different
users/items. Experiments on five datasets validate that our approach can
effectively improve the performance of neural recommendation.
Comment: 4 pages, 4 figures
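A sketch of the personalized attention idea (PyTorch; our own simplification
with hypothetical names): the attention query is derived from a user ID
embedding, so different users attend to different words or reviews.

    import torch
    import torch.nn as nn

    class PersonalizedAttention(nn.Module):
        """Pool a sequence of representations with a user-specific query."""
        def __init__(self, num_users, dim):
            super().__init__()
            self.user_emb = nn.Embedding(num_users, dim)
            self.proj = nn.Linear(dim, dim)

        def forward(self, user_ids, reps):
            # reps: (B, T, D) word or review representations
            q = self.proj(self.user_emb(user_ids)).unsqueeze(-1)  # (B, D, 1) query
            scores = torch.softmax(torch.bmm(torch.tanh(reps), q), dim=1)  # (B, T, 1)
            return (scores * reps).sum(dim=1)  # (B, D) personalized summary

The same module, with an item embedding table in place of the user table,
would personalize the item encoder.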