Shortcut-Stacked Sentence Encoders for Multi-Domain Inference
We present a simple sequential sentence encoder for multi-domain natural
language inference. Our encoder is based on stacked bidirectional LSTM-RNNs
with shortcut connections and fine-tuning of word embeddings. The overall
supervised model uses the above encoder to encode two input sentences into two
vectors, and then uses a classifier over the vector combination to label the
relationship between these two sentences as that of entailment, contradiction,
or neutral. Our Shortcut-Stacked sentence encoders achieve strong improvements
over existing encoders on matched and mismatched multi-domain natural language
inference (top non-ensemble single-model result in the EMNLP RepEval 2017
Shared Task (Nangia et al., 2017)). Moreover, they achieve the new
state-of-the-art encoding result on the original SNLI dataset (Bowman et al.,
2015). Comment: EMNLP 2017 RepEval Multi-NLI Shared Task (6 pages).
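To make the encoder design concrete, below is a minimal PyTorch sketch of a shortcut-stacked BiLSTM encoder: each layer receives the word embeddings concatenated with the outputs of all previous layers, and max pooling over time produces the sentence vector. The layer count, hidden size, vocabulary size, and the [u, v, |u - v|, u * v] combination are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ShortcutStackedEncoder(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=300, hidden=256, layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)  # fine-tuned with the model
        self.lstms = nn.ModuleList()
        in_dim = emb_dim
        for _ in range(layers):
            self.lstms.append(nn.LSTM(in_dim, hidden, bidirectional=True,
                                      batch_first=True))
            # Shortcut connection: the next layer sees the word embeddings
            # plus the outputs of every previous layer.
            in_dim += 2 * hidden

    def forward(self, tokens):                      # tokens: (batch, seq)
        inputs = self.embed(tokens)
        for lstm in self.lstms:
            out, _ = lstm(inputs)
            inputs = torch.cat([inputs, out], dim=-1)
        # Row-wise max pooling over time yields a fixed-length sentence vector.
        return out.max(dim=1).values                # (batch, 2 * hidden)

encoder = ShortcutStackedEncoder()
u = encoder(torch.randint(0, 10000, (4, 12)))       # premise
v = encoder(torch.randint(0, 10000, (4, 12)))       # hypothesis
features = torch.cat([u, v, (u - v).abs(), u * v], dim=-1)
logits = nn.Linear(features.size(-1), 3)(features)  # entailment/contradiction/neutral
```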
Cell-aware Stacked LSTMs for Modeling Sentences
We propose a method of stacking multiple long short-term memory (LSTM) layers
for modeling sentences. In contrast to the conventional stacked LSTMs where
only hidden states are fed as input to the next layer, the suggested
architecture accepts both hidden and memory cell states of the preceding layer
and fuses information from the left and the lower context using the soft gating
mechanism of LSTMs. Thus the architecture modulates the amount of information
to be delivered not only in horizontal recurrence but also in vertical
connections, from which useful features extracted from lower layers are
effectively conveyed to upper layers. We dub this architecture Cell-aware
Stacked LSTM (CAS-LSTM) and show from experiments that our models bring
significant performance gain over the standard LSTMs on benchmark datasets for
natural language inference, paraphrase detection, sentiment classification, and
machine translation. We also conduct extensive qualitative analysis to
understand the internal behavior of the suggested approach. Comment: ACML 2019.
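A simplified single-step sketch of the idea, in PyTorch: the cell state of the lower layer enters the current layer's cell update through its own forget-style gate, alongside the usual horizontal recurrence. The gate names and this particular parameterization are illustrative; the paper's exact formulation differs in detail.

```python
import torch
import torch.nn as nn

class CASLSTMCell(nn.Module):
    def __init__(self, input_dim, hidden):
        super().__init__()
        # 5 gate blocks: input, left-forget, lower-forget, output, candidate.
        self.linear = nn.Linear(input_dim + hidden, 5 * hidden)

    def forward(self, h_lower, c_lower, h_prev, c_prev):
        z = self.linear(torch.cat([h_lower, h_prev], dim=-1))
        i, f_left, f_lower, o, g = z.chunk(5, dim=-1)
        # The lower layer's memory cell enters through its own forget gate, so
        # information flows through gated vertical as well as horizontal paths.
        c = (torch.sigmoid(f_left) * c_prev
             + torch.sigmoid(f_lower) * c_lower
             + torch.sigmoid(i) * torch.tanh(g))
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

cell = CASLSTMCell(input_dim=128, hidden=128)
h_low, c_low = torch.randn(4, 128), torch.randn(4, 128)
h, c = cell(h_low, c_low, torch.zeros(4, 128), torch.zeros(4, 128))
```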
Enhancing Sentence Embedding with Generalized Pooling
Pooling is an essential component of a wide variety of sentence
representation and embedding models. This paper explores generalized pooling
methods to enhance sentence embedding. We propose vector-based multi-head
attention that includes the widely used max pooling, mean pooling, and scalar
self-attention as special cases. The model benefits from properly designed
penalization terms to reduce redundancy in multi-head attention. We evaluate
the proposed model on three different tasks: natural language inference (NLI),
author profiling, and sentiment classification. The experiments show that the
proposed model achieves significant improvement over strong
sentence-encoding-based methods, resulting in state-of-the-art performances on
four datasets. The proposed approach can be easily implemented for more
problems than we discuss in this paper. Comment: Accepted by COLING 2018.
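A minimal sketch of vector-based multi-head attention pooling: each head produces a weight vector (one weight per feature) for every position rather than a scalar, and a disagreement penalty discourages heads from attending to the same content. The penalty form shown is one plausible variant, not necessarily the paper's exact penalization term.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeneralizedPooling(nn.Module):
    def __init__(self, dim=512, heads=4, proj=64):
        super().__init__()
        assert dim % heads == 0
        self.heads, self.head_dim = heads, dim // heads
        self.w1 = nn.Linear(self.head_dim, proj)
        self.w2 = nn.Linear(proj, self.head_dim)

    def forward(self, h):                             # h: (batch, seq, dim)
        b, n, _ = h.shape
        h = h.view(b, n, self.heads, self.head_dim)
        # Vector-based attention: a weight for every feature of every position.
        logits = self.w2(torch.relu(self.w1(h)))      # (b, n, heads, head_dim)
        attn = F.softmax(logits, dim=1)               # normalize over positions
        pooled = (attn * h).sum(dim=1)                # (b, heads, head_dim)
        # Penalty pushing different heads toward different positions.
        flat = attn.mean(-1).transpose(1, 2)          # (b, heads, n)
        gram = flat @ flat.transpose(1, 2)            # (b, heads, heads)
        eye = torch.eye(self.heads, device=h.device)
        penalty = ((gram - eye) ** 2).sum(dim=(1, 2)).mean()
        return pooled.reshape(b, -1), penalty

pool = GeneralizedPooling()
vec, penalty = pool(torch.randn(4, 20, 512))          # vec: (4, 512)
```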
The RepEval 2017 Shared Task: Multi-Genre Natural Language Inference with Sentence Representations
This paper presents the results of the RepEval 2017 Shared Task, which
evaluated neural network sentence representation learning models on the
Multi-Genre Natural Language Inference corpus (MultiNLI) recently introduced by
Williams et al. (2017). All five participating teams beat the
bidirectional LSTM (BiLSTM) and continuous bag-of-words baselines reported in
Williams et al. The best single model used stacked BiLSTMs with residual
connections to extract sentence features and reached 74.5% accuracy on the
genre-matched test set. Surprisingly, the results of the competition were
fairly consistent across the genre-matched and genre-mismatched test sets, and
across subsets of the test data representing a variety of linguistic phenomena,
suggesting that all of the submitted systems learned reasonably
domain-independent representations for sentence meaning. Comment: 10 pages, 1 figure, 6 tables, in Proceedings of the Second Workshop
on Evaluating Vector Space Representations for NLP (RepEval 2017).
Combining Similarity Features and Deep Representation Learning for Stance Detection in the Context of Checking Fake News
Fake news is nowadays an issue of pressing concern, given its recent rise
as a potential threat to high-quality journalism and well-informed public
discourse. The Fake News Challenge (FNC-1) was organized in 2017 to encourage
the development of machine learning-based classification systems for stance
detection (i.e., for identifying whether a particular news article agrees,
disagrees, discusses, or is unrelated to a particular news headline), thus
helping in the detection and analysis of possible instances of fake news. This
article presents a new approach to tackle this stance detection problem, based
on the combination of string similarity features with a deep neural
architecture that leverages ideas previously advanced in the context of
learning efficient text representations, document classification, and natural
language inference. Specifically, we use bi-directional Recurrent Neural
Networks, together with max-pooling over the temporal/sequential dimension and
neural attention, for representing (i) the headline, (ii) the first two
sentences of the news article, and (iii) the entire news article. These
representations are then combined/compared, complemented with similarity
features inspired by other FNC-1 approaches, and passed to a final layer that
predicts the stance of the article towards the headline. We also explore the
use of external sources of information, specifically large datasets of sentence
pairs originally proposed for training and evaluating natural language
inference methods, in order to pre-train specific components of the neural
network architecture (e.g., the RNNs used for encoding sentences). The obtained
results attest to the effectiveness of the proposed ideas and show that our
model, particularly when considering pre-training and the combination of neural
representations together with similarity features, slightly outperforms the
previous state-of-the-art. Comment: Accepted for publication in the special issue of the ACM Journal of
Data and Information Quality (ACM JDIQ) on Combating Digital Misinformation
and Disinformation.
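A minimal sketch of the overall combination step described above: recurrent encoders with max pooling produce vectors for the headline and the article text, which are concatenated with handcrafted similarity features before a four-way softmax. The encoder type (a BiGRU here), sizes, and feature dimension are illustrative placeholders.

```python
import torch
import torch.nn as nn

class StanceClassifier(nn.Module):
    def __init__(self, emb_dim=300, hidden=128, n_sim_features=10):
        super().__init__()
        self.encoder = nn.GRU(emb_dim, hidden, bidirectional=True,
                              batch_first=True)
        self.out = nn.Sequential(
            nn.Linear(4 * hidden + n_sim_features, 128), nn.ReLU(),
            nn.Linear(128, 4))  # agrees / disagrees / discusses / unrelated

    def encode(self, x):                        # x: (batch, seq, emb_dim)
        states, _ = self.encoder(x)
        return states.max(dim=1).values         # max pooling over time

    def forward(self, headline, body, sim_features):
        rep = torch.cat([self.encode(headline), self.encode(body),
                         sim_features], dim=-1)
        return self.out(rep)

model = StanceClassifier()
logits = model(torch.randn(4, 12, 300), torch.randn(4, 80, 300),
               torch.randn(4, 10))              # string-similarity features
```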
Recurrent Neural Network-Based Sentence Encoder with Gated Attention for Natural Language Inference
The RepEval 2017 Shared Task aims to evaluate natural language understanding
models for sentence representation, in which a sentence is represented as a
fixed-length vector with neural networks and the quality of the representation
is tested with a natural language inference task. This paper describes our
system (alpha) that is ranked among the top in the Shared Task, on both the
in-domain test set (obtaining a 74.9% accuracy) and on the cross-domain test
set (also attaining a 74.9% accuracy), demonstrating that the model generalizes
well to the cross-domain data. Our model is equipped with intra-sentence
gated-attention composition which helps achieve a better performance. In
addition to submitting our model to the Shared Task, we have also tested it on
the Stanford Natural Language Inference (SNLI) dataset. We obtain an accuracy
of 85.5%, which is the best reported result on SNLI when cross-sentence
attention is not allowed, the same condition enforced in RepEval 2017. Comment: RepEval 2017 workshop paper at EMNLP 2017, Copenhagen.
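As a rough illustration of gated-attention composition over BiLSTM states, the sketch below weights each position with a learned sigmoid gate before pooling; this is a simplification, since the paper derives its weights from the LSTM's own gates.

```python
import torch
import torch.nn as nn

class GatedAttentionEncoder(nn.Module):
    def __init__(self, emb_dim=300, hidden=300):
        super().__init__()
        self.bilstm = nn.LSTM(emb_dim, hidden, bidirectional=True,
                              batch_first=True)
        self.gate = nn.Linear(2 * hidden, 1)

    def forward(self, x):                            # x: (batch, seq, emb_dim)
        states, _ = self.bilstm(x)                   # (batch, seq, 2 * hidden)
        weights = torch.sigmoid(self.gate(states))   # (batch, seq, 1)
        # Gated composition: positions the gate deems informative dominate
        # the fixed-length sentence vector.
        return (weights * states).sum(dim=1) / weights.sum(dim=1)

enc = GatedAttentionEncoder()
v = enc(torch.randn(4, 15, 300))                     # (4, 600)
```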
Distance-based Self-Attention Network for Natural Language Inference
The attention mechanism has been used as an ancillary means to help RNNs or CNNs.
However, the Transformer (Vaswani et al., 2017) recently recorded the
state-of-the-art performance in machine translation with a dramatic reduction
in training time by solely using attention. Motivated by the Transformer,
Directional Self Attention Network (Shen et al., 2017), a fully attention-based
sentence encoder, was proposed. It showed good performance on various datasets
by using forward and backward directional information in a sentence. However,
their study did not consider the distance between words, an important feature
for learning local dependencies that help in understanding the context of the
input text. We propose the Distance-based Self-Attention Network, which
considers word distance by using a simple distance mask in order to model
local dependencies without losing the inherent ability of attention to model
global dependencies. Our model shows good performance on NLI data and records
a new state-of-the-art result on SNLI data. Additionally, we show that our
model is particularly strong on long sentences or documents. Comment: 12 pages, 13 figures.
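A minimal sketch of the distance-mask idea: attention logits are penalized in proportion to the token distance |i - j|, biasing each position toward its neighborhood while keeping full global attention. The learnable scaling factor is an assumption; the paper's exact mask differs.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistanceSelfAttention(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.alpha = nn.Parameter(torch.tensor(1.0))  # distance penalty scale
        self.dim = dim

    def forward(self, x):                        # x: (batch, seq, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        logits = q @ k.transpose(-2, -1) / self.dim ** 0.5
        pos = torch.arange(x.size(1), device=x.device)
        dist = (pos[None, :] - pos[:, None]).abs().float()
        # Distance mask: distant pairs are down-weighted but never cut off,
        # so local dependency is modeled without losing global attention.
        logits = logits - self.alpha * dist
        return F.softmax(logits, dim=-1) @ v

attn = DistanceSelfAttention()
out = attn(torch.randn(4, 20, 256))              # (4, 20, 256)
```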
Recurrently Controlled Recurrent Networks
Recurrent neural networks (RNNs) such as long short-term memory and gated
recurrent units are pivotal building blocks across a broad spectrum of sequence
modeling problems. This paper proposes a recurrently controlled recurrent
network (RCRN) for expressive and powerful sequence encoding. More concretely,
the key idea behind our approach is to learn the recurrent gating functions
using recurrent networks. Our architecture is split into two components: a
controller cell and a listener cell, whereby the recurrent controller actively
influences the compositionality of the listener cell. We conduct extensive
experiments on a myriad of tasks in the NLP domain such as sentiment analysis
(SST, IMDb, Amazon reviews, etc.), question classification (TREC), entailment
classification (SNLI, SciTail), answer selection (WikiQA, TrecQA) and reading
comprehension (NarrativeQA). Across all 26 datasets, our results demonstrate
that RCRN consistently outperforms not only BiLSTMs but also stacked BiLSTMs,
suggesting that our controller architecture might be a suitable replacement for
the widely adopted stacked architecture. Comment: NIPS 2018.
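A heavily simplified sketch of the controller/listener split: one recurrent network reads the sequence and emits gates that modulate the states of a second recurrent network. Gating the listener's outputs elementwise, as below, is a simplification of the paper's recurrently controlled gating functions, which operate inside the recurrence.

```python
import torch
import torch.nn as nn

class SimpleRCRN(nn.Module):
    def __init__(self, emb_dim=300, hidden=200):
        super().__init__()
        self.controller = nn.LSTM(emb_dim, hidden, bidirectional=True,
                                  batch_first=True)
        self.listener = nn.LSTM(emb_dim, hidden, bidirectional=True,
                                batch_first=True)

    def forward(self, x):                        # x: (batch, seq, emb_dim)
        gates, _ = self.controller(x)            # controller decides gating
        states, _ = self.listener(x)             # listener composes content
        # The controller actively influences the listener's compositionality.
        return torch.sigmoid(gates) * torch.tanh(states)

rcrn = SimpleRCRN()
out = rcrn(torch.randn(4, 18, 300))              # (4, 18, 400)
```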
Neural Network Models for Paraphrase Identification, Semantic Textual Similarity, Natural Language Inference, and Question Answering
In this paper, we analyze several neural network designs (and their
variations) for sentence pair modeling and compare their performance
extensively across eight datasets, including paraphrase identification,
semantic textual similarity, natural language inference, and question answering
tasks. Although most of these models have claimed state-of-the-art performance,
the original papers often reported on only one or two selected datasets. We
provide a systematic study and show that (i) encoding contextual information by
LSTM and inter-sentence interactions are critical, (ii) Tree-LSTM does not help
as much as previously claimed but surprisingly improves performance on Twitter
datasets, (iii) the Enhanced Sequential Inference Model is the best so far for
larger datasets, while the Pairwise Word Interaction Model achieves the best
performance when less data is available. We release our implementations as an
open-source toolkit. Comment: 13 pages; accepted to COLING 2018.
Natural Language Inference over Interaction Space
The Natural Language Inference (NLI) task requires an agent to determine the
logical relationship between a natural language premise and a natural language
hypothesis. We introduce Interactive Inference Network (IIN), a novel class of
neural network architectures that is able to achieve high-level understanding
of the sentence pair by hierarchically extracting semantic features from
interaction space. We show that an interaction tensor (attention weight)
contains semantic information to solve natural language inference, and a denser
interaction tensor contains richer semantic information. One instance of such
architecture, Densely Interactive Inference Network (DIIN), demonstrates the
state-of-the-art performance on large-scale NLI corpora and a large-scale
NLI-like corpus. Notably, DIIN achieves a greater than 20% error reduction on
the challenging Multi-Genre NLI (MultiNLI) dataset with respect to the
strongest published system. Comment: 15 pages, 2 figures, published at the
Sixth International Conference on Learning Representations, ICLR 2018.
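A minimal sketch of an interaction-space model: an interaction tensor is built from elementwise products of premise and hypothesis word vectors, and convolutional layers extract features from it. DIIN itself uses a DenseNet feature extractor and richer input encodings; the small CNN below is only a stand-in.

```python
import torch
import torch.nn as nn

class InteractionModel(nn.Module):
    def __init__(self, dim=128, classes=3):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(dim, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveMaxPool2d(1))
        self.out = nn.Linear(64, classes)

    def forward(self, premise, hypothesis):   # (batch, n, dim), (batch, m, dim)
        # Interaction tensor: one dim-dimensional vector per word pair (i, j).
        inter = premise.unsqueeze(2) * hypothesis.unsqueeze(1)  # (b, n, m, dim)
        feats = self.cnn(inter.permute(0, 3, 1, 2)).flatten(1)  # (b, 64)
        return self.out(feats)

model = InteractionModel()
logits = model(torch.randn(4, 10, 128), torch.randn(4, 14, 128))  # (4, 3)
```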