Neural Network Models for Implicit Discourse Relation Classification in English and Chinese without Surface Features
Inferring implicit discourse relations in natural language text is the most
difficult subtask in discourse parsing. Surface features achieve good
performance, but they are not readily applicable to other languages without
semantic lexicons. Previous neural models require parses, surface features, or
a small label set to work well. Here, we propose neural network models that are
based on feedforward and long short-term memory (LSTM) architectures without any
surface features. To our surprise, our best-configured feedforward architecture
outperforms the LSTM-based model in most cases despite thorough tuning. Under
various fine-grained label sets and a cross-linguistic setting, our feedforward
models perform consistently better or at least just as well as systems that
require hand-crafted surface features. Our models present the first neural
Chinese discourse parser in the style of the Chinese Discourse Treebank, showing
that our results hold cross-linguistically.
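As a rough illustration of the surface-feature-free setup described above (not the authors' exact model), the sketch below averages the word embeddings of the two discourse arguments and feeds the concatenation to a small feedforward classifier; all dimensions and the label count are illustrative assumptions.

```python
# Minimal sketch: feedforward implicit-discourse-relation classifier over
# averaged word embeddings of the two arguments, with no surface features.
import torch
import torch.nn as nn

class FeedforwardDiscourseClassifier(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=100, hidden=300, num_labels=11):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.mlp = nn.Sequential(
            nn.Linear(2 * emb_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, num_labels),
        )

    def forward(self, arg1_ids, arg2_ids):
        # Each argument is the mean of its word embeddings; no parses,
        # connectives, or semantic-lexicon features are used.
        a1 = self.emb(arg1_ids).mean(dim=1)
        a2 = self.emb(arg2_ids).mean(dim=1)
        return self.mlp(torch.cat([a1, a2], dim=-1))

# Toy usage: a batch of 2 argument pairs, 7 tokens each.
logits = FeedforwardDiscourseClassifier()(
    torch.randint(0, 10000, (2, 7)), torch.randint(0, 10000, (2, 7)))
print(logits.shape)  # torch.Size([2, 11])
```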
Learning Universal Sentence Representations with Mean-Max Attention Autoencoder
In order to learn universal sentence representations, previous methods focus
on complex recurrent neural networks or supervised learning. In this paper, we
propose a mean-max attention autoencoder (mean-max AAE) within the
encoder-decoder framework. Our autoencoder relies entirely on the multi-head
self-attention mechanism to reconstruct the input sequence. In the encoding, we
propose a mean-max strategy that applies both mean and max pooling operations
over the hidden vectors to capture diverse information of the input. To enable
the information to steer the reconstruction process dynamically, the decoder
performs attention over the mean-max representation. By training our model on a
large collection of unlabelled data, we obtain high-quality representations of
sentences. Experimental results on a broad range of 10 transfer tasks
demonstrate that our model outperforms the state-of-the-art unsupervised single
methods, including the classical skip-thoughts and the advanced
skip-thoughts+LN model. Furthermore, compared with traditional recurrent
neural networks, our mean-max AAE greatly reduces the training time. Comment: EMNLP 201
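The mean-max strategy itself is simple to sketch: apply mean pooling and max pooling over the encoder's hidden vectors and concatenate the two results. The snippet below is an illustrative stand-in; the encoder producing the hidden states (multi-head self-attention in the paper) is elided.

```python
# Sketch of the mean-max representation: mean pooling and max pooling over
# hidden vectors, concatenated into one sentence representation.
import torch

def mean_max_pool(hidden):           # hidden: (batch, seq_len, dim)
    mean_vec = hidden.mean(dim=1)    # (batch, dim)
    max_vec = hidden.max(dim=1).values
    return torch.cat([mean_vec, max_vec], dim=-1)  # (batch, 2*dim)

h = torch.randn(4, 12, 256)          # e.g. outputs of a self-attention encoder
print(mean_max_pool(h).shape)        # torch.Size([4, 512])
```

In the paper, the decoder then attends over this mean-max representation while reconstructing the input, which is what lets the pooled summary steer generation dynamically.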
Why and when should you pool? Analyzing Pooling in Recurrent Architectures
Pooling-based recurrent neural architectures consistently outperform their
counterparts without pooling. However, the reasons for their enhanced
performance are largely unexamined. In this work, we examine three commonly
used pooling techniques (mean-pooling, max-pooling, and attention), and propose
max-attention, a novel variant that effectively captures interactions among
predictive tokens in a sentence. We find that pooling-based architectures
substantially differ from their non-pooling equivalents in their learning
ability and positional biases, which elucidates their performance benefits. By
analyzing the gradient propagation, we discover that pooling facilitates better
gradient flow compared to BiLSTMs. Further, we expose how BiLSTMs are
positionally biased towards tokens in the beginning and the end of a sequence.
Pooling alleviates such biases. Consequently, we identify settings where
pooling offers large benefits: (i) in low resource scenarios, and (ii) when
important words lie towards the middle of the sentence. Among the pooling
techniques studied, max-attention is the most effective, resulting in
significant performance gains on several text classification tasks. Comment: Preprint
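For concreteness, the sketch below shows the three standard pooling operators over BiLSTM states compared in the paper (mean, max, and a simple learned attention); the proposed max-attention variant, which restricts attention to the most predictive tokens, is not reproduced here, and all sizes are illustrative.

```python
# Hedged sketch: mean-, max-, and attention-pooling over BiLSTM hidden states.
import torch
import torch.nn as nn

class PooledBiLSTM(nn.Module):
    def __init__(self, emb_dim=100, hidden=128, pooling="attention"):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.attn = nn.Linear(2 * hidden, 1)
        self.pooling = pooling

    def forward(self, x):                      # x: (batch, seq_len, emb_dim)
        h, _ = self.lstm(x)                    # (batch, seq_len, 2*hidden)
        if self.pooling == "mean":
            return h.mean(dim=1)
        if self.pooling == "max":
            return h.max(dim=1).values
        weights = torch.softmax(self.attn(h), dim=1)   # (batch, seq_len, 1)
        return (weights * h).sum(dim=1)        # attention-weighted sum

x = torch.randn(2, 20, 100)
for mode in ("mean", "max", "attention"):
    print(mode, PooledBiLSTM(pooling=mode)(x).shape)   # (2, 256) each
```

Because the pooled summary receives gradients directly from the loss, every time step gets a shorter gradient path than in a plain BiLSTM that reads only the final states, which is the gradient-flow effect the paper analyzes.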
Deep Learning applied to NLP
Convolutional Neural Networks (CNNs) are typically associated with Computer
Vision. CNNs are responsible for major breakthroughs in Image Classification
and are the core of most Computer Vision systems today. More recently, CNNs have
been applied to problems in Natural Language Processing and have produced some
interesting results. In this paper, we will try to explain the basics of CNNs,
their different variations, and how they have been applied to NLP.
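A typical way CNNs are applied to text, in the spirit of the survey above, is to run 1-D convolutions over windows of word embeddings and take a max over time; the sketch below is a generic illustration with assumed vocabulary size, filter counts, and kernel widths.

```python
# Illustrative text CNN: 1-D convolutions over word-embedding windows,
# max-over-time pooling, then a linear classifier.
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=100, n_filters=64,
                 kernel_sizes=(3, 4, 5), num_classes=2):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, n_filters, k) for k in kernel_sizes])
        self.out = nn.Linear(n_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):                  # (batch, seq_len)
        x = self.emb(token_ids).transpose(1, 2)    # (batch, emb_dim, seq_len)
        feats = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return self.out(torch.cat(feats, dim=-1))

print(TextCNN()(torch.randint(0, 10000, (2, 30))).shape)  # torch.Size([2, 2])
```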
Semi-supervised Question Retrieval with Gated Convolutions
Question answering forums are rapidly growing in size with no effective
automated ability to refer to and reuse answers already available for previously
posted questions. In this paper, we develop a methodology for finding
semantically related questions. The task is difficult since 1) key pieces of
information are often buried in extraneous details in the question body and 2)
available annotations on similar questions are scarce and fragmented. We design
a recurrent and convolutional model (gated convolution) to effectively map
questions to their semantic representations. The models are pre-trained within
an encoder-decoder framework (from body to title) on the basis of the entire
raw corpus, and fine-tuned discriminatively from limited annotations. Our
evaluation demonstrates that our model yields substantial gains over a standard
IR baseline and various neural network architectures (including CNNs, LSTMs and
GRUs). Comment: NAACL 201
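Once questions are mapped to semantic vectors by the pre-trained, fine-tuned encoder, retrieval itself reduces to nearest-neighbour ranking. The sketch below illustrates that step with a stand-in mean-of-embeddings encoder (not the gated convolution from the paper) and cosine similarity as the relevance score.

```python
# Sketch of the retrieval step: encode questions to vectors, rank candidates
# by cosine similarity to the new question.
import torch
import torch.nn.functional as F

def encode(token_ids, emb):                 # stand-in encoder, assumption only
    return emb(token_ids).mean(dim=1)       # (batch, dim)

emb = torch.nn.Embedding(10000, 200)
query = encode(torch.randint(0, 10000, (1, 15)), emb)        # new question
candidates = encode(torch.randint(0, 10000, (50, 15)), emb)  # previously posted questions
scores = F.cosine_similarity(query, candidates)              # (50,)
print(scores.topk(5).indices)               # indices of the 5 most related questions
```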
Multi-Cast Attention Networks for Retrieval-based Question Answering and Response Prediction
Attention is typically used to select informative sub-phrases that are used
for prediction. This paper investigates the novel use of attention as a form of
feature augmentation, i.e., casted attention. We propose Multi-Cast Attention
Networks (MCAN), a new attention mechanism and general model architecture for a
potpourri of ranking tasks in the conversational modeling and question
answering domains. Our approach performs a series of soft attention operations,
each time casting a scalar feature upon the inner word embeddings. The key idea
is to provide a real-valued hint (feature) to a subsequent encoder layer and is
targeted at improving the representation learning process. There are several
advantages to this design, e.g., it allows an arbitrary number of attention
mechanisms to be casted, allowing for multiple attention types (e.g.,
co-attention, intra-attention) and attention variants (e.g., alignment-pooling,
max-pooling, mean-pooling) to be executed simultaneously. This not only
eliminates the costly need to tune the nature of the co-attention layer, but
also provides a greater extent of explainability to practitioners. Via extensive
experiments on four well-known benchmark datasets, we show that MCAN achieves
state-of-the-art performance. On the Ubuntu Dialogue Corpus, MCAN outperforms
existing state-of-the-art models. MCAN also achieves the best-performing
score to date on the well-studied TrecQA dataset. Comment: Accepted to KDD 2018 (paper titled only "Multi-Cast Attention
Networks" in the KDD version).
RAP-Net: Recurrent Attention Pooling Networks for Dialogue Response Selection
Response selection has been an emerging research topic due to the growing
interest in dialogue modeling, where the goal of the task is to select an
appropriate response for continuing dialogues. To further push the end-to-end
dialogue model toward real-world scenarios, the seventh Dialog System
Technology Challenge (DSTC7) proposed a challenging track based on real chatlog
datasets. The competition focuses on dialogue modeling with several advanced
characteristics: (1) natural language diversity, (2) capability of precisely
selecting a proper response from a large set of candidates or the scenario
without any correct answer, and (3) knowledge grounding. This paper introduces
recurrent attention pooling networks (RAP-Net), a novel framework for response
selection, which can well estimate the relevance between the dialogue contexts
and the candidates. The proposed RAP-Net is shown to be effective and can be
generalized across different datasets and settings in the DSTC7 experiments.
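To make the relevance-estimation step concrete, the toy sketch below scores each candidate response by attending from a pooled context vector over the candidate's word vectors and taking the dot product with the attention-pooled result; this is a generic attention-pooling scorer under assumed shapes, not the exact RAP-Net architecture.

```python
# Minimal attention-pooling scorer for response selection (illustrative only).
import torch

def score_candidates(context_vecs, candidate_vecs):
    # context_vecs: (ctx_len, dim); candidate_vecs: (n_cand, cand_len, dim)
    context = context_vecs.mean(dim=0)                            # (dim,)
    weights = torch.softmax(candidate_vecs @ context, dim=1)      # (n_cand, cand_len)
    pooled = (weights.unsqueeze(-1) * candidate_vecs).sum(dim=1)  # (n_cand, dim)
    return pooled @ context                                       # relevance per candidate

ctx = torch.randn(25, 128)
cands = torch.randn(100, 18, 128)
print(score_candidates(ctx, cands).argmax())   # index of the selected response
```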
Question-Aware Sentence Gating Networks for Question and Answering
Machine comprehension question answering, which finds an answer to the
question given a passage, involves high-level reasoning processes of
understanding and tracking the relevant contents across various semantic units
such as words, phrases, and sentences in a document. This paper proposes
novel question-aware sentence gating networks that directly incorporate the
sentence-level information into word-level encoding processes. To this end, our
model first learns question-aware sentence representations and then dynamically
combines them with word-level representations, resulting in semantically
meaningful word representations for QA tasks. Experimental results demonstrate
that our approach consistently improves the accuracy over existing baseline
approaches on various QA datasets and bears wide applicability to other
neural network-based QA models.
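The gating idea admits a compact sketch: a learned gate decides, per dimension, how much of the question-aware sentence vector to mix into each word vector. The parameterization and dimensions below are assumptions for illustration, not the paper's exact formulation.

```python
# Sketch of question-aware sentence gating: gate = sigmoid(W[word; sentence]),
# output = gate * word + (1 - gate) * sentence.
import torch
import torch.nn as nn

class SentenceGate(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, word_vecs, sent_vec):
        # word_vecs: (seq_len, dim); sent_vec: (dim,) question-aware sentence vector.
        sent = sent_vec.expand_as(word_vecs)
        g = torch.sigmoid(self.gate(torch.cat([word_vecs, sent], dim=-1)))
        return g * word_vecs + (1 - g) * sent   # gated word representations

words = torch.randn(30, 128)
sentence = torch.randn(128)
print(SentenceGate()(words, sentence).shape)   # torch.Size([30, 128])
```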
Reasoning with Sarcasm by Reading In-between
Sarcasm is a sophisticated speech act which commonly manifests on social
communities such as Twitter and Reddit. The prevalence of sarcasm on the social
web is highly disruptive to opinion mining systems due not only to its tendency
to flip polarity but also to its use of figurative language. Sarcasm commonly
manifests with a contrastive theme either between positive-negative sentiments
or between literal-figurative scenarios. In this paper, we revisit the notion
of modeling contrast in order to reason with sarcasm. More specifically, we
propose an attention-based neural model that looks in-between instead of
across, enabling it to explicitly model contrast and incongruity. We conduct
extensive experiments on six benchmark datasets from Twitter, Reddit and the
Internet Argument Corpus. Our proposed model not only achieves state-of-the-art
performance on all datasets but also enjoys improved interpretability. Comment: Accepted to ACL201
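One loose reading of "looking in-between" is intra-attention over word pairs: every pair of words in the sentence is scored, and each word's weight comes from its strongest pairing, so contrastive pairs (e.g. a positive and a negative phrase) dominate the sentence vector. The sketch below illustrates that reading; it is a simplification, not the paper's exact model.

```python
# Rough intra-attention sketch: pairwise word affinities -> per-word weights
# from the strongest pairing -> weighted sentence vector.
import torch

def intra_attention(word_vecs):               # (seq_len, dim)
    pair_scores = word_vecs @ word_vecs.t()   # (seq_len, seq_len) word-pair affinities
    pair_scores.fill_diagonal_(float("-inf")) # ignore a word paired with itself
    word_scores = pair_scores.max(dim=1).values           # strongest pairing per word
    weights = torch.softmax(word_scores, dim=0)
    return (weights.unsqueeze(1) * word_vecs).sum(dim=0)  # (dim,) sentence vector

print(intra_attention(torch.randn(16, 100)).shape)   # torch.Size([100])
```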
Extraction of Salient Sentences from Labelled Documents
We present a hierarchical convolutional document model with an architecture
designed to support introspection of the document structure. Using this model,
we show how to use visualisation techniques from the computer vision literature
to identify and extract topic-relevant sentences.
We also introduce a new scalable evaluation technique for automatic sentence
extraction systems that avoids the need for time-consuming human annotation of
validation data. Comment: arXiv admin note: substantial text overlap with arXiv:1406.383
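A common way to adapt computer-vision visualisation techniques to this setting is gradient-based saliency: rank sentences by the gradient magnitude of the predicted label score with respect to each sentence's representation. The snippet below illustrates that idea with a stand-in linear document model; it is an assumption-laden sketch, not the hierarchical convolutional model from the paper.

```python
# Saliency sketch: gradient magnitude of the label score w.r.t. sentence vectors.
import torch
import torch.nn as nn

sent_vecs = torch.randn(12, 200, requires_grad=True)   # one document, 12 sentences
classifier = nn.Linear(200, 5)                         # stand-in document model
label_score = classifier(sent_vecs.mean(dim=0))[3]     # score of the document's label
label_score.backward()
saliency = sent_vecs.grad.norm(dim=1)                  # one saliency value per sentence
print(saliency.topk(3).indices)                        # the 3 most salient sentences
```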