Dual Supervised Learning
Many supervised learning tasks emerge in dual forms, e.g.,
English-to-French translation vs. French-to-English translation, speech
recognition vs. text-to-speech, and image classification vs. image generation.
Two dual tasks have intrinsic connections with each other due to the
probabilistic correlation between their models. This connection is, however,
not effectively utilized today, since people usually train the models of two
dual tasks separately and independently. In this work, we propose training the
models of two dual tasks simultaneously, and explicitly exploiting the
probabilistic correlation between them to regularize the training process. For
ease of reference, we call the proposed approach \emph{dual supervised
learning}. We demonstrate that dual supervised learning can improve the
practical performance of both tasks across various applications, including
machine translation, image processing, and sentiment analysis.
Comment: ICML 2017
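As a rough illustration of the duality regularizer this abstract describes, here is a minimal sketch in PyTorch, assuming the four log-probabilities come from the two task models and two pretrained marginal language models; all names are illustrative, not from the paper's code:

```python
# Minimal sketch of the dual supervised learning regularizer.
# Assumed inputs (illustrative names, per-example tensors):
#   log_p_x, log_p_y       : marginal log-probs from pretrained language models
#   log_p_y_given_x        : primal model log-probability log P(y|x)
#   log_p_x_given_y        : dual model log-probability log P(x|y)
import torch

def duality_regularizer(log_p_x, log_p_y_given_x, log_p_y, log_p_x_given_y):
    """Penalize violation of P(x)P(y|x) = P(y)P(x|y), in log space."""
    gap = (log_p_x + log_p_y_given_x) - (log_p_y + log_p_x_given_y)
    return gap.pow(2).mean()

# Training step (sketch): both task losses share the regularizer, e.g.
# loss = loss_primal + loss_dual + lambda_dual * duality_regularizer(...)
```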
Harnessing Deep Neural Networks with Logic Rules
Combining deep neural networks with structured logic rules is desirable in
order to harness their flexibility while reducing the uninterpretability of
neural models. We
propose a general framework capable of enhancing various types of neural
networks (e.g., CNNs and RNNs) with declarative first-order logic rules.
Specifically, we develop an iterative distillation method that transfers the
structured information of logic rules into the weights of neural networks. We
deploy the framework on a CNN for sentiment analysis, and an RNN for named
entity recognition. With a few highly intuitive rules, we obtain substantial
improvements and achieve state-of-the-art or comparable results to previous
best-performing systems.
Comment: Fix typos in appendix. ACL 2016
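A minimal sketch of the distillation step described above, assuming a classifier that outputs logits and a hypothetical `project_with_rules` function standing in for the paper's rule-constrained teacher projection:

```python
# Sketch of one iteration of rule distillation: the student imitates a
# teacher distribution obtained by projecting its own predictions onto the
# rule-constrained region (project_with_rules is a hypothetical stand-in).
import torch.nn.functional as F

def distillation_loss(logits, labels, teacher_probs, pi=0.9):
    """Mix the hard-label loss with imitation of the rule-projected teacher."""
    hard = F.cross_entropy(logits, labels)
    soft = F.kl_div(F.log_softmax(logits, dim=-1), teacher_probs,
                    reduction="batchmean")
    return pi * soft + (1.0 - pi) * hard

# Each iteration: teacher_probs = project_with_rules(F.softmax(logits, -1), inputs)
# (the paper anneals the imitation weight pi over training).
```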
Contextual Augmentation: Data Augmentation by Words with Paradigmatic Relations
We propose a novel data augmentation method for labeled sentences, called
contextual augmentation. We assume an invariance: a sentence remains natural
even when its words are replaced with other words that hold paradigmatic
relations to them. We stochastically replace words with other words predicted
by a bi-directional language model at the corresponding positions. The words
predicted from the context are numerous yet still appropriate for augmenting
the original words. Furthermore, we retrofit the language model with a
label-conditional architecture, which allows the model to augment sentences
without breaking label compatibility. Through experiments on six different
text classification tasks, we demonstrate that the proposed method improves
classifiers based on convolutional or recurrent neural networks.
Comment: NAACL 2018
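A minimal sketch of the replacement procedure, using a modern masked language model (via Hugging Face `transformers`) as a stand-in for the paper's bidirectional LSTM LM; the label-conditional variant would additionally condition the predictor on the label:

```python
# Contextual augmentation sketch: stochastically replace words with
# context-predicted alternatives sampled from a masked language model.
import random
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def augment(words, replace_prob=0.15, top_k=5):
    out = list(words)
    for i in range(len(words)):
        if random.random() < replace_prob:
            masked = " ".join(w if j != i else "[MASK]"
                              for j, w in enumerate(out))
            candidates = fill_mask(masked, top_k=top_k)
            out[i] = random.choice(candidates)["token_str"]
    return out

print(augment("the actors are fantastic".split()))
```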
Recursive Neural Conditional Random Fields for Aspect-based Sentiment Analysis
In aspect-based sentiment analysis, extracting aspect terms along with the
opinions being expressed from user-generated content is one of the most
important subtasks. Previous studies have shown that exploiting connections
between aspect and opinion terms is promising for this task. In this paper, we
propose a novel joint model that integrates recursive neural networks and
conditional random fields into a unified framework for the explicit
co-extraction of aspect and opinion terms. The proposed model learns
high-level discriminative features and doubly propagates information between
aspect and opinion terms simultaneously. Moreover, hand-crafted features can
be flexibly incorporated to further boost its information extraction
performance. Experimental results on the SemEval Challenge 2014 dataset show
the superiority of our model over several baseline methods as well as the
winning systems of the challenge.
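For intuition, a minimal sketch of the co-extraction setup as CRF-based sequence labeling (using the third-party `pytorch-crf` package); the paper's recursive dependency-tree network is replaced here by a BiLSTM for brevity, and the tag set is illustrative:

```python
# Joint aspect/opinion co-extraction as BIO sequence labeling with a CRF.
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf

TAGS = ["O", "B-ASP", "I-ASP", "B-OPN", "I-OPN"]  # aspect and opinion spans

class CoExtractor(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden, bidirectional=True,
                               batch_first=True)
        self.proj = nn.Linear(2 * hidden, len(TAGS))
        self.crf = CRF(len(TAGS), batch_first=True)

    def loss(self, tokens, tags):
        feats, _ = self.encoder(self.emb(tokens))
        return -self.crf(self.proj(feats), tags)  # negative log-likelihood

    def decode(self, tokens):
        feats, _ = self.encoder(self.emb(tokens))
        return self.crf.decode(self.proj(feats))  # best tag sequences
```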
Words are not Equal: Graded Weighting Model for building Composite Document Vectors
Despite the success of distributional semantics, composing phrases from word
vectors remains an important challenge. Several methods have been tried for
benchmark tasks such as sentiment classification, including word vector
averaging, matrix-vector approaches based on parsing, and on-the-fly learning
of paragraph vectors. Most models usually omit stop words from the composition.
Instead of such a yes-no decision, we consider several graded schemes where
words are weighted according to their discriminatory relevance with respect to
their use in the document (e.g., idf). Some of these methods (particularly
tf-idf) are seen to result in a significant improvement in performance over
prior state of the art. Further, combining such approaches into an ensemble
based on alternate classifiers such as the RNN model results in a 1.6%
performance improvement on the standard IMDB movie review dataset, and a 7.01%
improvement on Amazon product reviews. Since these models are language-free and
can be obtained in an unsupervised manner, they are also of interest for
under-resourced languages such as Hindi. We demonstrate this language-free
aspect by showing a gain of 12% on two review datasets over earlier results,
and also release a new, larger dataset for future testing (Singh, 2015).
Comment: 10 pages, 2 figures, 11 tables
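A minimal sketch of such a graded weighting scheme, composing document vectors as tf-idf-weighted averages of word vectors (scikit-learn supplies the weights; `word_vecs` is assumed to map words to pretrained vectors):

```python
# Graded (tf-idf) weighting for composing document vectors from word vectors,
# instead of a hard yes-no stop-word cutoff.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def document_vectors(docs, word_vecs, dim=300):
    """docs: list of strings; word_vecs: dict word -> np.ndarray (dim,)."""
    tfidf = TfidfVectorizer()
    weights = tfidf.fit_transform(docs)        # sparse (n_docs, n_terms)
    vocab = tfidf.get_feature_names_out()
    out = np.zeros((len(docs), dim))
    for d in range(len(docs)):
        row = weights.getrow(d).tocoo()
        total = 0.0
        for term_idx, w in zip(row.col, row.data):
            vec = word_vecs.get(vocab[term_idx])
            if vec is not None:
                out[d] += w * vec              # weight word vector by tf-idf
                total += w
        if total > 0:
            out[d] /= total                    # weighted average
    return out
```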
k-Nearest Neighbor Augmented Neural Networks for Text Classification
In recent years, many deep learning based models have been proposed for text
classification. Such models fit the training set well from a statistical point
of view, but they lack the capacity to utilize instance-level information from
individual instances in the training set. In this work, we propose to enhance
neural network models by allowing them to leverage information from the
k-nearest neighbors (kNN) of the input text. Our model employs a neural
network that encodes texts into text embeddings. Moreover, we utilize the
k-nearest neighbors of the input text as an external memory to capture
instance-level information from the training set. The final prediction is made
based on features from both the neural network encoder and the kNN memory.
Experimental results on several standard
benchmark datasets show that our model outperforms the baseline model on all
the datasets, and it even beats a very deep neural network model (with 29
layers) on several datasets. Our model also shows superior performance when
training instances are scarce or the training set is severely unbalanced, and
it leverages techniques such as semi-supervised training and transfer learning
quite well.
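A minimal sketch of the kNN-memory idea, interpolating the encoder's class probabilities with a label distribution from the k nearest training embeddings; the interpolation weight `alpha` and cosine retrieval are illustrative choices, not necessarily the paper's:

```python
# Augment a neural text classifier with a kNN memory over training embeddings.
import numpy as np

def knn_label_distribution(query_emb, train_embs, train_labels, n_classes, k=5):
    """Label histogram of the k training instances most similar to the query."""
    sims = train_embs @ query_emb / (
        np.linalg.norm(train_embs, axis=1) * np.linalg.norm(query_emb) + 1e-8)
    top = np.argsort(-sims)[:k]
    dist = np.zeros(n_classes)
    for i in top:
        dist[train_labels[i]] += 1.0
    return dist / dist.sum()

def predict(query_emb, encoder_probs, train_embs, train_labels, alpha=0.5):
    """Interpolate encoder softmax with the kNN label distribution."""
    knn_probs = knn_label_distribution(query_emb, train_embs, train_labels,
                                       n_classes=len(encoder_probs))
    return alpha * encoder_probs + (1 - alpha) * knn_probs
```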
A Probabilistic Formulation of Unsupervised Text Style Transfer
We present a deep generative model for unsupervised text style transfer that
unifies previously proposed non-generative techniques. Our probabilistic
approach models non-parallel data from two domains as a partially observed
parallel corpus. By hypothesizing a parallel latent sequence that generates
each observed sequence, our model learns to transform sequences from one domain
to another in a completely unsupervised fashion. In contrast with traditional
generative sequence models (e.g. the HMM), our model makes few assumptions
about the data it generates: it uses a recurrent language model as a prior and
an encoder-decoder as a transduction distribution. While computation of
marginal data likelihood is intractable in this model class, we show that
amortized variational inference admits a practical surrogate. Further, by
drawing connections between our variational objective and other recent
unsupervised style transfer and machine translation techniques, we show how our
probabilistic view can unify some known non-generative objectives such as
backtranslation and adversarial loss. Finally, we demonstrate the effectiveness
of our method on a wide range of unsupervised style transfer tasks, including
sentiment transfer, formality transfer, word decipherment, author imitation,
and related language translation. Across all style transfer tasks, our approach
yields substantial gains over state-of-the-art non-generative baselines,
including the state-of-the-art unsupervised machine translation techniques that
our approach generalizes. Further, we conduct experiments on a standard
unsupervised machine translation task and find that our unified approach
matches the current state of the art.
Comment: ICLR 2020 conference paper (spotlight). The first two authors
contributed equally.
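In standard form, the amortized variational surrogate mentioned above is the usual evidence lower bound, with x an observed sequence in one domain, y its hypothesized latent parallel in the other, p_LM the recurrent language-model prior, p_theta the encoder-decoder transduction distribution, and q_phi the amortized inference network (a sketch of the generic ELBO, not the paper's exact objective):

```latex
\log p_\theta(x) \;\ge\;
\mathbb{E}_{q_\phi(y \mid x)}\!\big[\log p_\theta(x \mid y)\big]
\;-\; \mathrm{KL}\!\big(q_\phi(y \mid x) \,\big\|\, p_{\mathrm{LM}}(y)\big)
```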
Dual Memory Network Model for Biased Product Review Classification
In sentiment analysis (SA) of product reviews, both user and product
information have proven useful. Current approaches handle user profile and
product information in a unified model, which may not learn salient features
of users and products effectively. In this work, we propose a dual
user and product memory network (DUPMN) model to learn user profiles and
product reviews using separate memory networks. Then, the two representations
are used jointly for sentiment prediction. The use of separate models aims to
capture user profiles and product information more effectively. Compared to
state-of-the-art unified prediction models, the evaluations on three benchmark
datasets, IMDB, Yelp13, and Yelp14, show that our dual learning model gives
performance gains of 0.6%, 1.2%, and 0.9%, respectively. The improvements are
also highly significant as measured by p-values.
Comment: To appear in the 2018 EMNLP 9th Workshop on Computational Approaches
to Subjectivity, Sentiment and Social Media Analysis
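A minimal sketch of the dual-memory idea: separate attention reads over a user memory and a product memory, concatenated for sentiment prediction. This is a single-hop, single-example simplification; DUPMN itself uses multi-hop memory networks:

```python
# Dual memory read: one memory of the user's documents, one of the product's.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualMemorySentiment(nn.Module):
    def __init__(self, dim, n_classes):
        super().__init__()
        self.classify = nn.Linear(2 * dim, n_classes)

    @staticmethod
    def read(query, memory):
        """Attention-weighted read; query (dim,), memory (n_mem, dim)."""
        attn = F.softmax(memory @ query, dim=0)        # (n_mem,)
        return (attn.unsqueeze(1) * memory).sum(0)     # (dim,)

    def forward(self, doc_emb, user_memory, prod_memory):
        u = self.read(doc_emb, user_memory)   # user profile representation
        p = self.read(doc_emb, prod_memory)   # product representation
        return self.classify(torch.cat([u, p], dim=-1))
```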
Agree to Disagree: Improving Disagreement Detection with Dual GRUs
This paper presents models for detecting agreement/disagreement in online
discussions. In this work, we show that by using a Siamese-inspired architecture
to encode the discussions, we no longer need to rely on hand-crafted features
to exploit the meta thread structure. We evaluate our model on existing online
discussion corpora: ABCD, IAC, and AWTP. Experimental results on the ABCD
dataset show that by fusing lexical and word embedding features, our model
achieves state-of-the-art performance with a 0.804 average F1 score. We also
show that the model trained on the ABCD dataset performs competitively on
relatively smaller annotated datasets (IAC and AWTP).
Comment: In Proc. 7th Affective Computing and Intelligent Interaction
(ACII'17), San Antonio, TX, USA, Oct. 23-26, 2017
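A minimal sketch of a Siamese-style dual-GRU encoder for a quote-response pair, with a standard matching feature vector; the architecture details here are illustrative rather than the paper's exact configuration:

```python
# Siamese dual-GRU: shared encoder for both posts, then comparison features.
import torch
import torch.nn as nn

class DualGRUClassifier(nn.Module):
    def __init__(self, emb_dim=100, hidden=128, n_classes=2):
        super().__init__()
        self.gru = nn.GRU(emb_dim, hidden, batch_first=True)  # shared weights
        self.out = nn.Linear(4 * hidden, n_classes)

    def forward(self, quote_emb, response_emb):
        # inputs: (batch, seq_len, emb_dim) pre-embedded token sequences
        _, hq = self.gru(quote_emb)
        _, hr = self.gru(response_emb)
        hq, hr = hq.squeeze(0), hr.squeeze(0)  # (batch, hidden)
        feats = torch.cat([hq, hr, torch.abs(hq - hr), hq * hr], dim=-1)
        return self.out(feats)  # agreement/disagreement logits
```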
Paraphrase Thought: Sentence Embedding Module Imitating Human Language Recognition
Sentence embedding is an important research topic in natural language
processing. It is essential to generate a good embedding vector that fully
reflects the semantic meaning of a sentence in order to achieve an enhanced
performance for various natural language processing tasks, such as machine
translation and document classification. Thus far, various sentence embedding
models have been proposed, and their feasibility has been demonstrated through
good performance on downstream tasks such as sentiment analysis and sentence
classification. However, because performance on sentence classification and
sentiment analysis can also be obtained with simple sentence representation
methods, good results on such tasks are not sufficient to claim that these
models fully reflect the meanings of sentences. In this paper, inspired by
human language recognition, we propose the
following concept of semantic coherence, which should be satisfied for a good
sentence embedding method: similar sentences should be located close to each
other in the embedding space. Then, we propose the Paraphrase-Thought
(P-thought) model to pursue semantic coherence as much as possible.
Experimental results on two paraphrase identification datasets (MS COCO and STS
benchmark) show that the P-thought models outperform the benchmarked sentence
embedding methods.
Comment: 10 pages
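A minimal sketch of a semantic-coherence objective in the spirit described above: paraphrase pairs are pulled together and non-paraphrases pushed apart in embedding space (a generic contrastive loss used for illustration, not the paper's exact P-thought objective):

```python
# Contrastive loss enforcing semantic coherence on sentence embeddings.
import torch
import torch.nn.functional as F

def coherence_loss(emb_a, emb_b, is_paraphrase, margin=0.5):
    """emb_a, emb_b: (batch, dim); is_paraphrase: float {0,1} tensor (batch,)."""
    sim = F.cosine_similarity(emb_a, emb_b)                 # (batch,)
    pos = (1.0 - sim) * is_paraphrase                        # pull pairs close
    neg = torch.clamp(sim - margin, min=0.0) * (1.0 - is_paraphrase)
    return (pos + neg).mean()
```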