2,082 research outputs found
A study on text-score disagreement in online reviews
In this paper, we focus on online reviews and employ artificial intelligence
tools, taken from the cognitive computing field, to help understanding the
relationships between the textual part of the review and the assigned numerical
score. We move from the intuitions that 1) a set of textual reviews expressing
different sentiments may feature the same score (and vice-versa); and 2)
detecting and analyzing the mismatches between the review content and the
actual score may benefit both service providers and consumers, by highlighting
specific factors of satisfaction (and dissatisfaction) in texts.
To prove the intuitions, we adopt sentiment analysis techniques and we
concentrate on hotel reviews, to find polarity mismatches therein. In
particular, we first train a text classifier with a set of annotated hotel
reviews, taken from the Booking website. Then, we analyze a large dataset, with
around 160k hotel reviews collected from Tripadvisor, with the aim of detecting
a polarity mismatch, indicating if the textual content of the review is in
line, or not, with the associated score.
Using well established artificial intelligence techniques and analyzing in
depth the reviews featuring a mismatch between the text polarity and the score,
we find that -on a scale of five stars- those reviews ranked with middle scores
include a mixture of positive and negative aspects.
The approach proposed here, beside acting as a polarity detector, provides an
effective selection of reviews -on an initial very large dataset- that may
allow both consumers and providers to focus directly on the review subset
featuring a text/score disagreement, which conveniently convey to the user a
summary of positive and negative features of the review target.Comment: This is the accepted version of the paper. The final version will be
published in the Journal of Cognitive Computation, available at Springer via
http://dx.doi.org/10.1007/s12559-017-9496-
Adaptive Semi-supervised Learning for Cross-domain Sentiment Classification
We consider the cross-domain sentiment classification problem, where a
sentiment classifier is to be learned from a source domain and to be
generalized to a target domain. Our approach explicitly minimizes the distance
between the source and the target instances in an embedded feature space. With
the difference between source and target minimized, we then exploit additional
information from the target domain by consolidating the idea of semi-supervised
learning, for which, we jointly employ two regularizations -- entropy
minimization and self-ensemble bootstrapping -- to incorporate the unlabeled
target data for classifier refinement. Our experimental results demonstrate
that the proposed approach can better leverage unlabeled data from the target
domain and achieve substantial improvements over baseline methods in various
experimental settings.Comment: Accepted to EMNLP201
Self-Supervised and Controlled Multi-Document Opinion Summarization
We address the problem of unsupervised abstractive summarization of
collections of user generated reviews with self-supervision and control. We
propose a self-supervised setup that considers an individual document as a
target summary for a set of similar documents. This setting makes training
simpler than previous approaches by relying only on standard log-likelihood
loss. We address the problem of hallucinations through the use of control
codes, to steer the generation towards more coherent and relevant
summaries.Finally, we extend the Transformer architecture to allow for multiple
reviews as input. Our benchmarks on two datasets against graph-based and recent
neural abstractive unsupervised models show that our proposed method generates
summaries with a superior quality and relevance.This is confirmed in our human
evaluation which focuses explicitly on the faithfulness of generated summaries
We also provide an ablation study, which shows the importance of the control
setup in controlling hallucinations and achieve high sentiment and topic
alignment of the summaries with the input reviews.Comment: 18 pages including 5 pages appendi
Task-specific Word Identification from Short Texts Using a Convolutional Neural Network
Task-specific word identification aims to choose the task-related words that
best describe a short text. Existing approaches require well-defined seed words
or lexical dictionaries (e.g., WordNet), which are often unavailable for many
applications such as social discrimination detection and fake review detection.
However, we often have a set of labeled short texts where each short text has a
task-related class label, e.g., discriminatory or non-discriminatory, specified
by users or learned by classification algorithms. In this paper, we focus on
identifying task-specific words and phrases from short texts by exploiting
their class labels rather than using seed words or lexical dictionaries. We
consider the task-specific word and phrase identification as feature learning.
We train a convolutional neural network over a set of labeled texts and use
score vectors to localize the task-specific words and phrases. Experimental
results on sentiment word identification show that our approach significantly
outperforms existing methods. We further conduct two case studies to show the
effectiveness of our approach. One case study on a crawled tweets dataset
demonstrates that our approach can successfully capture the
discrimination-related words/phrases. The other case study on fake review
detection shows that our approach can identify the fake-review words/phrases.Comment: accepted by Intelligent Data Analysis, an International Journa
Domain-Adversarial Training of Neural Networks
We introduce a new representation learning approach for domain adaptation, in
which data at training and test time come from similar but different
distributions. Our approach is directly inspired by the theory on domain
adaptation suggesting that, for effective domain transfer to be achieved,
predictions must be made based on features that cannot discriminate between the
training (source) and test (target) domains. The approach implements this idea
in the context of neural network architectures that are trained on labeled data
from the source domain and unlabeled data from the target domain (no labeled
target-domain data is necessary). As the training progresses, the approach
promotes the emergence of features that are (i) discriminative for the main
learning task on the source domain and (ii) indiscriminate with respect to the
shift between the domains. We show that this adaptation behaviour can be
achieved in almost any feed-forward model by augmenting it with few standard
layers and a new gradient reversal layer. The resulting augmented architecture
can be trained using standard backpropagation and stochastic gradient descent,
and can thus be implemented with little effort using any of the deep learning
packages. We demonstrate the success of our approach for two distinct
classification problems (document sentiment analysis and image classification),
where state-of-the-art domain adaptation performance on standard benchmarks is
achieved. We also validate the approach for descriptor learning task in the
context of person re-identification application.Comment: Published in JMLR: http://jmlr.org/papers/v17/15-239.htm
- …