Abstractive Opinion Tagging
In e-commerce, opinion tags refer to a ranked list of tags provided by the
e-commerce platform that reflect characteristics of reviews of an item. To
help consumers quickly grasp the large number of reviews about an item,
opinion tags are increasingly being applied by e-commerce platforms. Current
mechanisms for generating opinion tags rely on either manual labelling or
heuristic methods, both of which are time-consuming and ineffective. In this paper, we
propose the abstractive opinion tagging task, where systems have to
automatically generate a ranked list of opinion tags that are based on, but
need not occur in, a given set of user-generated reviews.
The abstractive opinion tagging task comes with three main challenges: (1)
the noisy nature of reviews; (2) the formal nature of opinion tags vs. the
colloquial language usage in reviews; and (3) the need to distinguish between
different items with very similar aspects. To address these challenges, we
propose an abstractive opinion tagging framework, named AOT-Net, to generate a
ranked list of opinion tags given a large number of reviews. First, a
sentence-level salience estimation component estimates each review's salience
score. Next, a review clustering and ranking component ranks reviews in two
steps: first, reviews are grouped into clusters and ranked by cluster size;
then, reviews within each cluster are ranked by their distance to the cluster
center. Finally, given the ranked reviews, a rank-aware opinion tagging
component incorporates an alignment feature and alignment loss to generate a
ranked list of opinion tags. To facilitate the study of this task, we create
and release a large-scale dataset, called eComTag, crawled from real-world
e-commerce websites. Extensive experiments conducted on the eComTag dataset
verify the effectiveness of the proposed AOT-Net in terms of various evaluation
metrics.
Comment: Accepted by WSDM 202
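The two-step review clustering and ranking component described above can be sketched as follows. This is an illustrative plain-Python rendering, not the paper's implementation: the function name is hypothetical, and AOT-Net presumably operates on learned review representations rather than raw feature vectors, with clusters produced by a separate clustering step.

```python
from collections import defaultdict
import math

def rank_reviews(reviews, vectors, labels):
    """Rank reviews in two steps: clusters by size (largest first),
    then reviews within each cluster by distance to the cluster
    centroid (closest first)."""
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    ranked = []
    # Step 1: order clusters by size, largest first.
    for lab in sorted(clusters, key=lambda l: -len(clusters[l])):
        members = clusters[lab]
        dim = len(vectors[0])
        centroid = [sum(vectors[i][d] for i in members) / len(members)
                    for d in range(dim)]
        # Step 2: within a cluster, order reviews by Euclidean
        # distance to the centroid.
        def dist(i):
            return math.sqrt(sum((vectors[i][d] - centroid[d]) ** 2
                                 for d in range(dim)))
        ranked.extend(sorted(members, key=dist))
    return [reviews[i] for i in ranked]
```

The ranked list produced here would then feed the rank-aware opinion tagging component.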
A Unified Dual-view Model for Review Summarization and Sentiment Classification with Inconsistency Loss
Acquiring accurate summarization and sentiment from user reviews is an
essential component of modern e-commerce platforms. Review summarization aims
at generating a concise summary that describes the key opinions and sentiment
of a review, while sentiment classification aims to predict a sentiment label
indicating the sentiment attitude of a review. To effectively leverage the
shared sentiment information in both review summarization and sentiment
classification tasks, we propose a novel dual-view model that jointly improves
the performance of these two tasks. In our model, an encoder first learns a
context representation for the review, then a summary decoder generates a
review summary word by word. After that, a source-view sentiment classifier
uses the encoded context representation to predict a sentiment label for the
review, while a summary-view sentiment classifier uses the decoder hidden
states to predict a sentiment label for the generated summary. During training,
we introduce an inconsistency loss to penalize the disagreement between these
two classifiers. This loss encourages the decoder to generate a summary whose
sentiment is consistent with that of the review, and also helps the two
sentiment classifiers learn from each other. Experiment results on four real-world
datasets from different domains demonstrate the effectiveness of our model.
Comment: Accepted by SIGIR 2020. Updated the results of balanced accuracy
scores in Table 3 since we found a bug in our source code. Nevertheless, our
model still achieves higher balanced accuracy scores than the baselines after
we fixed this bug.
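The idea of penalizing disagreement between the source-view and summary-view sentiment distributions can be sketched with a symmetric KL divergence. This is one simple choice for such a penalty, shown here for illustration; the paper's exact formulation of the inconsistency loss may differ.

```python
import math

def inconsistency_loss(p_source, p_summary, eps=1e-12):
    """Penalize disagreement between the two classifiers' predicted
    sentiment distributions via symmetric KL divergence (an
    illustrative choice, not necessarily the paper's exact loss)."""
    def kl(p, q):
        # KL(p || q) with a small epsilon for numerical stability.
        return sum(pi * math.log((pi + eps) / (qi + eps))
                   for pi, qi in zip(p, q))
    return 0.5 * (kl(p_source, p_summary) + kl(p_summary, p_source))
```

The loss is zero when the two views agree exactly and grows as their predicted distributions diverge, which is the training signal that aligns the summary's sentiment with the review's.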
A topic modeling based approach to novel document automatic summarization
Most existing automatic text summarization algorithms target multi-document collections of relatively short texts and are thus difficult to apply directly to novels, which are long and freely structured. In this paper, aiming at novel documents, we propose a topic modeling based approach to extractive automatic summarization, so as to achieve a good balance among compression ratio, summarization quality and machine readability. First, based on topic modeling, we extract the candidate sentences associated with topic words from a preprocessed novel document. Second, with the goals of compression ratio and topic diversity, we design an importance evaluation function to select the most important sentences from the candidate sentences and thus generate an initial novel summary. Finally, we smooth the initial summary to overcome the semantic confusion caused by ambiguous or synonymous words, so as to improve the summary readability. We evaluate our proposed approach experimentally on a real novel dataset. The experiment results show that, compared to those from other candidate algorithms, each automatic summary generated by our approach has not only a higher compression ratio but also better summarization quality.
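The selection step, choosing sentences that cover topic words under a word budget, can be sketched as a greedy procedure. This is a deliberate simplification of the paper's importance evaluation function; the function name and the coverage-gain criterion are illustrative assumptions.

```python
def select_sentences(sentences, topic_words, max_words):
    """Greedily pick the sentence covering the most not-yet-covered
    topic words (topic diversity) until the word budget (compression
    ratio) is exhausted. A simplified stand-in for the paper's
    importance evaluation function."""
    topics = set(topic_words)
    covered, summary, total = set(), [], 0
    remaining = list(range(len(sentences)))
    while remaining:
        # Gain = number of new topic words a sentence would cover.
        def gain(i):
            return len((set(sentences[i].lower().split()) & topics)
                       - covered)
        best = max(remaining, key=gain)
        words = len(sentences[best].split())
        if gain(best) == 0 or total + words > max_words:
            break
        summary.append(sentences[best])
        covered |= set(sentences[best].lower().split()) & topics
        total += words
        remaining.remove(best)
    return summary
```

The smoothing step for ambiguous or synonymous words would then post-process the selected sentences.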
A Novel ILP Framework for Summarizing Content with High Lexical Variety
Summarizing content contributed by individuals can be challenging, because
people make different lexical choices even when describing the same events.
However, there remains a significant need to summarize such content. Examples
include the student responses to post-class reflective questions, product
reviews, and news articles published by different news agencies related to the
same events. High lexical diversity of these documents hinders the system's
ability to effectively identify salient content and reduce summary redundancy.
In this paper, we overcome this issue by introducing an integer linear
programming-based summarization framework. It incorporates a low-rank
approximation to the sentence-word co-occurrence matrix to intrinsically group
semantically-similar lexical items. We conduct extensive experiments on
datasets of student responses, product reviews, and news documents. Our
approach compares favorably to a number of extractive baselines as well as a
neural abstractive summarization system. The paper finally sheds light on when
and why the proposed framework is effective at summarizing content with high
lexical variety.
Comment: Accepted for publication in the journal Natural Language Engineering, 201
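The core ILP objective, covering as much concept weight as possible within a length budget, can be illustrated with a brute-force stand-in. A real system would use an ILP solver, and the concept weights would come from the low-rank approximation of the sentence-word co-occurrence matrix; here whitespace tokens stand in for concepts and all names are illustrative.

```python
from itertools import combinations

def ilp_summarize(sentences, weights, budget):
    """Brute-force stand-in for the ILP: choose the subset of
    sentences that maximizes the total weight of covered concepts
    subject to a word-count budget."""
    def concepts(s):
        # Toy concept extraction: lowercase whitespace tokens.
        return set(s.lower().split())
    best, best_score = [], 0.0
    idxs = range(len(sentences))
    for r in range(len(sentences) + 1):
        for subset in combinations(idxs, r):
            length = sum(len(sentences[i].split()) for i in subset)
            if length > budget:
                continue
            covered = set()
            for i in subset:
                covered |= concepts(sentences[i])
            score = sum(weights.get(c, 0.0) for c in covered)
            if score > best_score:
                best, best_score = list(subset), score
    return [sentences[i] for i in best]
```

Because coverage counts each concept once, semantically redundant sentences add little score, which is the redundancy-reduction effect the framework relies on; the low-rank approximation extends this by letting near-synonymous tokens share credit.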
Helpfulness Guided Review Summarization
User-generated online reviews are an important information resource in people's everyday lives. As the review volume grows explosively, the ability to automatically identify and summarize useful information from reviews becomes essential in providing analytic services in many review-based applications. While prior work on review summarization has focused on different review perspectives (e.g. topics, opinions, sentiment, etc.), the helpfulness of reviews is an important informativeness indicator that has been less frequently explored. In this thesis, we investigate automatic review helpfulness prediction and exploit review helpfulness for review summarization in distinct review domains.
We explore two paths for predicting review helpfulness in a general setting: one is by tailoring existing helpfulness prediction techniques to a new review domain; the other is by using a general representation of review content that reflects review helpfulness across domains. For the first one, we explore educational peer reviews and show how peer-review domain knowledge can be introduced to a helpfulness model developed for product reviews to improve prediction performance. For the second one, we characterize review language usage, content diversity and helpfulness-related topics with respect to different content sources using computational linguistic features.
For review summarization, we propose to leverage user-provided helpfulness assessment during content selection in two ways: 1) using the review-level helpfulness ratings directly to filter out unhelpful reviews, 2) developing sentence-level helpfulness features via supervised topic modeling for sentence selection. As a demonstration, we implement our methods based on an extractive multi-document summarization framework and evaluate them in three user studies. Results show that our helpfulness-guided summarizers outperform the baseline in both human and automated evaluation for camera reviews and movie reviews. For educational peer reviews, however, the preference for helpfulness depends on student writing performance and prior teaching experience.
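The two uses of helpfulness during content selection can be sketched together in a minimal form. The function name, the 0.6 threshold, and the pluggable sentence scorer are illustrative assumptions; in the thesis the sentence-level score comes from supervised topic modeling rather than an arbitrary callable.

```python
def helpfulness_guided_select(reviews, helpfulness, sentence_score,
                              threshold=0.6, k=3):
    """Sketch of helpfulness-guided content selection:
    1) filter out reviews whose review-level helpfulness rating
       falls below a threshold (threshold value is an assumption);
    2) rank the remaining reviews' sentences by a sentence-level
       helpfulness score and keep the top k for the summary."""
    kept = [r for r, h in zip(reviews, helpfulness) if h >= threshold]
    # Naive sentence splitting for illustration only.
    sentences = [s for r in kept for s in r.split(". ")]
    return sorted(sentences, key=sentence_score, reverse=True)[:k]
```

In the thesis's pipeline, the selected sentences would then be assembled by the extractive multi-document summarization framework rather than returned directly.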