7 research outputs found

    Abstractive Opinion Tagging

    Full text link
    In e-commerce, opinion tags refer to a ranked list of tags provided by the e-commerce platform that reflect characteristics of reviews of an item. To assist consumers to quickly grasp a large number of reviews about an item, opinion tags are increasingly being applied by e-commerce platforms. Current mechanisms for generating opinion tags rely on either manual labelling or heuristic methods, which is time-consuming and ineffective. In this paper, we propose the abstractive opinion tagging task, where systems have to automatically generate a ranked list of opinion tags that are based on, but need not occur in, a given set of user-generated reviews. The abstractive opinion tagging task comes with three main challenges: (1) the noisy nature of reviews; (2) the formal nature of opinion tags vs. the colloquial language usage in reviews; and (3) the need to distinguish between different items with very similar aspects. To address these challenges, we propose an abstractive opinion tagging framework, named AOT-Net, to generate a ranked list of opinion tags given a large number of reviews. First, a sentence-level salience estimation component estimates each review's salience score. Next, a review clustering and ranking component ranks reviews in two steps: first, reviews are grouped into clusters and ranked by cluster size; then, reviews within each cluster are ranked by their distance to the cluster center. Finally, given the ranked reviews, a rank-aware opinion tagging component incorporates an alignment feature and alignment loss to generate a ranked list of opinion tags. To facilitate the study of this task, we create and release a large-scale dataset, called eComTag, crawled from real-world e-commerce websites. Extensive experiments conducted on the eComTag dataset verify the effectiveness of the proposed AOT-Net in terms of various evaluation metrics.Comment: Accepted by WSDM 202

    A Unified Dual-view Model for Review Summarization and Sentiment Classification with Inconsistency Loss

    Full text link
    Acquiring accurate summarization and sentiment from user reviews is an essential component of modern e-commerce platforms. Review summarization aims at generating a concise summary that describes the key opinions and sentiment of a review, while sentiment classification aims to predict a sentiment label indicating the sentiment attitude of a review. To effectively leverage the shared sentiment information in both review summarization and sentiment classification tasks, we propose a novel dual-view model that jointly improves the performance of these two tasks. In our model, an encoder first learns a context representation for the review, then a summary decoder generates a review summary word by word. After that, a source-view sentiment classifier uses the encoded context representation to predict a sentiment label for the review, while a summary-view sentiment classifier uses the decoder hidden states to predict a sentiment label for the generated summary. During training, we introduce an inconsistency loss to penalize the disagreement between these two classifiers. It helps the decoder to generate a summary to have a consistent sentiment tendency with the review and also helps the two sentiment classifiers learn from each other. Experiment results on four real-world datasets from different domains demonstrate the effectiveness of our model.Comment: Accepted by SIGIR 2020. Updated the results of balanced accuracy scores in Table 3 since we found a bug in our source code. Nevertheless, our model still achieves higher balanced accuracy scores than the baselines after we fixed this bu

    A topic modeling based approach to novel document automatic summarization

    Full text link
    © 2017 Elsevier Ltd Most of existing text automatic summarization algorithms are targeted for multi-documents of relatively short length, thus difficult to be applied immediately to novel documents of structure freedom and long length. In this paper, aiming at novel documents, we propose a topic modeling based approach to extractive automatic summarization, so as to achieve a good balance among compression ratio, summarization quality and machine readability. First, based on topic modeling, we extract the candidate sentences associated with topic words from a preprocessed novel document. Second, with the goals of compression ratio and topic diversity, we design an importance evaluation function to select the most important sentences from the candidate sentences and thus generate an initial novel summary. Finally, we smooth the initial summary to overcome the semantic confusion caused by ambiguous or synonymous words, so as to improve the summary readability. We evaluate experimentally our proposed approach on a real novel dataset. The experiment results show that compared to those from other candidate algorithms, each automatic summary generated by our approach has not only a higher compression ratio, but also better summarization quality

    A Novel ILP Framework for Summarizing Content with High Lexical Variety

    Full text link
    Summarizing content contributed by individuals can be challenging, because people make different lexical choices even when describing the same events. However, there remains a significant need to summarize such content. Examples include the student responses to post-class reflective questions, product reviews, and news articles published by different news agencies related to the same events. High lexical diversity of these documents hinders the system's ability to effectively identify salient content and reduce summary redundancy. In this paper, we overcome this issue by introducing an integer linear programming-based summarization framework. It incorporates a low-rank approximation to the sentence-word co-occurrence matrix to intrinsically group semantically-similar lexical items. We conduct extensive experiments on datasets of student responses, product reviews, and news documents. Our approach compares favorably to a number of extractive baselines as well as a neural abstractive summarization system. The paper finally sheds light on when and why the proposed framework is effective at summarizing content with high lexical variety.Comment: Accepted for publication in the journal of Natural Language Engineering, 201

    Helpfulness Guided Review Summarization

    Get PDF
    User-generated online reviews are an important information resource in people's everyday life. As the review volume grows explosively, the ability to automatically identify and summarize useful information from reviews becomes essential in providing analytic services in many review-based applications. While prior work on review summarization focused on different review perspectives (e.g. topics, opinions, sentiment, etc.), the helpfulness of reviews is an important informativeness indicator that has been less frequently explored. In this thesis, we investigate automatic review helpfulness prediction and exploit review helpfulness for review summarization in distinct review domains. We explore two paths for predicting review helpfulness in a general setting: one is by tailoring existing helpfulness prediction techniques to a new review domain; the other is by using a general representation of review content that reflects review helpfulness across domains. For the first one, we explore educational peer reviews and show how peer-review domain knowledge can be introduced to a helpfulness model developed for product reviews to improve prediction performance. For the second one, we characterize review language usage, content diversity and helpfulness-related topics with respect to different content sources using computational linguistic features. For review summarization, we propose to leverage user-provided helpfulness assessment during content selection in two ways: 1) using the review-level helpfulness ratings directly to filter out unhelpful reviews, 2) developing sentence-level helpfulness features via supervised topic modeling for sentence selection. As a demonstration, we implement our methods based on an extractive multi-document summarization framework and evaluate them in three user studies. Results show that our helpfulness-guided summarizers outperform the baseline in both human and automated evaluation for camera reviews and movie reviews. While for educational peer reviews, the preference for helpfulness depends on student writing performance and prior teaching experience
    corecore