1,100 research outputs found

    Comprehensive Review of Opinion Summarization

    The abundance of opinions on the web has kindled the study of opinion summarization over the last few years. People have introduced various techniques and paradigms for solving this special task. This survey attempts to systematically investigate the different techniques and approaches used in opinion summarization. We provide a multi-perspective classification of the approaches used and highlight some of the key weaknesses of these approaches. This survey also covers evaluation techniques and data sets used in studying the opinion summarization problem. Finally, we provide insights into some of the challenges that are left to be addressed, as this will help set the trend for future research in this area. Unpublished; not peer reviewed.

    Turning Unstructured and Incoherent Group Discussion into DATree: A TBL Coherence Analysis Approach

    Despite the rapid growth of user-generated unstructured text from online group discussions, business decision-makers face the challenge of understanding its highly incoherent content. Coherence analysis attempts to reconstruct the order of discussion messages. However, existing methods focus only on system and cohesion features. While they work with asynchronous discussions, they fail with synchronous discussions because these features rarely appear. We believe that discussion logic features play an important role in coherence analysis. Therefore, we propose a TCA method for coherence analysis, which is composed of a novel message similarity measure algorithm, a subtopic segmentation algorithm and a TBL-based classification algorithm. System, cohesion and discussion logic features are all incorporated into our TCA method. Experimental results showed that the TCA method achieved significantly better performance than existing methods. Furthermore, we illustrate that the DATree generated by the TCA method can enhance decision-makers’ content analysis capability.

    Harmony and dissonance: organizing the people's voices on political controversies

    The WikiLeaks documents about the death of Osama Bin Laden and the debates about the economic crisis in Greece and other European countries are some of the controversial topics played out in the news every day. Each of these topics has many different aspects, and there is no absolute, simple truth in answering questions such as: should the EU guarantee the financial stability of each member country, or should the countries themselves be solely responsible? To understand the landscape of opinions, it would be helpful to know which politician or other stakeholder takes which position (support or opposition) on these aspects of controversial topics.

    LexRank: Graph-based Lexical Centrality as Salience in Text Summarization

    We introduce a stochastic graph-based method for computing relative importance of textual units for Natural Language Processing. We test the technique on the problem of Text Summarization (TS). Extractive TS relies on the concept of sentence salience to identify the most important sentences in a document or set of documents. Salience is typically defined in terms of the presence of particular important words or in terms of similarity to a centroid pseudo-sentence. We consider a new approach, LexRank, for computing sentence importance based on the concept of eigenvector centrality in a graph representation of sentences. In this model, a connectivity matrix based on intra-sentence cosine similarity is used as the adjacency matrix of the graph representation of sentences. Our system, based on LexRank, ranked in first place in more than one task in the recent DUC 2004 evaluation. In this paper we present a detailed analysis of our approach and apply it to a larger data set including data from earlier DUC evaluations. We discuss several methods to compute centrality using the similarity graph. The results show that degree-based methods (including LexRank) outperform both centroid-based methods and other systems participating in DUC in most cases. Furthermore, the LexRank with threshold method outperforms the other degree-based techniques, including continuous LexRank. We also show that our approach is quite insensitive to the noise in the data that may result from an imperfect topical clustering of documents.
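
    The thresholded variant described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes plain term-frequency cosine similarity (no idf weighting), a PageRank-style damping convention, and hypothetical names and defaults (`lexrank`, `threshold=0.1`, `damping=0.85`).

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def lexrank(sentences, threshold=0.1, damping=0.85, iterations=100):
    """Score sentences by centrality in a thresholded similarity graph.

    Assumption: threshold, damping, and iterations are illustrative
    defaults, not values taken from the paper.
    """
    vectors = [Counter(tokenize(s)) for s in sentences]
    n = len(vectors)
    if n == 0:
        return []

    # Binary adjacency matrix: connect sentences whose similarity
    # exceeds the threshold (self-similarity is included).
    adjacency = [
        [1.0 if cosine(vectors[i], vectors[j]) > threshold else 0.0
         for j in range(n)]
        for i in range(n)
    ]

    # Row-normalise into a stochastic transition matrix; a row with no
    # edges (an empty sentence) falls back to a uniform distribution.
    transition = []
    for row in adjacency:
        total = sum(row)
        if total == 0:
            transition.append([1.0 / n] * n)
        else:
            transition.append([value / total for value in row])

    # Power iteration towards the stationary distribution.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1.0 - damping) / n
            + damping * sum(scores[j] * transition[j][i] for j in range(n))
            for i in range(n)
        ]
    return scores
```

    Sentences with the highest scores would be picked for an extractive summary. A sentence that shares vocabulary with many others ends up with more incoming edges and therefore a higher score.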

    A Query Focused Multi Document Automatic Summarization

    NLP Driven Models for Automatically Generating Survey Articles for Scientific Topics.

    This thesis presents new methods that use natural language processing (NLP) driven models for summarizing research in scientific fields. Given a topic query in the form of a text string, we present methods for finding research articles relevant to the topic as well as summarization algorithms that use lexical and discourse information present in the text of these articles to generate coherent and readable extractive summaries of past research on the topic. In addition to summarizing prior research, good survey articles should also forecast future trends. With this motivation, we present work on forecasting future impact of scientific publications using NLP driven features. PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/113407/1/rahuljha_1.pd

    Controversy trend detection in social media

    In this research, we focus on the early prediction of whether topics are likely to generate significant controversy (in the form of social media such as comments, blogs, etc.). Controversy trend detection is important to companies, governments, national security agencies, and marketing groups because it can be used to identify which issues the public is having problems with and develop strategies to remedy them. For example, companies can monitor their press releases to find out how the public is reacting and to decide if any additional public relations action is required; social media moderators can moderate discussions if they start becoming abusive and getting out of control; and governmental agencies can monitor their public policies and adjust them to address any public concerns. An algorithm was developed to predict controversy trends by taking into account sentiment expressed in comments, burstiness of comments, and controversy score. To train and test the algorithm, an annotated corpus was developed consisting of 728 news articles and over 500,000 comments on these articles made by viewers from CNN.com. This study achieved an average F-score of 71.3% across all time spans in detection of controversial versus non-controversial topics. The results suggest that early prediction of controversy trends is possible by leveraging social media.
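
    For readers unfamiliar with the metric, the sketch below shows how an F-score is commonly computed from precision and recall, and how per-span scores might be averaged. The abstract does not say which F-score variant or averaging scheme was used, so the balanced F1 default (`beta=1.0`), the simple arithmetic mean, and the function names are assumptions.

```python
def f_score(precision, recall, beta=1.0):
    """F-beta score from precision and recall; beta=1.0 gives balanced F1.

    Assumption: the abstract does not specify the variant, so F1 is
    used here as the default.
    """
    if precision == 0 and recall == 0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


def average_f_score(pairs, beta=1.0):
    """Unweighted mean of F-scores over (precision, recall) pairs.

    Each pair would correspond to one time span; the unweighted mean is
    an assumption about how the reported average was formed.
    """
    if not pairs:
        return 0.0
    return sum(f_score(p, r, beta) for p, r in pairs) / len(pairs)
```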