2,495 research outputs found

    Comprehensive Review of Opinion Summarization

    Get PDF
    The abundance of opinions on the web has kindled the study of opinion summarization over the last few years. People have introduced various techniques and paradigms to solving this special task. This survey attempts to systematically investigate the different techniques and approaches used in opinion summarization. We provide a multi-perspective classification of the approaches used and highlight some of the key weaknesses of these approaches. This survey also covers evaluation techniques and data sets used in studying the opinion summarization problem. Finally, we provide insights into some of the challenges that are left to be addressed as this will help set the trend for future research in this area.unpublishednot peer reviewe

    Community Interest as An Indicator for Ranking

    Get PDF
    Ranking documents in response to users\u27 information needs is a challenging task, due, in part, to the dynamic nature of users\u27 interests with respect to a query. We hypothesize that the interests of a given user are similar to the interests of the broader community of which he or she is a part and propose an innovative method that uses social media to characterize the interests of the community and use this characterization to improve future rankings. By generating a community interest vector (CIV) and community interest language model (CILM) for a given query, we use community interest to alter the ranking score of individual documents retrieved by the query. The CIV or CILM is based on a continuously updated set of recent (daily or past few hours) user oriented text data. The interest based ranking method is evaluated by using Amazon Turk to against relevance based ranking and search engines\u27 ranking results. Overall, the experiment result shows community interest is an effective indicator for dynamic ranking

    Text Summarization Techniques: A Brief Survey

    Get PDF
    In recent years, there has been a explosion in the amount of text data from a variety of sources. This volume of text is an invaluable source of information and knowledge which needs to be effectively summarized to be useful. In this review, the main approaches to automatic text summarization are described. We review the different processes for summarization and describe the effectiveness and shortcomings of the different methods.Comment: Some of references format have update

    Dirichlet belief networks for topic structure learning

    Full text link
    Recently, considerable research effort has been devoted to developing deep architectures for topic models to learn topic structures. Although several deep models have been proposed to learn better topic proportions of documents, how to leverage the benefits of deep structures for learning word distributions of topics has not yet been rigorously studied. Here we propose a new multi-layer generative process on word distributions of topics, where each layer consists of a set of topics and each topic is drawn from a mixture of the topics of the layer above. As the topics in all layers can be directly interpreted by words, the proposed model is able to discover interpretable topic hierarchies. As a self-contained module, our model can be flexibly adapted to different kinds of topic models to improve their modelling accuracy and interpretability. Extensive experiments on text corpora demonstrate the advantages of the proposed model.Comment: accepted in NIPS 201

    Unsupervised keyword extraction from microblog posts via hashtags

    Full text link
    © River Publishers. Nowadays, huge amounts of texts are being generated for social networking purposes on Web. Keyword extraction from such texts like microblog posts benefits many applications such as advertising, search, and content filtering. Unlike traditional web pages, a microblog post usually has some special social feature like a hashtag that is topical in nature and generated by users. Extracting keywords related to hashtags can reflect the intents of users and thus provides us better understanding on post content. In this paper, we propose a novel unsupervised keyword extraction approach for microblog posts by treating hashtags as topical indicators. Our approach consists of two hashtag enhanced algorithms. One is a topic model algorithm that infers topic distributions biased to hashtags on a collection of microblog posts. The words are ranked by their average topic probabilities. Our topic model algorithm can not only find the topics of a collection, but also extract hashtag-related keywords. The other is a random walk based algorithm. It first builds a word-post weighted graph by taking into account posts themselves. Then, a hashtag biased random walk is applied on this graph, which guides the algorithm to extract keywords according to hashtag topics. Last, the final ranking score of a word is determined by the stationary probability after a number of iterations. We evaluate our proposed approach on a collection of real Chinese microblog posts. Experiments show that our approach is more effective in terms of precision than traditional approaches considering no hashtag. The result achieved by the combination of two algorithms performs even better than each individual algorithm
    • …
    corecore