6 research outputs found

    Timeline Generation: Tracking individuals on Twitter

    Full text link
    In this paper, we propose a unsupervised framework to reconstruct a person's life history by creating a chronological list for {\it personal important events} (PIE) of individuals based on the tweets they published. By analyzing individual tweet collections, we find that what are suitable for inclusion in the personal timeline should be tweets talking about personal (as opposed to public) and time-specific (as opposed to time-general) topics. To further extract these types of topics, we introduce a non-parametric multi-level Dirichlet Process model to recognize four types of tweets: personal time-specific (PersonTS), personal time-general (PersonTG), public time-specific (PublicTS) and public time-general (PublicTG) topics, which, in turn, are used for further personal event extraction and timeline generation. To the best of our knowledge, this is the first work focused on the generation of timeline for individuals from twitter data. For evaluation, we have built a new golden standard Timelines based on Twitter and Wikipedia that contain PIE related events from 20 {\it ordinary twitter users} and 20 {\it celebrities}. Experiments on real Twitter data quantitatively demonstrate the effectiveness of our approach

    A topic modeling based approach to novel document automatic summarization

    Full text link
    © 2017 Elsevier Ltd Most of existing text automatic summarization algorithms are targeted for multi-documents of relatively short length, thus difficult to be applied immediately to novel documents of structure freedom and long length. In this paper, aiming at novel documents, we propose a topic modeling based approach to extractive automatic summarization, so as to achieve a good balance among compression ratio, summarization quality and machine readability. First, based on topic modeling, we extract the candidate sentences associated with topic words from a preprocessed novel document. Second, with the goals of compression ratio and topic diversity, we design an importance evaluation function to select the most important sentences from the candidate sentences and thus generate an initial novel summary. Finally, we smooth the initial summary to overcome the semantic confusion caused by ambiguous or synonymous words, so as to improve the summary readability. We evaluate experimentally our proposed approach on a real novel dataset. The experiment results show that compared to those from other candidate algorithms, each automatic summary generated by our approach has not only a higher compression ratio, but also better summarization quality

    A Survey on Event-based News Narrative Extraction

    Full text link
    Narratives are fundamental to our understanding of the world, providing us with a natural structure for knowledge representation over time. Computational narrative extraction is a subfield of artificial intelligence that makes heavy use of information retrieval and natural language processing techniques. Despite the importance of computational narrative extraction, relatively little scholarly work exists on synthesizing previous research and strategizing future research in the area. In particular, this article focuses on extracting news narratives from an event-centric perspective. Extracting narratives from news data has multiple applications in understanding the evolving information landscape. This survey presents an extensive study of research in the area of event-based news narrative extraction. In particular, we screened over 900 articles that yielded 54 relevant articles. These articles are synthesized and organized by representation model, extraction criteria, and evaluation approaches. Based on the reviewed studies, we identify recent trends, open challenges, and potential research lines.Comment: 37 pages, 3 figures, to be published in the journal ACM CSU

    Evolutionary Hierarchical Dirichlet Process for Timeline Summarization

    No full text
    Timeline summarization aims at generating concise summaries and giving readers a faster and better access to understand the evolution of news. It is a new challenge which combines salience ranking problem with novelty detection. Previous researches in this field seldom explore the evolutionary pattern of topics such as birth, splitting, merging, developing and death. In this paper, we develop a novel model called Evolutionary Hierarchical Dirichlet Process(EHDP) to capture the topic evolution pattern in timeline summarization. In EHDP, time varying information is formulated as a series of HDPs by considering time-dependent information. Experiments on 6 different datasets which contain 3156 documents demonstrates the good performance of our system with regard to ROUGE scores.
    corecore