6 research outputs found
Timeline Generation: Tracking individuals on Twitter
In this paper, we propose a unsupervised framework to reconstruct a person's
life history by creating a chronological list for {\it personal important
events} (PIE) of individuals based on the tweets they published. By analyzing
individual tweet collections, we find that what are suitable for inclusion in
the personal timeline should be tweets talking about personal (as opposed to
public) and time-specific (as opposed to time-general) topics. To further
extract these types of topics, we introduce a non-parametric multi-level
Dirichlet Process model to recognize four types of tweets: personal
time-specific (PersonTS), personal time-general (PersonTG), public
time-specific (PublicTS) and public time-general (PublicTG) topics, which, in
turn, are used for further personal event extraction and timeline generation.
To the best of our knowledge, this is the first work focused on the generation
of timeline for individuals from twitter data. For evaluation, we have built a
new golden standard Timelines based on Twitter and Wikipedia that contain PIE
related events from 20 {\it ordinary twitter users} and 20 {\it celebrities}.
Experiments on real Twitter data quantitatively demonstrate the effectiveness
of our approach
A topic modeling based approach to novel document automatic summarization
© 2017 Elsevier Ltd Most of existing text automatic summarization algorithms are targeted for multi-documents of relatively short length, thus difficult to be applied immediately to novel documents of structure freedom and long length. In this paper, aiming at novel documents, we propose a topic modeling based approach to extractive automatic summarization, so as to achieve a good balance among compression ratio, summarization quality and machine readability. First, based on topic modeling, we extract the candidate sentences associated with topic words from a preprocessed novel document. Second, with the goals of compression ratio and topic diversity, we design an importance evaluation function to select the most important sentences from the candidate sentences and thus generate an initial novel summary. Finally, we smooth the initial summary to overcome the semantic confusion caused by ambiguous or synonymous words, so as to improve the summary readability. We evaluate experimentally our proposed approach on a real novel dataset. The experiment results show that compared to those from other candidate algorithms, each automatic summary generated by our approach has not only a higher compression ratio, but also better summarization quality
A Survey on Event-based News Narrative Extraction
Narratives are fundamental to our understanding of the world, providing us
with a natural structure for knowledge representation over time. Computational
narrative extraction is a subfield of artificial intelligence that makes heavy
use of information retrieval and natural language processing techniques.
Despite the importance of computational narrative extraction, relatively little
scholarly work exists on synthesizing previous research and strategizing future
research in the area. In particular, this article focuses on extracting news
narratives from an event-centric perspective. Extracting narratives from news
data has multiple applications in understanding the evolving information
landscape. This survey presents an extensive study of research in the area of
event-based news narrative extraction. In particular, we screened over 900
articles that yielded 54 relevant articles. These articles are synthesized and
organized by representation model, extraction criteria, and evaluation
approaches. Based on the reviewed studies, we identify recent trends, open
challenges, and potential research lines.Comment: 37 pages, 3 figures, to be published in the journal ACM CSU
Evolutionary Hierarchical Dirichlet Process for Timeline Summarization
Timeline summarization aims at generating concise summaries and giving readers a faster and better access to understand the evolution of news. It is a new challenge which combines salience ranking problem with novelty detection. Previous researches in this field seldom explore the evolutionary pattern of topics such as birth, splitting, merging, developing and death. In this paper, we develop a novel model called Evolutionary Hierarchical Dirichlet Process(EHDP) to capture the topic evolution pattern in timeline summarization. In EHDP, time varying information is formulated as a series of HDPs by considering time-dependent information. Experiments on 6 different datasets which contain 3156 documents demonstrates the good performance of our system with regard to ROUGE scores.