6,634 research outputs found
Generating Aspect-oriented Multi-document Summarization with Event-Aspect Model
In this paper, we propose a novel approach to automatic generation of aspect-oriented summaries from multiple documents. We first develop an event-aspect LDA model to cluster sentences into aspects. We then use extended LexRank algorithm to rank the sentences in each cluster. We use Integer Linear Programming for sentence selection. Key features of our method include automatic grouping of semantically related sentences and sentence ranking based on extension of random walk model. Also, we implement a new sentence compression algorithm which use dependency tree instead of parser tree. We compare our method with four baseline methods. Quantitative evaluation based on Rouge metric demonstrates the effectiveness and advantages of our method.
Growing Story Forest Online from Massive Breaking News
We describe our experience of implementing a news content organization system
at Tencent that discovers events from vast streams of breaking news and evolves
news story structures in an online fashion. Our real-world system has distinct
requirements in contrast to previous studies on topic detection and tracking
(TDT) and event timeline or graph generation, in that we 1) need to accurately
and quickly extract distinguishable events from massive streams of long text
documents that cover diverse topics and contain highly redundant information,
and 2) must develop the structures of event stories in an online manner,
without repeatedly restructuring previously formed stories, in order to
guarantee a consistent user viewing experience. In solving these challenges, we
propose Story Forest, a set of online schemes that automatically clusters
streaming documents into events, while connecting related events in growing
trees to tell evolving stories. We conducted extensive evaluation based on 60
GB of real-world Chinese news data, although our ideas are not
language-dependent and can easily be extended to other languages, through
detailed pilot user experience studies. The results demonstrate the superior
capability of Story Forest to accurately identify events and organize news text
into a logical structure that is appealing to human readers, compared to
multiple existing algorithm frameworks.Comment: Accepted by CIKM 2017, 9 page
A Survey on Event-based News Narrative Extraction
Narratives are fundamental to our understanding of the world, providing us
with a natural structure for knowledge representation over time. Computational
narrative extraction is a subfield of artificial intelligence that makes heavy
use of information retrieval and natural language processing techniques.
Despite the importance of computational narrative extraction, relatively little
scholarly work exists on synthesizing previous research and strategizing future
research in the area. In particular, this article focuses on extracting news
narratives from an event-centric perspective. Extracting narratives from news
data has multiple applications in understanding the evolving information
landscape. This survey presents an extensive study of research in the area of
event-based news narrative extraction. In particular, we screened over 900
articles that yielded 54 relevant articles. These articles are synthesized and
organized by representation model, extraction criteria, and evaluation
approaches. Based on the reviewed studies, we identify recent trends, open
challenges, and potential research lines.Comment: 37 pages, 3 figures, to be published in the journal ACM CSU
Extracting Causal Relations between News Topics from Distributed Sources
The overwhelming amount of online news presents a challenge called news information overload. To mitigate this challenge we propose a system to generate a causal network of news topics. To extract this information from distributed news sources, a system called Forest was developed. Forest retrieves documents that potentially contain causal information regarding a news topic. The documents are processed at a sentence level to extract causal relations and news topic references, these are the phases used to refer to a news topic. Forest uses a machine learning approach to classify causal sentences, and then renders the potential cause and effect of the sentences. The potential cause and effect are then classified as news topic references, these are the phrases used to refer to a news topics, such as “The World Cup” or “The Financial Meltdown”. Both classifiers use an algorithm developed within our working group, the algorithm performs better than several well known classification algorithms for the aforementioned tasks.
In our evaluations we found that participants consider causal information useful to understand the news, and that while we can not extract causal information for all news topics, it is highly likely that we can extract causal relation for the most popular news topics. To evaluate the accuracy of the extractions made by Forest, we completed a user survey. We found that by providing the top ranked results, we obtained a high accuracy in extracting causal relations between news topics
Programmable Insight: A Computational Methodology to Explore Online News Use of Frames
abstract: The Internet is a major source of online news content. Online news is a form of large-scale narrative text with rich, complex contents that embed deep meanings (facts, strategic communication frames, and biases) for shaping and transitioning standards, values, attitudes, and beliefs of the masses. Currently, this body of narrative text remains untapped due—in large part—to human limitations. The human ability to comprehend rich text and extract hidden meanings is far superior to known computational algorithms but remains unscalable. In this research, computational treatment is given to online news framing for exposing a deeper level of expressivity coined “double subjectivity” as characterized by its cumulative amplification effects. A visual language is offered for extracting spatial and temporal dynamics of double subjectivity that may give insight into social influence about critical issues, such as environmental, economic, or political discourse. This research offers benefits of 1) scalability for processing hidden meanings in big data and 2) visibility of the entire network dynamics over time and space to give users insight into the current status and future trends of mass communication.Dissertation/ThesisDoctoral Dissertation Computer Science 201
- …