Growing Story Forest Online from Massive Breaking News
We describe our experience of implementing a news content organization system
at Tencent that discovers events from vast streams of breaking news and evolves
news story structures in an online fashion. Our real-world system has distinct
requirements in contrast to previous studies on topic detection and tracking
(TDT) and event timeline or graph generation, in that we 1) need to accurately
and quickly extract distinguishable events from massive streams of long text
documents that cover diverse topics and contain highly redundant information,
and 2) must develop the structures of event stories in an online manner,
without repeatedly restructuring previously formed stories, in order to
guarantee a consistent user viewing experience. In solving these challenges, we
propose Story Forest, a set of online schemes that automatically clusters
streaming documents into events, while connecting related events in growing
trees to tell evolving stories. We conducted an extensive evaluation on 60
GB of real-world Chinese news data and through detailed pilot user experience
studies, although our ideas are not language-dependent and can easily be
extended to other languages. The results demonstrate the superior
capability of Story Forest to accurately identify events and organize news text
into a logical structure that is appealing to human readers, compared to
multiple existing algorithm frameworks.
Comment: Accepted by CIKM 2017, 9 pages
A Cross-Attention Augmented Model for Event-Triggered Context-Aware Story Generation
Despite recent advancements, existing story generation systems continue to
encounter difficulties in effectively incorporating contextual and event
features, which greatly influence the quality of generated narratives. To
tackle these challenges, we introduce a novel neural generation model, EtriCA,
that enhances the relevance and coherence of generated stories by employing a
cross-attention mechanism to map context features onto event sequences through
residual mapping. This feature capturing mechanism enables our model to exploit
logical relationships between events more effectively during the story
generation process. To further enhance our proposed model, we employ a
post-training framework for knowledge enhancement (KeEtriCA) on a large-scale
book corpus. This allows EtriCA to adapt to a wider range of data samples and
yields an improvement of approximately 5% in automatic metrics and over 10%
in human evaluation. We conduct extensive experiments, including
comparisons with state-of-the-art (SOTA) baseline models, to evaluate the
performance of our framework on story generation. The experimental results,
encompassing both automated metrics and human assessments, demonstrate the
superiority of our model over existing state-of-the-art baselines. These
results underscore the effectiveness of our model in leveraging context and
event features to improve the quality of generated narratives.
Comment: Submitted to CS
A Survey on Event-based News Narrative Extraction
Narratives are fundamental to our understanding of the world, providing us
with a natural structure for knowledge representation over time. Computational
narrative extraction is a subfield of artificial intelligence that makes heavy
use of information retrieval and natural language processing techniques.
Despite the importance of computational narrative extraction, relatively little
scholarly work exists on synthesizing previous research and strategizing future
research in the area. In particular, this article focuses on extracting news
narratives from an event-centric perspective. Extracting narratives from news
data has multiple applications in understanding the evolving information
landscape. This survey presents an extensive study of research in the area of
event-based news narrative extraction. In particular, we screened over 900
articles that yielded 54 relevant articles. These articles are synthesized and
organized by representation model, extraction criteria, and evaluation
approaches. Based on the reviewed studies, we identify recent trends, open
challenges, and potential research lines.
Comment: 37 pages, 3 figures, to be published in the journal ACM CSU
Proceedings of the First Workshop on Computing News Storylines (CNewsStory 2015)
This volume contains the proceedings of the 1st Workshop on Computing News Storylines (CNewsStory
2015) held in conjunction with the 53rd Annual Meeting of the Association for Computational
Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP
2015) at the China National Convention Center in Beijing, on July 31st 2015.
Narratives are at the heart of information sharing. Ever since people began to share their experiences,
they have connected them to form narratives. The study of storytelling and the field of literary theory
called narratology have developed complex frameworks and models related to various aspects of
narrative such as plot structures, narrative embeddings, characters’ perspectives, reader response, point
of view, narrative voice, narrative goals, and many others. These notions from narratology have been
applied mainly in Artificial Intelligence and to model formal semantic approaches to narratives (e.g.
Plot Units developed by Lehnert (1981)). In recent years, computational narratology has qualified as an
autonomous field of study and research. Narrative has been the focus of a number of workshops and
conferences (AAAI Symposia, Interactive Storytelling Conference (ICIDS), Computational Models of
Narrative). Furthermore, reference annotation schemes for narratives have been proposed (NarrativeML
by Mani (2013)).
The workshop aimed at bringing together researchers from different communities working on
representing and extracting narrative structures in news, a text genre which is highly used in NLP
but which has received little attention with respect to narrative structure, representation and analysis.
Currently, advances in NLP technology have made it feasible to look beyond scenario-driven, atomic
extraction of events from single documents and work towards extracting story structures from multiple
documents, while these documents are published over time as news streams. Policy makers, NGOs,
information specialists (such as journalists and librarians) and others are increasingly in need of tools
that support them in finding salient stories in large amounts of information to more effectively implement
policies, monitor actions of “big players” in society and check facts. Their tasks often revolve around
reconstructing cases either with respect to specific entities (e.g. persons or organizations) or events (e.g.
Hurricane Katrina). Storylines represent explanatory schemas that enable us to make better selections
of relevant information but also projections to the future. They offer valuable potential for exploiting
news data in an innovative way.
JRC.G.2-Global security and crisis management
Understanding the topics and opinions from social media content
Social media has become an indispensable part of people’s daily life, as it records and reflects people’s opinions and events of interest, as well as influences people’s perceptions. As the most commonly employed and easily accessed data format on social media, a great deal of the social media textual content is not only factual and objective, but also rich in opinionated information. Thus, besides the topics Internet users are talking about in social media textual content, it is also of great importance to understand the opinions they are expressing. In this thesis, I present my broadly applicable text mining approaches, in order to understand the topics and opinions of user-generated texts on social media, to provide insights about the thoughts of Internet users on entities, events, etc. Specifically, I develop approaches to understand the semantic differences between language-specific editions of Wikipedia, when discussing certain entities, from the perspective of related topical aspects and the perspective of aggregated sentiment bias. Moreover, I employ effective features to detect the reputation-influential sentences for person and company entities in Wikipedia articles, which lead to the detected sentiment bias. Furthermore, I propose neural network models with different levels of attention mechanism, to detect the stances of tweets towards any given target. I also introduce an online timeline generation approach, to detect and summarise the relevant sub-topics in the tweet stream, in order to provide Internet users with some insights about the evolution of major events they are interested in.