404 research outputs found
The Development of a Temporal Information Dictionary for Social Media Analytics
Dictionaries were used to analyze text long before the emergence of social media and their use for sentiment analysis there. While dictionaries have been used to understand the tonality of text, it has so far not been possible to automatically detect whether the tonality refers to the present, past, or future. In this research, we develop a dictionary of time-indicating words in a wordlist (T-wordlist). To test how the dictionary performs, we apply our T-wordlist to different disaster-related social media datasets. Subsequently, we will validate the wordlist and results through a manual content analysis. So far, in this research-in-progress, we have developed a first dictionary and will also provide some initial insight into the performance of our wordlist.
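As a rough illustration of the dictionary-based idea described in this abstract, the following minimal sketch labels a post as past, present, or future by counting matches against a small wordlist. The entries in `T_WORDLIST` and the function name `temporal_label` are hypothetical; the actual T-wordlist and the authors' scoring procedure are not given in the abstract.

```python
import re

# Hypothetical mini wordlist for illustration only; the paper's actual
# T-wordlist is not reproduced in the abstract.
T_WORDLIST = {
    "past": {"yesterday", "was", "were", "had", "ago", "earlier"},
    "present": {"now", "today", "is", "are", "currently"},
    "future": {"tomorrow", "will", "soon", "upcoming", "later"},
}


def temporal_label(text):
    """Return the time category with the most wordlist hits, or 'unknown'."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = {
        label: sum(1 for token in tokens if token in words)
        for label, words in T_WORDLIST.items()
    }
    # Ties go to the category listed first in T_WORDLIST (an arbitrary choice).
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "unknown"
```

For example, `temporal_label("Rescue teams will arrive tomorrow")` matches two future-indicating words and no others. A real evaluation would compare such labels against manual content analysis, as the abstract proposes.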
Explicit diversification of event aspects for temporal summarization
During major events, such as emergencies and disasters, a large volume of information is reported on newswire and social media platforms. Temporal summarization (TS) approaches are used to automatically produce concise overviews of such events by extracting text snippets from related articles over time. Current TS approaches rely on a combination of event relevance and textual novelty for snippet selection. However, for events that span multiple days, textual novelty is often a poor criterion for selecting snippets, since many snippets are textually unique but semantically redundant or non-informative. In this article, we propose a framework for the diversification of snippets using explicit event aspects, building on recent work in search result diversification. In particular, we first propose two techniques to identify explicit aspects that a user might want to see covered in a summary for different types of events. We then extend a state-of-the-art explicit diversification framework to maximize the coverage of these aspects when selecting summary snippets for unseen events. Through experimentation over the TREC TS 2013, 2014, and 2015 datasets, we show that explicit diversification for temporal summarization significantly outperforms classical novelty-based diversification, as the use of explicit event aspects reduces the number of redundant and off-topic snippets returned, while also increasing summary timeliness.
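The aspect-coverage idea in this abstract can be sketched as a greedy selection that trades off relevance against coverage of aspects not yet represented in the summary. This is a generic sketch in the style of explicit search result diversification, not the authors' framework: the function `select_snippets`, the uniform aspect importance, the mixing weight `lam`, and the input format are all assumptions made for illustration.

```python
def select_snippets(candidates, k, lam=0.5):
    """Greedily pick up to k snippet ids.

    candidates: list of (snippet_id, relevance, {aspect: coverage_probability}).
    lam: assumed weight on aspect coverage (0 = relevance only).
    """
    aspects = set()
    for _, _, coverage in candidates:
        aspects.update(coverage)
    # Assumption: every aspect is equally important.
    importance = {a: 1.0 / len(aspects) for a in aspects}
    # Probability that each aspect is still uncovered by the selected snippets.
    uncovered = {a: 1.0 for a in aspects}

    def score(candidate):
        _, relevance, coverage = candidate
        diversity = sum(
            importance[a] * coverage.get(a, 0.0) * uncovered[a] for a in aspects
        )
        return (1.0 - lam) * relevance + lam * diversity

    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        remaining.remove(best)
        selected.append(best[0])
        # Discount aspects that the chosen snippet already covers.
        for aspect, prob in best[2].items():
            uncovered[aspect] *= 1.0 - prob
    return selected
```

With `lam=0` the selection reduces to ranking by relevance alone, so a second snippet about an already covered aspect can be chosen. With a positive `lam`, a less relevant snippet covering a new aspect can be preferred instead.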
A Survey on Event-based News Narrative Extraction
Narratives are fundamental to our understanding of the world, providing us
with a natural structure for knowledge representation over time. Computational
narrative extraction is a subfield of artificial intelligence that makes heavy
use of information retrieval and natural language processing techniques.
Despite the importance of computational narrative extraction, relatively little
scholarly work exists on synthesizing previous research and strategizing future
research in the area. In particular, this article focuses on extracting news
narratives from an event-centric perspective. Extracting narratives from news
data has multiple applications in understanding the evolving information
landscape. This survey presents an extensive study of research in the area of
event-based news narrative extraction. In particular, we screened over 900
articles that yielded 54 relevant articles. These articles are synthesized and
organized by representation model, extraction criteria, and evaluation
approaches. Based on the reviewed studies, we identify recent trends, open
challenges, and potential research lines. Comment: 37 pages, 3 figures, to be published in the journal ACM CSU
Resumen multidocumento utilizando teorías semántico-discursivas
Automatic multi-document summarization aims at reducing the size of texts while preserving the important content. In this paper, we propose some methods for automatic summarization based on two semantic discourse models: Rhetorical Structure Theory (RST) and Cross-document Structure Theory (CST). These models were chosen in order to properly address the relevance of information, multi-document phenomena, and subtopical distribution in the source texts. The results show that using semantic discourse knowledge for content selection improves the informativeness of automatic summaries.
Drafting Event Schemas using Language Models
Past work has studied event prediction and event language modeling, sometimes
mediated through structured representations of knowledge in the form of event
schemas. Such schemas can lead to explainable predictions and forecasting of
unseen events given incomplete information. In this work, we look at the
process of creating such schemas to describe complex events. We use large
language models (LLMs) to draft schemas directly in natural language, which can
be further refined by human curators as necessary. Our focus is on whether we
can achieve sufficient diversity and recall of key events and whether we can
produce the schemas in a sufficiently descriptive style. We show that large
language models are able to achieve moderate recall against schemas taken from
two different datasets, with even better results when multiple prompts and
multiple samples are combined. Moreover, we show that textual entailment
methods can be used for both matching schemas to instances of events as well as
evaluating overlap between gold and predicted schemas. Our method paves the way
for easier distillation of event knowledge from large language models into
schemas.