Video Timeline Modeling For News Story Understanding
In this paper, we present a novel problem, namely video timeline modeling.
Our objective is to create a video-associated timeline from a set of videos
related to a specific topic, thereby facilitating the content and structure
understanding of the story being told. This problem has significant potential
in various real-world applications, such as news story summarization. To
bootstrap research in this area, we curate a realistic benchmark dataset,
YouTube-News-Timeline, consisting of over 12k timelines and 300k YouTube
news videos. Additionally, we propose a set of quantitative metrics as the
protocol to comprehensively evaluate and compare methodologies. With such a
testbed, we further develop and benchmark exploratory deep learning approaches
to tackle this problem. We anticipate that this exploratory work will pave the
way for further research in video timeline modeling. The assets are available
via
https://github.com/google-research/google-research/tree/master/video_timeline_modeling
Comment: Accepted as a spotlight by NeurIPS 2023, Track on Datasets and Benchmarks
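The abstract leaves the evaluation protocol unspecified, so the following is only a hypothetical sketch of how a predicted timeline might be scored against a gold one: each timeline is treated as an ordered list of video-ID sets and scored by average per-node F1. The position-wise node alignment and the F1 choice are assumptions for illustration, not the paper's actual metrics.

```python
# Hypothetical scoring sketch: the paper's actual metrics are not spelled out
# in this abstract, so this only illustrates one plausible protocol --
# comparing a predicted timeline (an ordered list of video-ID sets) against a
# gold timeline, position by position.

def node_f1(pred: set, gold: set) -> float:
    """F1 overlap between one predicted and one gold timeline node."""
    if not pred or not gold:
        return 0.0
    tp = len(pred & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)

def timeline_score(pred_timeline: list[set], gold_timeline: list[set]) -> float:
    """Average per-node F1 over aligned positions; extra or missing nodes
    count as zero."""
    n = max(len(pred_timeline), len(gold_timeline))
    if n == 0:
        return 0.0
    return sum(node_f1(p, g) for p, g in zip(pred_timeline, gold_timeline)) / n

if __name__ == "__main__":
    gold = [{"v1", "v2"}, {"v3"}, {"v4", "v5"}]
    pred = [{"v1"}, {"v3", "v4"}, {"v5"}]
    print(f"timeline score: {timeline_score(pred, gold):.3f}")
```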
A Survey on Event-based News Narrative Extraction
Narratives are fundamental to our understanding of the world, providing us
with a natural structure for knowledge representation over time. Computational
narrative extraction is a subfield of artificial intelligence that makes heavy
use of information retrieval and natural language processing techniques.
Despite the importance of computational narrative extraction, relatively little
scholarly work exists on synthesizing previous research and strategizing future
research in the area. In particular, this article focuses on extracting news
narratives from an event-centric perspective. Extracting narratives from news
data has multiple applications in understanding the evolving information
landscape. This survey presents an extensive study of research in the area of
event-based news narrative extraction. In particular, we screened over 900
articles that yielded 54 relevant articles. These articles are synthesized and
organized by representation model, extraction criteria, and evaluation
approaches. Based on the reviewed studies, we identify recent trends, open
challenges, and potential research lines.
Comment: 37 pages, 3 figures, to be published in the journal ACM CSUR
Proceedings of the First Workshop on Computing News Storylines (CNewsStory 2015)
This volume contains the proceedings of the 1st Workshop on Computing News Storylines (CNewsStory
2015) held in conjunction with the 53rd Annual Meeting of the Association for Computational
Linguistics and the 7th International Joint Conference on Natural Language Processing (ACL-IJCNLP
2015) at the China National Convention Center in Beijing, on July 31st 2015.
Narratives are at the heart of information sharing. Ever since people began to share their experiences,
they have connected them to form narratives. The study of storytelling and the field of literary theory
called narratology have developed complex frameworks and models related to various aspects of
narrative such as plot structures, narrative embeddings, characters’ perspectives, reader response, point
of view, narrative voice, narrative goals, and many others. These notions from narratology have been
applied mainly in Artificial Intelligence to model formal semantic approaches to narratives (e.g.
Plot Units developed by Lehnert (1981)). In recent years, computational narratology has emerged as an
autonomous field of study and research. Narrative has been the focus of a number of workshops and
conferences (AAAI Symposia, Interactive Storytelling Conference (ICIDS), Computational Models of
Narrative). Furthermore, reference annotation schemes for narratives have been proposed (NarrativeML
by Mani (2013)).
The workshop aimed at bringing together researchers from different communities working on
representing and extracting narrative structures in news, a text genre which is widely used in NLP
but which has received little attention with respect to narrative structure, representation and analysis.
Currently, advances in NLP technology have made it feasible to look beyond scenario-driven, atomic
extraction of events from single documents and work towards extracting story structures from multiple
documents, while these documents are published over time as news streams. Policy makers, NGOs,
information specialists (such as journalists and librarians) and others are increasingly in need of tools
that support them in finding salient stories in large amounts of information to more effectively implement
policies, monitor actions of “big players” in society and check facts. Their tasks often revolve around
reconstructing cases either with respect to specific entities (e.g. persons or organizations) or events (e.g.
Hurricane Katrina). Storylines represent explanatory schemas that enable us not only to make better selections
of relevant information but also projections into the future. They hold valuable potential for exploiting
news data in an innovative way.
Content Selection for Timeline Generation from Single History Articles
This thesis investigates the problem of content selection for timeline generation from single history articles. While the task of timeline generation has been addressed before, most previous approaches assume the existence of a large corpus of history articles from the same era. They exploit the fact that salient information is likely to be mentioned multiple times in such corpora. However, large resources of this kind are only available for historical events that happened in the most recent decades. In this thesis, I present approaches which can be used to create history timelines for any historical period, even for eras such as the Middle Ages, for which no large corpora of supplementary text exist.
The thesis first presents a system that selects relevant historical figures in a given article, a task which is substantially easier than full timeline generation.
I show that a supervised approach which uses linguistic, structural and semantic features outperforms a competitive baseline on this task.
Based on the observations made in this initial study, I then develop approaches for timeline generation. I find that an unsupervised approach that takes into account the article's subject area outperforms several supervised and unsupervised baselines.
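The summary names linguistic, structural and semantic features without detailing them, so the sketch below only shows the general shape of such a supervised selector for salient historical figures. The three features (normalised mention frequency, relative position of first mention, presence in the article lead) and the logistic-regression choice are invented stand-ins, not the thesis's actual design.

```python
# Hypothetical sketch of a supervised "is this figure timeline-worthy?" selector.
# The actual feature set is not given in this summary; the features below are
# invented stand-ins for its linguistic/structural/semantic features.
from sklearn.linear_model import LogisticRegression

def figure_features(tokens: list[str], figure: str) -> list[float]:
    positions = [i for i, tok in enumerate(tokens) if tok == figure]
    n = max(len(tokens), 1)
    if not positions:
        return [0.0, 1.0, 0.0]
    return [
        len(positions) / n,                  # normalised mention frequency
        positions[0] / n,                    # relative position of first mention
        1.0 if positions[0] < 50 else 0.0,   # mentioned in the article lead?
    ]

# Toy training data: feature vectors for figures known to be on gold timelines
# (label 1) or absent from them (label 0).
X = [[0.020, 0.01, 1.0], [0.001, 0.80, 0.0], [0.015, 0.05, 1.0], [0.002, 0.60, 0.0]]
y = [1, 0, 1, 0]
clf = LogisticRegression().fit(X, y)

tokens = "Charlemagne was crowned emperor in Rome . Charlemagne later reformed the currency".split()
print(clf.predict([figure_features(tokens, "Charlemagne")]))  # e.g. array([1])
```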
A main focus of this thesis is the development of evaluation methodologies and resources, as no suitable corpora existed when work began.
For the initial experiment on important historical figures, I construct a corpus of existing timelines and textual articles, and devise a method for evaluating algorithms based on this resource.
For timeline generation, I present a comprehensive evaluation methodology which is based on the interpretation of the task as a special form of single-document summarisation. This methodology scores algorithms based on meaning units rather than surface similarity. Unlike previous semantic-units-based evaluation methods for summarisation, my evaluation method does not require any manual annotation of system timelines. Once an evaluation resource has been created, which involves only annotation of the input texts, new timeline generation algorithms can be tested at no cost. This crucial advantage should make my new evaluation methodology attractive for the evaluation of general single-document summaries beyond timelines.
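As a minimal sketch of that idea, assuming (as described) that only the input text carries annotation: each input sentence is mapped to the weighted meaning units it expresses, and a system timeline is scored by the total weight of the distinct units its selected sentences cover, with no annotation of the system output itself. The IDs and weights below are purely illustrative.

```python
# Minimal sketch of meaning-unit-based scoring. Only the *input* is annotated:
# each candidate sentence carries a set of meaning-unit IDs, and each meaning
# unit a weight; system timelines are then scored automatically.

def score_timeline(selected_ids, unit_sets, unit_weights):
    """Total weight of distinct meaning units covered by the selected content.

    selected_ids : IDs of the input sentences the system put on its timeline
    unit_sets    : sentence ID -> set of meaning-unit IDs it expresses
    unit_weights : meaning-unit ID -> importance weight (e.g. annotator votes)
    """
    covered = set()
    for sid in selected_ids:
        covered |= unit_sets.get(sid, set())
    return sum(unit_weights[u] for u in covered)

unit_sets = {"s1": {"u1", "u2"}, "s2": {"u2"}, "s3": {"u3"}}
unit_weights = {"u1": 3, "u2": 1, "u3": 2}  # e.g. how many gold writers used it
print(score_timeline({"s1", "s3"}, unit_sets, unit_weights))  # -> 6
```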
I also present an evaluation resource which is based on this methodology. It was constructed using gold-standard timelines elicited from 30 human timeline writers, and has been made publicly available.
This thesis concentrates on the content selection stage of timeline generation, and leaves the surface realisation step for future work. However, my evaluation methodology is designed in such a way that it can in principle also quantify the degree to which surface realisation is successful.
Leveraging Semantic Annotations for Event-focused Search & Summarization
Today, in this Big Data era, overwhelming amounts of textual information across different sources with a high degree of redundancy have made it hard for a consumer to retrospect on past events. A plausible solution is to link semantically similar information contained across the different sources to enforce a structure, thereby providing multiple access paths to relevant information. Keeping this larger goal in view, this work uses Wikipedia and online news articles as two prominent yet disparate information sources to address the following three problems:
• We address a linking problem to connect Wikipedia excerpts to news articles by casting it into an IR task. Our novel approach integrates time, geolocations, and entities with text to identify relevant documents that can be linked to a given excerpt.
• We address an unsupervised extractive multi-document summarization task to generate a fixed-length event digest that facilitates efficient consumption of information contained within a large set of documents. Our novel approach proposes an ILP for global inference across text, time, geolocations, and entities associated with the event (a generic sketch of such a formulation follows this abstract).
• To estimate the temporal focus of short event descriptions, we present a semi-supervised approach that leverages redundancy within a longitudinal news collection to estimate accurate probabilistic time models.
Extensive experimental evaluations demonstrate the effectiveness and viability of our proposed approaches towards achieving the larger goal.
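The abstract gives neither the ILP's objective nor its constraints, so the sketch below (using the PuLP modelling library) only shows the generic shape of a fixed-length digest formulation: maximise the total relevance of the selected sentences subject to a word budget. The relevance scores and the budget are placeholders; the actual program additionally performs global inference over time, geolocations, and entities.

```python
# Generic ILP sketch for a fixed-length extractive event digest (PuLP + CBC).
# A single per-sentence relevance score stands in for the thesis's joint
# treatment of text, time, geolocations, and entities.
from pulp import LpProblem, LpMaximize, LpVariable, LpBinary, lpSum

sentences = ["quake hits coast", "rescue teams arrive", "markets unaffected"]
relevance = [0.9, 0.7, 0.2]                 # placeholder scores
lengths = [len(s.split()) for s in sentences]
BUDGET = 6                                  # digest length in words (placeholder)

prob = LpProblem("event_digest", LpMaximize)
x = [LpVariable(f"x{i}", cat=LpBinary) for i in range(len(sentences))]

prob += lpSum(relevance[i] * x[i] for i in range(len(sentences)))          # objective
prob += lpSum(lengths[i] * x[i] for i in range(len(sentences))) <= BUDGET  # length budget

prob.solve()
digest = [s for s, xi in zip(sentences, x) if xi.value() == 1]
print(digest)  # -> ['quake hits coast', 'rescue teams arrive']
```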
Temporal Information Extraction and Knowledge Base Population
Temporal Information Extraction (TIE) from text plays an important role in many Natural Language Processing and Database applications. Many features of the world are time-dependent, and rich temporal knowledge is required for a more complete and precise understanding of the world. In this thesis we address aspects of two core tasks in TIE. First, we provide a new corpus of labeled temporal relations between events and temporal expressions, dense enough to facilitate a change in research directions from relation classification to identification, and present a system designed to address the corresponding new challenges. Second, we implement a novel approach for the discovery and aggregation of temporal information about entity-centric fluent relations.
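To make the classification-versus-identification distinction concrete: classification presupposes gold pairs and predicts only a label, whereas identification must label every candidate event/timex pair, including a no-relation or VAGUE class. The sketch below illustrates only that framing; the labeler is a trivial placeholder, not the system from the thesis.

```python
# Sketch of the classification -> identification shift. In "identification",
# *every* event/timex pair within a sentence window must be labeled, with
# VAGUE available as a catch-all class; the pair enumeration is the point.
from itertools import product

def identify_relations(events, timexes, labeler, window=2):
    """Label all event/timex pairs within `window` sentences of each other."""
    labeled = []
    for e, t in product(events, timexes):
        if abs(e["sent"] - t["sent"]) <= window:
            labeled.append((e["id"], t["id"], labeler(e, t)))
    return labeled

def toy_labeler(event, timex):
    # Placeholder heuristic (real systems learn this): events in the same
    # sentence as the timex are IS_INCLUDED; everything else is left VAGUE.
    return "IS_INCLUDED" if event["sent"] == timex["sent"] else "VAGUE"

events = [{"id": "e1", "sent": 0}, {"id": "e2", "sent": 1}]
timexes = [{"id": "t1", "sent": 0}]
print(identify_relations(events, timexes, toy_labeler))
# -> [('e1', 't1', 'IS_INCLUDED'), ('e2', 't1', 'VAGUE')]
```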
Event identification in social media using classification-clustering framework
In recent years, there has been increased interest in real-world event detection using publicly accessible data made available through Internet technology such as Twitter, Facebook and YouTube. In these highly interactive systems the general public are able to post real-time reactions to “real world" events, thereby acting as social sensors of terrestrial activity. Automatically detecting and categorizing events, particularly small-scale incidents, using streamed data is a non-trivial task, due to the heterogeneity, the scalability and the varied quality of the data, as well as the presence of noise and irrelevant information. However, it would be of high value to public safety organisations such as local police, who need to respond accordingly. To address these challenges we present an end-to-end integrated event detection framework which comprises five main components: data collection, pre-processing, classification, online clustering and summarization. The integration between classification and clustering (sketched below) enables events to be detected, especially “disruptive events": incidents that threaten social safety and security, or that could disrupt social order. We present an evaluation of the effectiveness of detecting events using a variety of features derived from Twitter posts, namely: temporal, spatial and textual content. We evaluate our framework on large-scale, real-world datasets from Twitter and Flickr. Furthermore, we apply our event detection system to a large corpus of tweets posted during the August 2011 riots in England. We show that our system can perform as well as terrestrial sources, such as police reports, traditional surveillance, and emergency calls, and in most cases even better than local police intelligence. The framework developed in this thesis provides a scalable, online solution to handle the high volume of social media documents in different languages including English, Arabic, Eastern languages such as Chinese, and many Latin languages.
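A minimal sketch of that classification-to-online-clustering handover, assuming a placeholder relevance classifier and simple bag-of-words centroids; the framework's actual models and features are not specified in this summary.

```python
# Hedged sketch: a classifier gates noisy posts, and surviving posts are
# assigned incrementally to the nearest centroid (or start a new cluster).
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def is_event_related(text: str) -> bool:
    # Placeholder for the framework's learned classification stage.
    return "riot" in text or "fire" in text

def online_cluster(stream, threshold=0.5):
    clusters = []  # each cluster: {"centroid": Counter, "posts": [...]}
    for post in stream:
        if not is_event_related(post):
            continue  # classification filters noise before clustering
        vec = Counter(post.lower().split())
        best = max(clusters, key=lambda c: cosine(vec, c["centroid"]), default=None)
        if best and cosine(vec, best["centroid"]) >= threshold:
            best["posts"].append(post)
            best["centroid"].update(vec)  # drift the centroid toward new posts
        else:
            clusters.append({"centroid": vec, "posts": [post]})
    return clusters

stream = ["riot on main street", "riot spreading on main street",
          "cat photo", "warehouse fire downtown"]
for c in online_cluster(stream):
    print(c["posts"])
```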
Moreover, event detection is a concept that is crucial to the assurance of public safety surrounding real-world events. Decision makers use information from a range of terrestrial and online sources to help inform decisions that enable them to develop policies and react appropriately to events as they unfold. Due to the heterogeneity and scale of the data, and the fact that some messages are more salient than others for the purposes of understanding any risk to human safety and managing any disruption caused by events, automatic summarization of event-related microblogs is a non-trivial and important problem. In this thesis we tackle the task of automatic summarization of Twitter posts, and present three methods that produce summaries by selecting the most representative posts from real-world tweet-event clusters (one plausible selection scheme is sketched below). To evaluate our approaches, we compare them to state-of-the-art summarization systems and human-generated summaries. Our results show that our proposed methods outperform all the other summarization systems for English and non-English corpora.
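The three selection methods are not detailed in this summary; the sketch below shows one plausible baseline reading of "most representative post": the cluster medoid under a simple word-overlap similarity.

```python
# One plausible "most representative post" selector (not the thesis's actual
# methods): pick the cluster medoid, i.e. the post with the highest total
# Jaccard word-overlap with the rest of its cluster.

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def representative(cluster: list[str]) -> str:
    """Return the medoid post of a tweet-event cluster."""
    sets = [set(p.lower().split()) for p in cluster]

    def centrality(i: int) -> float:
        return sum(jaccard(sets[i], sets[j]) for j in range(len(sets)) if j != i)

    return cluster[max(range(len(cluster)), key=centrality)]

cluster = [
    "fire at the warehouse downtown",
    "huge fire downtown warehouse ablaze",
    "smoke everywhere downtown",
]
print(representative(cluster))  # -> "fire at the warehouse downtown"
```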
The role of volunteered geographic information in land administration systems in developing countries
Developing countries, especially in Africa, are faced with a lack of formally registered land.
Available limited records are outdated, inaccurate and unreliable, which makes it a challenge
to properly administer and manage land and its resources. Moreover, limited maintenance
budgets prevalent in these countries make it difficult for organizations to conduct regular
systematic updates of geographic information. Despite these challenges, geographic
information still forms a major component for effective land administration. For a land
administration system (LAS) to remain useful, it must reflect realities on the ground, and this
can only be achieved if land information is reported regularly. However, if changes in land are
not captured in properly administered land registers, LAS lose societal relevance and are
eventually replaced by informal systems. Volunteered Geographic Information (VGI) can
address these LAS challenges by providing timely, affordable, up-to-date, flexible, and fit for
purpose (FFP) land information to support the limited current systems. Nonetheless, the
involvement of volunteers, who in most cases are untrained or non-experts in handling
geographic information, implies that VGI can be of varying quality. Thus, VGI is characterised
by unstructured, heterogeneous, unreliable data which makes data integration for value-added
purposes difficult to effect. These quality challenges can make land authorities reluctant to
incorporate the contributed datasets into their official databases. This research has developed
an innovative approach for establishing the quality and credibility of VGI such that it can be
considered in LAS on an FFP basis. However, verifying volunteer efforts can be difficult
without reference to ground truth, the absence of which is prevalent in many developing countries. Therefore,
a novel Trust and Reputation Modelling (TRM) methodology is proposed as a suitable
technique to effect such VGI validation. TRM relies on the view that the public can police
themselves in establishing ‘proxy’ measures of VGI quality and volunteer credibility (a toy
illustration follows this abstract), thus
facilitating VGI to be used on an FFP basis in LAS. The output of this research is a conceptual
participatory framework for an FFP land administration based on VGI. The framework outlines
major aspects (social, legal, technical, and institutional) necessary for establishing a
participatory FFP LAS in developing countries.
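The TRM model itself is not specified in this summary; the toy illustration below captures only the "public policing themselves" intuition: a contribution's trust is the reputation-weighted share of overlapping contributions that agree with it, and a volunteer's reputation is the mean trust of their contributions, iterated toward a fixed point. All names and numbers are invented.

```python
# Toy illustration (not the thesis's actual TRM model): trust and reputation
# reinforce each other over contributions to the same land parcel.

def update(contribs, reputation, rounds=5):
    """contribs: list of dicts {"vol": volunteer, "parcel": id, "val": value}."""
    trust = {}
    for _ in range(rounds):
        for i, c in enumerate(contribs):
            peers = [p for p in contribs if p["parcel"] == c["parcel"] and p is not c]
            if not peers:
                trust[i] = reputation[c["vol"]]  # nothing to corroborate against
                continue
            agree = sum(reputation[p["vol"]] for p in peers if p["val"] == c["val"])
            total = sum(reputation[p["vol"]] for p in peers)
            trust[i] = agree / total if total else 0.0
        for vol in reputation:  # reputation = mean trust of own contributions
            own = [trust[i] for i, c in enumerate(contribs) if c["vol"] == vol]
            if own:
                reputation[vol] = sum(own) / len(own)
    return trust, reputation

contribs = [
    {"vol": "a", "parcel": 1, "val": "boundary_X"},
    {"vol": "b", "parcel": 1, "val": "boundary_X"},
    {"vol": "c", "parcel": 1, "val": "boundary_Y"},  # disagrees with the others
]
reputation = {"a": 0.5, "b": 0.5, "c": 0.5}
trust, reputation = update(contribs, reputation)
print(trust)       # the two agreeing measurements end up fully trusted
print(reputation)  # volunteer "c" is gradually discounted
```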