Identity and Granularity of Events in Text
In this paper, we describe a method to detect event descriptions in different
news articles and to model the semantics of events and their components using
RDF representations. We compare these descriptions to solve a cross-document
event coreference task. Our component approach to event semantics defines the
identity and granularity of events at different levels. It performs close to
state-of-the-art approaches on the cross-document event coreference task, and
it outperforms other work when event detection is assumed to be of similar
quality. We demonstrate how granularity and identity are interconnected, and
we discuss how semantic anomaly could be used to define the differences
between coreference, subevent and topical relations.
Comment: Invited keynote speech by Piek Vossen at CICLing 201
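The component approach above can be illustrated with a minimal sketch. It assumes, hypothetically, that each event description is reduced to four components (action, participants, time, place) and that identity is decided by requiring every component to be compatible at a chosen time granularity. The `Event` class, `same_event` and `at_granularity` are illustrative names, not the paper's RDF model, and the data is made up.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Event:
    # Hypothetical component view of an event; not the paper's RDF schema.
    action: str
    participants: frozenset
    time: date
    place: str


def at_granularity(d: date, granularity: str) -> tuple:
    """Coarsen a date to 'day', 'month' or 'year' resolution."""
    if granularity == "day":
        return (d.year, d.month, d.day)
    if granularity == "month":
        return (d.year, d.month)
    if granularity == "year":
        return (d.year,)
    raise ValueError(f"unknown granularity: {granularity}")


def same_event(a: Event, b: Event, time_granularity: str = "day") -> bool:
    """Treat two descriptions as one event if all components are compatible:
    same action, at least one shared participant, same place, and the same
    time once both dates are coarsened to the requested granularity."""
    return (
        a.action == b.action
        and bool(a.participants & b.participants)
        and a.place == b.place
        and at_granularity(a.time, time_granularity)
        == at_granularity(b.time, time_granularity)
    )


# Two made-up reports of a flood one week apart in the same town.
first = Event("flood", frozenset({"residents", "river authority"}), date(2015, 3, 2), "Riverton")
second = Event("flood", frozenset({"residents"}), date(2015, 3, 9), "Riverton")
print(same_event(first, second, "day"))    # False: different days
print(same_event(first, second, "month"))  # True: same month
```

Coarsening the time component is one simple way to show how the same pair of descriptions can count as one event at a coarse granularity and as two events at a fine one.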
Event Coreference Resolution by Iteratively Unfolding Inter-dependencies among Events
We introduce a novel iterative approach for event coreference resolution that
gradually builds event clusters by exploiting inter-dependencies among event
mentions within the same chain as well as across event chains. Among event
mentions in the same chain, we distinguish within-document (WD) and
cross-document (CD) event coreference links by using two distinct pairwise
classifiers, trained separately to capture differences in the feature
distributions of WD and CD event clusters. Our event coreference approach
alternates between WD and CD clustering and combines the arguments of both
event clusters after every merge, continuing until no further merges can be
made. It then performs further merging between event chains that are both
closely related to a set of other event chains. Experiments on the ECB+ corpus
show that our model outperforms state-of-the-art methods on the joint task of
WD and CD event coreference resolution.
Comment: EMNLP 201
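The alternating WD/CD merge loop described above can be sketched as greedy agglomerative clustering. This is a minimal sketch under stated assumptions: the two trained pairwise classifiers are replaced by a hypothetical argument-overlap scorer (`jaccard`), the best-scoring pair above a hypothetical threshold of 0.5 is merged, and the second stage (merging chains related to a shared set of other chains) is omitted. All names here are illustrative.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Cluster:
    docs: frozenset   # documents the cluster's mentions come from
    mentions: list    # mention identifiers
    args: frozenset   # pooled event arguments (participants, time, place, ...)


Scorer = Callable[[Cluster, Cluster], float]


def is_within_doc(a: Cluster, b: Cluster) -> bool:
    """A candidate merge is within-document if both clusters sit in one document."""
    return len(a.docs) == 1 and a.docs == b.docs


def merge(a: Cluster, b: Cluster) -> Cluster:
    # Arguments are pooled after every merge so later decisions see richer context.
    return Cluster(a.docs | b.docs, a.mentions + b.mentions, a.args | b.args)


def best_pair(clusters, scorer: Scorer, want_wd: bool, threshold: float) -> Optional[Tuple[int, int]]:
    """Highest-scoring WD (or CD) pair strictly above the threshold, if any."""
    best, best_score = None, threshold
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            if is_within_doc(clusters[i], clusters[j]) != want_wd:
                continue
            score = scorer(clusters[i], clusters[j])
            if score > best_score:
                best, best_score = (i, j), score
    return best


def iterative_clustering(clusters, wd_scorer: Scorer, cd_scorer: Scorer, threshold: float = 0.5):
    clusters = list(clusters)
    while True:
        merged = False
        # Alternate one WD merge and one CD merge per round.
        for want_wd, scorer in ((True, wd_scorer), (False, cd_scorer)):
            pair = best_pair(clusters, scorer, want_wd, threshold)
            if pair is not None:
                i, j = pair
                new = merge(clusters[i], clusters[j])
                clusters = [c for k, c in enumerate(clusters) if k not in pair] + [new]
                merged = True
        if not merged:  # stop once neither scorer proposes a merge
            return clusters


def jaccard(a: Cluster, b: Cluster) -> float:
    """Hypothetical stand-in for a trained pairwise classifier: argument overlap."""
    union = a.args | b.args
    return len(a.args & b.args) / len(union) if union else 0.0


mentions = [
    Cluster(frozenset({"d1"}), ["m1"], frozenset({"quake", "Chile"})),
    Cluster(frozenset({"d1"}), ["m2"], frozenset({"quake", "Chile", "2010"})),
    Cluster(frozenset({"d2"}), ["m3"], frozenset({"quake", "Chile", "2010"})),
    Cluster(frozenset({"d2"}), ["m4"], frozenset({"election", "Peru"})),
]
result = iterative_clustering(mentions, jaccard, jaccard)
print(sorted(sorted(c.mentions) for c in result))  # [['m1', 'm2', 'm3'], ['m4']]
```

In this toy run, the WD merge of `m1` and `m2` pools the argument "2010", which is what makes the subsequent CD merge with `m3` a perfect match. This illustrates why combining arguments after every merge can help later decisions.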
Event-based Access to Historical Italian War Memoirs
The progressive digitization of historical archives provides new, often
domain-specific, textual resources that report on facts and events that
happened in the past; among these, memoirs are a very common type of primary
source. In this paper, we present an approach for extracting information from
Italian historical war memoirs and turning it into structured knowledge. The
approach is based on the semantic notions of events, participants and roles.
We quantitatively evaluate each key step of our approach and provide a
graph-based representation of the extracted knowledge, which allows users to
move between close and distant reading of the collection.
Comment: 23 pages, 6 figures
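The move between close and distant reading over a graph of events, participants and roles can be sketched as follows. This is a minimal sketch under assumptions. It supposes, hypothetically, that extraction yields flat `(memoir, sentence, event, participant, role)` records. Distant reading is shown as aggregate counts across the collection, and close reading as a drill-down to the source sentences. The record layout, role labels and sentences are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical extraction output: (memoir, sentence, event, participant, role).
records = [
    ("memoir_A", "The battalion crossed the river at dawn.", "crossing", "battalion", "Agent"),
    ("memoir_A", "The battalion was shelled near the bridge.", "attack", "battalion", "Patient"),
    ("memoir_B", "Our captain ordered a retreat.", "retreat", "captain", "Agent"),
]


def build_graph(rows):
    """Index role-labelled event edges by participant."""
    graph = defaultdict(list)
    for memoir, sentence, event, participant, role in rows:
        graph[participant].append((event, role, memoir, sentence))
    return graph


def distant_reading(graph):
    """Collection-level view: number of events linked to each participant."""
    return Counter({p: len(edges) for p, edges in graph.items()})


def close_reading(graph, participant):
    """Drill down to the source sentences behind one participant's events."""
    return [(memoir, sentence) for _, _, memoir, sentence in graph.get(participant, [])]


graph = build_graph(records)
print(distant_reading(graph).most_common(1))  # [('battalion', 2)]
print(close_reading(graph, "captain"))        # [('memoir_B', 'Our captain ordered a retreat.')]
```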
Generalizing Cross-Document Event Coreference Resolution Across Multiple Corpora
Cross-document event coreference resolution (CDCR) is an NLP task in which
mentions of events need to be identified and clustered throughout a collection
of documents. CDCR aims to benefit downstream multi-document applications, but
despite recent progress on corpora and system development, downstream
improvements from applying CDCR have not yet been shown. We observe that every
CDCR system to date was developed, trained, and tested only on a single
respective corpus. This raises strong concerns about their generalizability, a
must-have for downstream applications where the number of domains or event
mentions is likely to exceed that found in a curated corpus. To investigate
this concern, we define a uniform evaluation setup involving three CDCR
corpora: ECB+, the Gun Violence Corpus and the Football Coreference Corpus
(which we reannotate at the token level to make our analysis possible). We
compare a corpus-independent, feature-based system against a recent neural
system developed for ECB+. Although it scores lower in absolute terms, the
feature-based system shows more consistent performance across all corpora,
whereas the neural system is hit-and-miss. Via model introspection, we find
that the importance of event actions, event time and other features for
resolving coreference varies greatly between the corpora in practice.
Additional analysis shows that several systems overfit on the structure of the
ECB+ corpus. We conclude with recommendations on how to achieve generally
applicable CDCR systems in the future, the most important being that
evaluation on multiple CDCR corpora is essential. To facilitate future
research, we release our dataset, annotation guidelines, and system
implementation to the public.
Comment: Accepted at CL Journal
NewsReader: Using knowledge resources in a cross-lingual reading machine to generate more knowledge from massive streams of news
In this article, we describe a system that reads news articles in four different languages and detects what happened, who is involved, where and when. This event-centric information is represented as episodic situational knowledge on individuals in an interoperable RDF format that allows for reasoning on the implications of the events. Our system covers the complete path from unstructured text to structured knowledge, for which we defined a formal model that links interpreted textual mentions of things to their representation as instances. The model forms the skeleton for interoperable interpretation across different sources and languages. The real content, however, is defined using multilingual and cross-lingual knowledge resources, both semantic and episodic. We explain how these knowledge resources are used for the processing of text and ultimately define the actual content of the episodic situational knowledge that is reported in the news. The knowledge and model in our system can be seen as an example of how the Semantic Web helps NLP. Conversely, our system also generates massive amounts of episodic knowledge of the same type that the Semantic Web is built on. We thus envision a cycle of knowledge acquisition and NLP improvement on a massive scale. This article reports on the details of the system and on the performance of various high-level components. We demonstrate that our system performs at a state-of-the-art level on various subtasks in the four languages of the project, while also integrating these tasks into an overall system designed to read text. We applied our system to millions of news articles, generating billions of triples expressing formal semantic properties. This shows the capacity of the system to perform at an unprecedented scale.
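The idea of a formal model that links textual mentions to instances can be sketched with plain subject-predicate-object tuples. This is a minimal sketch under assumptions, not NewsReader's actual vocabulary or pipeline. The `http://example.org/` namespace, the predicate names `denotedBy` and `hasActor`, and the `#char=start,end` offset fragments are all illustrative choices. The point shown is that mentions from different articles that resolve to the same instance collapse onto one node, which keeps one link back to each mention.

```python
from dataclasses import dataclass

EX = "http://example.org/"  # placeholder namespace, not NewsReader's vocabulary


@dataclass(frozen=True)
class Mention:
    doc: str        # source article URI
    start: int      # character offsets of the mention in the article
    end: int
    instance: str   # identifier of the instance the mention was resolved to


def to_triples(mentions, roles):
    """Emit (subject, predicate, object) triples: one node per resolved entity
    or event instance, linked back to every textual mention that denotes it,
    plus role triples between event and participant instances."""
    triples = set()
    for m in mentions:
        triples.add((EX + m.instance, EX + "denotedBy", f"{m.doc}#char={m.start},{m.end}"))
    for event, role, participant in roles:
        triples.add((EX + event, EX + role, EX + participant))
    return triples


# Two made-up articles report the same takeover; both mentions resolve to one event.
mentions = [
    Mention("http://example.org/news/1", 10, 18, "event/takeover_1"),
    Mention("http://example.org/news/2", 42, 50, "event/takeover_1"),
    Mention("http://example.org/news/1", 0, 9, "entity/AcmeCorp"),
]
roles = [("event/takeover_1", "hasActor", "entity/AcmeCorp")]

triples = to_triples(mentions, roles)
event_links = [t for t in triples if t[0] == EX + "event/takeover_1" and t[1] == EX + "denotedBy"]
print(len(triples), len(event_links))  # 4 2
```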