2,303 research outputs found
Identity and Granularity of Events in Text
In this paper we describe a method to detect event descrip- tions in
different news articles and to model the semantics of events and their
components using RDF representations. We compare these descriptions to solve a
cross-document event coreference task. Our com- ponent approach to event
semantics defines identity and granularity of events at different levels. It
performs close to state-of-the-art approaches on the cross-document event
coreference task, while outperforming other works when assuming similar quality
of event detection. We demonstrate how granularity and identity are
interconnected and we discuss how se- mantic anomaly could be used to define
differences between coreference, subevent and topical relations.Comment: Invited keynote speech by Piek Vossen at Cicling 201
Use Generalized Representations, But Do Not Forget Surface Features
Only a year ago, all state-of-the-art coreference resolvers were using an
extensive amount of surface features. Recently, there was a paradigm shift
towards using word embeddings and deep neural networks, where the use of
surface features is very limited. In this paper, we show that a simple SVM
model with surface features outperforms more complex neural models for
detecting anaphoric mentions. Our analysis suggests that using generalized
representations and surface features have different strength that should be
both taken into account for improving coreference resolution.Comment: CORBON workshop@EACL 201
Distantly Labeling Data for Large Scale Cross-Document Coreference
Cross-document coreference, the problem of resolving entity mentions across
multi-document collections, is crucial to automated knowledge base construction
and data mining tasks. However, the scarcity of large labeled data sets has
hindered supervised machine learning research for this task. In this paper we
develop and demonstrate an approach based on ``distantly-labeling'' a data set
from which we can train a discriminative cross-document coreference model. In
particular we build a dataset of more than a million people mentions extracted
from 3.5 years of New York Times articles, leverage Wikipedia for distant
labeling with a generative model (and measure the reliability of such
labeling); then we train and evaluate a conditional random field coreference
model that has factors on cross-document entities as well as mention-pairs.
This coreference model obtains high accuracy in resolving mentions and entities
that are not present in the training data, indicating applicability to
non-Wikipedia data. Given the large amount of data, our work is also an
exercise demonstrating the scalability of our approach.Comment: 16 pages, submitted to ECML 201
- …