Distantly Labeling Data for Large Scale Cross-Document Coreference
Cross-document coreference, the problem of resolving entity mentions across
multi-document collections, is crucial to automated knowledge base construction
and data mining tasks. However, the scarcity of large labeled data sets has
hindered supervised machine learning research for this task. In this paper we
develop and demonstrate an approach based on "distantly labeling" a data set
from which we can train a discriminative cross-document coreference model. In
particular we build a dataset of more than a million people mentions extracted
from 3.5 years of New York Times articles, leverage Wikipedia for distant
labeling with a generative model (and measure the reliability of such
labeling); then we train and evaluate a conditional random field coreference
model that has factors on cross-document entities as well as mention-pairs.
This coreference model obtains high accuracy in resolving mentions and entities
that are not present in the training data, indicating applicability to
non-Wikipedia data. Given the large amount of data, our work is also an
exercise demonstrating the scalability of our approach.
Comment: 16 pages, submitted to ECML 201
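The mention-pair side of such a model can be illustrated with a minimal sketch: a log-linear scorer over pair features, followed by greedy clustering of mentions into entities. The feature names and weights below are purely illustrative, not the paper's actual factors; a real system would estimate weights from the distantly labeled data.

```python
def pair_features(m1, m2):
    """Binary features for a pair of person-mention strings (illustrative)."""
    t1, t2 = set(m1.lower().split()), set(m2.lower().split())
    return {
        "exact_match": float(m1.lower() == m2.lower()),
        "token_overlap": float(bool(t1 & t2)),
        "last_token_match": float(m1.split()[-1].lower() == m2.split()[-1].lower()),
    }

# Hypothetical weights; a trained model would learn these from labeled pairs.
WEIGHTS = {"exact_match": 3.0, "token_overlap": 1.0, "last_token_match": 2.0}

def coreferent(m1, m2, threshold=1.5):
    """Log-linear pairwise decision: weighted feature sum vs. a threshold."""
    score = sum(WEIGHTS[f] * v for f, v in pair_features(m1, m2).items())
    return score > threshold

def cluster(mentions):
    """Greedily merge mentions into entities via pairwise decisions."""
    entities = []
    for m in mentions:
        for ent in entities:
            if any(coreferent(m, other) for other in ent):
                ent.append(m)
                break
        else:
            entities.append([m])
    return entities

print(cluster(["Barack Obama", "Obama", "Hillary Clinton", "Clinton"]))
```

The paper's CRF additionally places factors on whole cross-document entities, not just pairs; this sketch shows only the pairwise component.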
Learning New Facts From Knowledge Bases With Neural Tensor Networks and Semantic Word Vectors
Knowledge bases provide applications with the benefit of easily accessible,
systematic relational knowledge but often suffer in practice from their
incompleteness and lack of knowledge of new entities and relations. Much work
has focused on building or extending them by finding patterns in large
unannotated text corpora. In contrast, here we mainly aim to complete a
knowledge base by predicting additional true relationships between entities,
based on generalizations that can be discerned in the given knowledge base. We
introduce a neural tensor network (NTN) model which predicts new relationship
entries that can be added to the database. This model can be improved by
initializing entity representations with word vectors learned in an
unsupervised fashion from text, and when doing this, existing relations can
even be queried for entities that were not present in the database. Our model
generalizes and outperforms existing models for this problem, and can classify
unseen relationships in WordNet with an accuracy of 75.8%.
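The NTN scoring function combines a bilinear tensor term with a standard linear layer over the two entity vectors. Below is a minimal NumPy sketch of that score for a single relation; the dimensions and randomly initialized parameters are toy assumptions for illustration, not trained values.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 4, 2  # embedding dimension and number of tensor slices (tiny, for illustration)

# Randomly initialized parameters for one relation R (hypothetical values).
W = rng.standard_normal((k, d, d))   # bilinear tensor: one d x d slice per output unit
V = rng.standard_normal((k, 2 * d))  # standard layer over the concatenation [e1; e2]
b = rng.standard_normal(k)           # bias
u = rng.standard_normal(k)           # output weights

def ntn_score(e1, e2):
    """Score g(e1, R, e2) = u . tanh(e1' W e2 + V [e1; e2] + b)."""
    bilinear = np.array([e1 @ W[i] @ e2 for i in range(k)])
    return u @ np.tanh(bilinear + V @ np.concatenate([e1, e2]) + b)

# Entity vectors would be initialized from unsupervised word vectors in practice.
e_cat, e_feline = rng.standard_normal(d), rng.standard_normal(d)
print(ntn_score(e_cat, e_feline))  # a real-valued plausibility score for the triple
```

A higher score indicates the model considers the relationship more plausible; classification thresholds this score per relation.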
A constraint-based approach to noun phrase coreference resolution in German newspaper text
In this paper, we investigate a wide range of features for their usefulness in the resolution of nominal coreference, both as hard constraints (i.e., completely removing elements from the list of possible candidates) and as soft constraints (where an accumulation of soft-constraint violations makes it less likely that a candidate is chosen as the antecedent). We present a state-of-the-art system based on such constraints, with weights estimated by a maximum entropy model, using lexical information to resolve cases of coreferent bridging.
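The hard-versus-soft distinction can be sketched as a two-stage ranker: hard constraints filter the candidate list outright, and the surviving candidate with the smallest accumulated soft-constraint penalty wins. The constraint names, toy mention records, and weights below are illustrative assumptions; the paper estimates its weights with a maximum entropy model.

```python
def hard_constraints_ok(mention, candidate):
    """Hard constraints remove a candidate entirely (e.g. gender or number clash)."""
    return (mention["gender"] == candidate["gender"]
            and mention["number"] == candidate["number"])

# Hypothetical soft-constraint weights (a real system would learn these).
SOFT_WEIGHTS = {"long_distance": 1.5, "no_head_match": 2.0}

def soft_penalty(mention, candidate):
    """Accumulate penalties for violated soft constraints."""
    penalty = 0.0
    if mention["position"] - candidate["position"] > 3:
        penalty += SOFT_WEIGHTS["long_distance"]
    if mention["head"] != candidate["head"]:
        penalty += SOFT_WEIGHTS["no_head_match"]
    return penalty

def resolve(mention, candidates):
    """Filter by hard constraints, then pick the least-penalized survivor."""
    survivors = [c for c in candidates if hard_constraints_ok(mention, c)]
    if not survivors:
        return None
    return min(survivors, key=lambda c: soft_penalty(mention, c))

anaphor = {"head": "Kanzlerin", "gender": "f", "number": "sg", "position": 5}
cands = [
    {"head": "Merkel", "gender": "f", "number": "sg", "position": 4},
    {"head": "Regierung", "gender": "f", "number": "sg", "position": 1},
    {"head": "Minister", "gender": "m", "number": "sg", "position": 3},
]
print(resolve(anaphor, cands)["head"])
```

Here the masculine candidate is eliminated by a hard constraint, and the nearer feminine candidate wins because it accrues fewer soft-constraint penalties.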