Learning Dictionaries for Named Entity Recognition using Minimal Supervision
This paper describes an approach for automatic construction of dictionaries
for Named Entity Recognition (NER) using large amounts of unlabeled data and a
few seed examples. We use Canonical Correlation Analysis (CCA) to obtain lower
dimensional embeddings (representations) for candidate phrases and classify
these phrases using a small number of labeled examples. Our method achieves
16.5% and 11.3% F-1 score improvement over co-training on disease and virus NER
respectively. We also show that adding candidate phrase embeddings as
features in a sequence tagger gives better performance compared to using word
embeddings.
Comment: In 14th Conference of the European Chapter of the Association for
Computational Linguistics, 201
Inferring Missing Entity Type Instances for Knowledge Base Completion: New Dataset and Methods
Most previous work on knowledge base (KB) completion has focused on the
problem of relation extraction. In this work, we focus on the task of inferring
missing entity type instances in a KB, a fundamental task for KB completion
that has received little attention. Due to the novelty of this task, we construct a
large-scale dataset and design an automatic evaluation methodology. Our
knowledge base completion method uses information within the existing KB and
external information from Wikipedia. We show that individual methods trained
with a global objective that considers unobserved cells from both the entity
and the type side give consistently higher-quality predictions compared to
baseline methods. We also perform manual evaluation on a small subset of the
data to verify the effectiveness of our knowledge base completion methods and
the correctness of our proposed automatic evaluation method.Comment: North American Chapter of the Association for Computational
Linguistics- Human Language Technologies, 201
Compositional Vector Space Models for Knowledge Base Completion
Knowledge base (KB) completion adds new facts to a KB by making inferences
from existing facts, for example by inferring with high likelihood
nationality(X,Y) from bornIn(X,Y). Most previous methods infer simple one-hop
relational synonyms like this, or use as evidence a multi-hop relational path
treated as an atomic feature, like bornIn(X,Z) -> containedIn(Z,Y). This paper
presents an approach that reasons about conjunctions of multi-hop relations
non-atomically, composing the implications of a path using a recursive neural
network (RNN) that takes as inputs vector embeddings of the binary relations in
the path. Not only does this allow us to generalize to paths unseen at training
time, but also, with a single high-capacity RNN, to predict new relation types
not seen when the compositional model was trained (zero-shot learning). We
assemble a new dataset of over 52M relational triples, and show that our method
improves over a traditional classifier by 11%, and a method leveraging
pre-trained embeddings by 7%.
Comment: The 53rd Annual Meeting of the Association for Computational
Linguistics and The 7th International Joint Conference of the Asian
Federation of Natural Language Processing, 201
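The path composition described in this abstract can be sketched as a left-to-right fold over relation embeddings. This is a minimal illustration under assumed dimensions and a single shared composition matrix; the actual model, scoring function, and training procedure in the paper differ.

```python
# Illustrative sketch: compose a multi-hop relation path with a recursive
# neural network by repeatedly combining the running state with the next
# relation's embedding. Sizes and weights are assumptions, not the paper's.
import numpy as np

rng = np.random.default_rng(0)
d = 8                                      # embedding dimension (assumed)
relations = ["bornIn", "containedIn", "nationality"]
emb = {r: rng.normal(scale=0.5, size=d) for r in relations}

W = rng.normal(scale=0.5, size=(d, 2 * d))  # shared composition matrix

def compose_path(path):
    """Fold the path: state = tanh(W [state; next_relation_embedding])."""
    state = emb[path[0]]
    for rel in path[1:]:
        state = np.tanh(W @ np.concatenate([state, emb[rel]]))
    return state

# Score whether the path bornIn -> containedIn implies nationality:
# here a simple dot product between the path vector and the target
# relation's embedding stands in for the model's scoring function.
path_vec = compose_path(["bornIn", "containedIn"])
score = path_vec @ emb["nationality"]
```

Because the composition operates on relation embeddings rather than atomic path features, the same trained network can score paths (and, with a shared RNN, target relations) never seen during training, which is the zero-shot ability the abstract highlights.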
Fish Catch Management: Schools of Thought and New Approaches