Is Aligning Embedding Spaces a Challenging Task? A Study on Heterogeneous Embedding Alignment Methods
Representation learning of words and knowledge graphs (KGs) in low-dimensional
vector spaces, along with its applications to many real-world scenarios, has
recently gained momentum. In order to make use of multiple KG embeddings for
knowledge-driven applications such as question answering, named entity
disambiguation, and knowledge graph completion, alignment of the different KG
embedding spaces is necessary. In addition to multilinguality and
domain-specific information, different KGs pose the problem of structural
differences, making the alignment of KG embeddings more challenging. This
paper provides a theoretical analysis and comparison of state-of-the-art
alignment methods between two embedding spaces representing entity-entity and
entity-word pairs. It also assesses the capabilities and shortcomings of the
existing alignment methods in the context of different applications.
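Many of the alignment methods compared in such studies reduce to learning a linear map between the two spaces from a set of anchor pairs. As an illustration, here is a minimal sketch of one standard baseline, orthogonal Procrustes alignment; the anchor matrices X and Y, the dimensions, and the synthetic check are assumptions for demonstration, not the paper's code.

```python
import numpy as np

def procrustes_align(X, Y):
    """Orthogonal map W minimizing ||X @ W - Y||_F over anchor pairs.

    X, Y: (n, d) arrays of paired anchor embeddings from the two spaces.
    """
    # The SVD of the cross-covariance gives the closed-form solution.
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# Toy check: recover a known rotation between two synthetic spaces.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))                  # entities in space A
W_true, _ = np.linalg.qr(rng.normal(size=(50, 50)))
Y = X @ W_true                                  # same entities in space B
W = procrustes_align(X, Y)
print(np.allclose(X @ W, Y))                    # True: rotation recovered
```

The orthogonality constraint preserves distances within each space, which is one reason Procrustes-style maps are a common baseline when aligning heterogeneous embedding spaces.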
Jointly Embedding Entities and Text with Distant Supervision
Learning representations for knowledge base entities and concepts is becoming
increasingly important for NLP applications. However, recent entity embedding
methods have relied on structured resources that are expensive to create for
new domains and corpora. We present a distantly-supervised method for jointly
learning embeddings of entities and text from an unannotated corpus, using only
a list of mappings between entities and surface forms. We learn embeddings from
open-domain and biomedical corpora, and compare against prior methods that rely
on human-annotated text or large knowledge graph structure. Our embeddings
capture entity similarity and relatedness better than prior work, both in
existing biomedical datasets and a new Wikipedia-based dataset that we release
to the community. Results on analogy completion and entity sense disambiguation
indicate that entities and words capture complementary information that can be
effectively combined for downstream use.
Comment: 12 pages; accepted to the 3rd Workshop on Representation Learning for
NLP (Repl4NLP 2018). Code at https://github.com/OSU-slatelab/JE
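The distant-supervision recipe can be sketched compactly: replace known surface forms in raw text with entity tokens, then train one skip-gram model over the mixed token stream so entities and words share a single space. The toy mapping and corpus below, and the use of gensim's Word2Vec as the trainer, are illustrative assumptions rather than the paper's pipeline (the paper's own code is at the repository linked above).

```python
from gensim.models import Word2Vec

# Distant supervision: only a surface-form -> entity mapping is needed,
# no human-annotated text or large KG structure.
surface_to_entity = {
    "aspirin": "ENT:Aspirin",
    "heart attack": "ENT:Myocardial_infarction",
}

def tag_entities(sentence, mapping):
    """Replace known surface forms with entity tokens (longest first,
    so multi-word forms are matched before their substrings)."""
    for form in sorted(mapping, key=len, reverse=True):
        sentence = sentence.replace(form, mapping[form])
    return sentence.split()

corpus = [
    "aspirin reduces the risk of a heart attack",
    "a heart attack damages the muscle of the heart",
]
sentences = [tag_entities(s, surface_to_entity) for s in corpus]

# Entities and words now co-occur in one token stream, so a single
# skip-gram model embeds both into the same vector space.
model = Word2Vec(sentences, vector_size=50, window=5, min_count=1, sg=1)
print(model.wv.most_similar("ENT:Aspirin", topn=3))
```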
Probabilistic Bag-Of-Hyperlinks Model for Entity Linking
Many fundamental problems in natural language processing rely on determining
what entities appear in a given text. Commonly referred to as entity linking,
this step is a core component of many NLP tasks such as text understanding,
automatic summarization, semantic search, and machine translation.
Name ambiguity, word polysemy, context dependencies and a heavy-tailed
distribution of entities contribute to the complexity of this problem.
We propose a probabilistic approach that makes use of an effective
graphical model to perform collective entity disambiguation. Input mentions
(i.e., linkable token spans) are disambiguated jointly across an entire
document by combining a document-level prior of entity co-occurrences with
local information captured from mentions and their surrounding context. The
model is based on simple sufficient statistics extracted from data, thus
relying on few parameters to be learned.
Our method requires neither extensive feature engineering nor an expensive
training procedure. We use loopy belief propagation to perform approximate
inference. The low complexity of our model makes this step sufficiently fast
for real-time usage. We demonstrate the accuracy of our approach on a wide
range of benchmark datasets, showing that it matches, and in many cases
outperforms, existing state-of-the-art methods.
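To make the collective-disambiguation machinery concrete, the following is a minimal sum-product loopy belief propagation sketch on a fully connected pairwise model: one variable per mention ranging over its candidate entities, unary scores standing in for local mention-context compatibility, and pairwise tables standing in for document-level entity co-occurrence statistics. All potentials, counts, and names here are toy assumptions, not the paper's learned statistics.

```python
import numpy as np

def loopy_bp(unary, pairwise, iters=20):
    """Sum-product loopy BP on a fully connected pairwise model.

    unary[i]: (K_i,) positive scores for mention i's candidates.
    pairwise[(i, j)]: (K_i, K_j) positive compatibility table.
    Returns a list of normalized per-mention beliefs.
    """
    n = len(unary)
    # msgs[(i, j)]: message from mention i to mention j, length K_j.
    msgs = {(i, j): np.ones(len(unary[j]))
            for i in range(n) for j in range(n) if i != j}
    for _ in range(iters):
        new = {}
        for (i, j) in msgs:
            # Combine i's unary score with all incoming messages,
            # excluding the one coming back from j.
            b = unary[i].copy()
            for k in range(n):
                if k != i and k != j:
                    b = b * msgs[(k, i)]
            m = pairwise[(i, j)].T @ b      # marginalize over i's states
            new[(i, j)] = m / m.sum()       # normalize for stability
        msgs = new
    beliefs = []
    for i in range(n):
        b = unary[i].copy()
        for k in range(n):
            if k != i:
                b = b * msgs[(k, i)]
        beliefs.append(b / b.sum())
    return beliefs

# Toy document with two mentions, two candidate entities each; the
# pairwise table rewards the entity pair that co-occurs most often.
unary = [np.array([0.6, 0.4]), np.array([0.5, 0.5])]
co = np.array([[5.0, 1.0], [1.0, 2.0]])    # toy co-occurrence counts
pairwise = {(0, 1): co, (1, 0): co.T}
for i, b in enumerate(loopy_bp(unary, pairwise)):
    print(f"mention {i}: best = candidate {b.argmax()}, beliefs = {b}")
```

Because the beliefs depend only on simple sufficient statistics (unary and pairwise score tables), the per-iteration cost stays low, which is consistent with the abstract's claim that inference is fast enough for real-time use.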