Supervised Typing of Big Graphs using Semantic Embeddings
We propose a supervised algorithm for generating type embeddings in the same
semantic vector space as a given set of entity embeddings. The algorithm is
agnostic to the derivation of the underlying entity embeddings. It does not
require any manual feature engineering, generalizes well to hundreds of types
and achieves near-linear scaling on Big Graphs containing many millions of
triples and instances by virtue of an incremental execution. We demonstrate the
utility of the embeddings on a type recommendation task, outperforming a
non-parametric feature-agnostic baseline while achieving 15x speedup and
near-constant memory usage on a full partition of DBpedia. Using
state-of-the-art visualization, we illustrate the agreement of our
extensionally derived DBpedia type embeddings with the manually curated domain
ontology. Finally, we use the embeddings to probabilistically cluster about 4
million DBpedia instances into 415 types in the DBpedia ontology.
Comment: 6 pages, to be published in Semantic Big Data Workshop at ACM SIGMOD 2017; extended version in preparation for Open Journal of Semantic Web (OJSW).
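The abstract above describes deriving type embeddings in the same vector space as entity embeddings, with incremental execution giving near-linear scaling and near-constant memory. A minimal sketch of one way to do this, assuming each type vector is a running mean of its instances' entity vectors (a simple centroid baseline, not necessarily the authors' exact algorithm):

```python
import numpy as np

def type_embedding(entity_vecs):
    """Batch variant: a type vector as the mean of its instances' entity vectors."""
    return np.mean(entity_vecs, axis=0)

def update_type_embedding(mean, count, new_vec):
    """Incremental variant: update a running mean one entity at a time,
    so memory stays constant no matter how many instances a type has."""
    count += 1
    mean = mean + (new_vec - mean) / count
    return mean, count
```

The incremental update produces the same centroid as the batch mean, which is what makes a single streaming pass over millions of is-a triples feasible.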
Scalable Generation of Type Embeddings Using the ABox
Structured knowledge bases gain their expressive power from both the ABox and TBox. While the ABox is rich in data, the TBox contains the ontological assertions that are often necessary for logical inference. The crucial links between the ABox and the TBox are served by is-a statements (formally a part of the ABox) that connect instances to types, also referred to as classes or concepts. Latent space embedding algorithms, such as RDF2Vec and TransE, have been used to great effect to model instances in the ABox. Such algorithms work well on large-scale knowledge bases like DBpedia and Geonames, as they are robust to noise and are low-dimensional and real-valued. In this paper, we investigate a supervised algorithm for deriving type embeddings in the same latent space as a given set of entity embeddings. We show that our algorithm generalizes to hundreds of types, and via incremental execution, achieves near-linear scaling on graphs with millions of instances and facts. We also present a theoretical foundation for our proposed model, and the means of validating the model. The empirical utility of the embeddings is illustrated on five partitions of the English DBpedia ABox. We use visualization and clustering to show that our embeddings are in good agreement with the manually curated TBox. We also use the embeddings to perform a soft clustering on 4 million DBpedia instances in terms of the 415 types explicitly participating in is-a relationships in the DBpedia ABox. Lastly, we present a set of results obtained by using the embeddings to recommend types for untyped instances. Our method is shown to outperform another feature-agnostic baseline while achieving 15x speedup without any growth in memory usage.
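Both abstracts use the type embeddings to recommend types for untyped instances. A minimal sketch of how such a recommender could work once type vectors live in the same space as entity vectors, assuming a cosine-similarity ranking (a natural choice for real-valued embeddings; the papers' scoring function may differ):

```python
import numpy as np

def recommend_types(entity_vec, type_vecs, k=3):
    """Rank candidate types by cosine similarity between an untyped
    entity's embedding and each type embedding; return the top k names."""
    names = list(type_vecs)
    M = np.array([type_vecs[n] for n in names])
    sims = (M @ entity_vec) / (
        np.linalg.norm(M, axis=1) * np.linalg.norm(entity_vec)
    )
    order = np.argsort(-sims)  # descending similarity
    return [names[i] for i in order[:k]]
```

Because scoring an entity is one matrix-vector product over the (few hundred) type vectors, recommendation cost does not grow with the number of instances in the graph, consistent with the reported constant memory usage.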
Leveraging Wikidata's edit history in knowledge graph refinement tasks
Knowledge graphs have been adopted in many diverse fields for a variety of
purposes. Most of those applications rely on valid and complete data to deliver
their results, pressing the need to improve the quality of knowledge graphs. A
number of solutions have been proposed to that end, ranging from rule-based
approaches to the use of probabilistic methods, but there is an element that
has not been considered yet: the edit history of the graph. In the case of
collaborative knowledge graphs (e.g., Wikidata), those edits represent the
process in which the community reaches some kind of fuzzy and distributed
consensus over the information that best represents each entity, and can hold
potentially interesting information to be used by knowledge graph refinement
methods. In this paper, we explore the use of edit history information from
Wikidata to improve the performance of type prediction methods. To do that, we
have first built a JSON dataset containing the edit history of every instance
from the 100 most important classes in Wikidata. This edit history information
is then explored and analyzed, with a focus on its potential applicability in
knowledge graph refinement tasks. Finally, we propose and evaluate two new
methods to leverage this edit history information in knowledge graph embedding
models for type prediction tasks. Our results show an improvement in one of the
proposed methods against current approaches, showing the potential of using
edit information in knowledge graph refinement tasks and opening new promising
research lines within the field.
Comment: 18 pages, 7 figures. Submitted to the Journal of Web Semantics.
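The abstract above proposes feeding Wikidata edit-history information into embedding models for type prediction. A minimal sketch of one way such information could be used, assuming (hypothetically; the paper's two methods are not specified here) that simple per-entity edit statistics are appended to a pretrained embedding before classification, and that each edit record carries a `"user"` field as in Wikidata's revision metadata:

```python
import numpy as np

def augment_with_edit_features(embedding, edit_history):
    """Append simple edit-history statistics (hypothetical features:
    log-scaled edit count and distinct-editor count) to a pretrained
    entity embedding, yielding an input vector for a type classifier."""
    n_edits = len(edit_history)
    n_editors = len({e["user"] for e in edit_history})
    feats = np.array([np.log1p(n_edits), np.log1p(n_editors)])
    return np.concatenate([embedding, feats])
```

The design intuition, per the abstract, is that the edit trail encodes a fuzzy community consensus about an entity, which a purely structural embedding cannot see.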