SISTEDES: Ingeniería de Software y las Tecnologías de Desarrollo de Software
Abstract
In the context of knowledge graphs, relation completion is the task of
adding missing triples to a knowledge graph, usually
by classifying candidate triples as true or false. Creating an evaluation
dataset for these techniques is not trivial, since there is a large
number of variables to consider which, if not taken into account, may
cause misleading results. So far, there is no well-defined workflow that
identifies the variation points when creating a dataset and the
possible strategies that can be followed in each step. Furthermore, there
are no tools that help create such datasets in an easy way. To address
this need, we have created AYNEC-DataGen, a customisable tool for the
generation of datasets with multiple variation points related to the pre-
processing of the original knowledge graph, the splitting of triples into
training and testing sets, and the generation of negative examples. The
output of our tool includes the evaluation dataset, an optional export in
an open format for its visualisation, and additional files with metadata.
Our tool is freely available online.
Ministerio de Economía y Competitividad TIN2016-75394-
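The splitting and negative-generation steps named in the abstract can be illustrated with a minimal sketch. This is not AYNEC-DataGen's code or interface: the function names (`split_triples`, `corrupt_tail`, `build_dataset`), the uniform random split, and the tail-corruption strategy are all assumptions chosen for illustration, and each is only one of several possible strategies at its variation point.

```python
import random


def split_triples(triples, test_fraction=0.2, seed=0):
    """Randomly split triples into (train, test). Hypothetical helper."""
    rng = random.Random(seed)
    shuffled = list(triples)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def corrupt_tail(triple, entities, known, rng, max_tries=100):
    """Replace the tail with a random entity so that the result is not a
    known triple. Returns None if no such candidate is found in max_tries."""
    head, relation, _ = triple
    for _ in range(max_tries):
        candidate = (head, relation, rng.choice(entities))
        if candidate not in known:
            return candidate
    return None


def build_dataset(triples, test_fraction=0.2, seed=0):
    """Return {'train': [...], 'test': [...]} of (triple, label) pairs,
    with label 1 for original triples and 0 for generated negatives."""
    known = set(triples)
    entities = sorted({e for h, _, t in triples for e in (h, t)})
    rng = random.Random(seed)
    train, test = split_triples(triples, test_fraction, seed)
    dataset = {}
    for name, positives in (("train", train), ("test", test)):
        examples = [(t, 1) for t in positives]
        for t in positives:
            negative = corrupt_tail(t, entities, known, rng)
            if negative is not None:  # skip when no valid corruption exists
                examples.append((negative, 0))
        dataset[name] = examples
    return dataset
```

In this sketch negatives are checked against the whole graph, not only the split they belong to, so a negative in the training set can never coincide with a positive in the testing set. Whether to filter this way is itself one of the choices that can change evaluation results.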