8 research outputs found

    Generating Rules to Filter Candidate Triples for their Correctness Checking by Knowledge Graph Completion Techniques

    Knowledge Graphs (KGs) contain large amounts of structured information. Due to their inherent incompleteness, a process known as KG completion is often carried out to find the missing triples in a KG, usually by training a fact-checking model that is able to discern between correct and incorrect knowledge. After the fact-checking model has been trained and evaluated, it has to be applied to a set of candidate triples, and those that are considered correct are added to the KG as new knowledge. However, this process needs a reasonably sized set of candidate triples that represents possible new knowledge, so that each triple can be evaluated by the fact-checking model and, if considered correct, added to the KG, enriching it. Current approaches for selecting candidate triples for correctness checking either use the full set of possible missing candidate triples (and thus provide no filtering) or apply very basic rules to filter out unlikely candidates, which may have a negative effect on the completion performance as very few candidate triples are filtered out. In this paper we present CHAI, a method for producing more complex rules that are able to filter candidate triples by combining a set of criteria to optimize a fitness function. Our experiments show that CHAI is able to generate rules that, when applied, yield smaller candidate sets than similar proposals while still including promising candidate triples. Ministerio de Economía y Competitividad TIN2016-75394-
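
    To make the candidate-filtering step concrete, the following is a minimal sketch of the kind of "very basic rule" the abstract contrasts CHAI with. It is not CHAI itself. The function names, the toy triples, and the specific rule (keep a candidate only if its subject and object have already been seen in those roles for the relation) are all assumptions made for illustration.

```python
from itertools import product


def candidate_triples(kg, relation):
    """Unfiltered candidates: every (s, relation, o) pair not already in the KG."""
    entities = {e for s, _, o in kg for e in (s, o)}
    known = set(kg)
    return {
        (s, relation, o)
        for s, o in product(entities, repeat=2)
        if s != o and (s, relation, o) not in known
    }


def filter_by_seen_roles(kg, candidates):
    """Hypothetical basic rule: keep a candidate only if its subject has been
    seen as a subject of the relation and its object as an object of it."""
    subjects, objects = {}, {}
    for s, r, o in kg:
        subjects.setdefault(r, set()).add(s)
        objects.setdefault(r, set()).add(o)
    return {
        (s, r, o)
        for s, r, o in candidates
        if s in subjects.get(r, set()) and o in objects.get(r, set())
    }


if __name__ == "__main__":
    # Toy KG, invented for this example.
    kg = [
        ("alice", "works_at", "acme"),
        ("bob", "works_at", "globex"),
        ("acme", "located_in", "paris"),
    ]
    unfiltered = candidate_triples(kg, "works_at")
    filtered = filter_by_seen_roles(kg, unfiltered)
    print(len(unfiltered), len(filtered))
```

    On this toy graph the rule shrinks the candidate set considerably, which is the effect the abstract describes wanting from a filter. How CHAI combines criteria and optimizes its fitness function is not specified in the abstract and is not modeled here.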

    An Automatic Ontology Generation Framework with An Organizational Perspective

    Ontologies are known for their powerful semantic representation of knowledge. However, ontologies cannot automatically evolve to reflect updates that occur in their respective domains. To address this limitation, researchers have called for automatic ontology generation from unstructured text corpora. Unfortunately, systems that aim to generate ontologies from unstructured text corpora are domain-specific and require manual intervention. In addition, they suffer from uncertainty in creating concept linkages and difficulty in finding axioms for the same concept. Knowledge Graphs (KGs) have emerged as a powerful model for the dynamic representation of knowledge. However, KGs have many quality limitations and need extensive refinement. This research aims to develop a novel domain-independent automatic ontology generation framework that converts an unstructured text corpus into a domain-consistent ontological form. The framework generates KGs from an unstructured text corpus, then refines and corrects them to be consistent with domain ontologies. The power of the proposed automatically generated ontology is that it integrates the dynamic features of KGs and the quality features of ontologies.

    Calibrating Knowledge Graphs

    A knowledge graph model represents a given knowledge graph as a number of vectors. These models are evaluated for several tasks, and one of them is link prediction, which consists of predicting whether new edges are plausible when the model is provided with a partial edge. Calibration is a postprocessing technique that aims to align the predictions of a model with respect to a ground truth. The idea is to make a model more reliable by reducing its confidence for incorrect predictions (overconfidence), and increasing the confidence for correct predictions that are closer to the negative threshold (underconfidence). Calibration for knowledge graph models has been previously studied for the task of triple classification, which is different from link prediction, and under the closed-world assumption, that is, knowledge that is missing from the graph at hand is incorrect. However, knowledge graphs operate under the open-world assumption such that it is unknown whether missing knowledge is correct or incorrect. In this thesis, we propose open-world calibration of knowledge graph models for link prediction. We rely on strategies to synthetically generate negatives that are expected to have different levels of semantic plausibility. Calibration thus consists of aligning the predictions of the model with these different semantic levels. Nonsensical negatives should be farther away from a positive than semantically plausible negatives. We analyze several scenarios in which calibration based on the sigmoid function can lead to incorrect results when considering distance-based models. We also propose the Jensen-Shannon distance to measure the divergence of the predictions before and after calibration. Our experiments exploit several pre-trained models of nine algorithms over seven datasets. Our results show that many of these pre-trained models are properly calibrated without intervention under the closed-world assumption, but this is not the case under the open-world assumption. Furthermore, Brier scores (the mean squared error before and after calibration) using the closed-world assumption are generally lower and the divergence is higher when using open-world calibration. From these results, we gather that open-world calibration is a harder task than closed-world calibration. Finally, analyzing different measurements related to link prediction accuracy, we propose a combined loss function for calibration that maintains the accuracy of the model.
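
    The two measurements named in this abstract can be sketched in a few lines. The code below uses the standard textbook definitions (Brier score as the mean squared error between predicted probabilities and 0/1 labels, and Jensen-Shannon distance as the square root of the Jensen-Shannon divergence with base-2 logarithms). It is an illustration only: the thesis may compute them differently, and the function names and sample values are assumptions.

```python
import math


def brier_score(probs, labels):
    """Mean squared error between predicted probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence (base-2 logs) between
    two discrete distributions given as equal-length probability lists."""
    def kl(a, b):
        # Terms with a_i == 0 contribute nothing to the KL divergence.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, divergence))


if __name__ == "__main__":
    # Invented scores for three triples: two positives and one negative.
    before = [0.9, 0.6, 0.4]
    after = [0.95, 0.7, 0.2]
    labels = [1, 1, 0]
    print(brier_score(before, labels), brier_score(after, labels))

    # Invented histograms of predictions before and after calibration.
    print(jensen_shannon_distance([0.2, 0.5, 0.3], [0.1, 0.4, 0.5]))
```

    With base-2 logarithms the distance lies between 0 (identical distributions) and 1 (distributions with disjoint support), which makes divergences comparable across models.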

    A Combinational Method to Determining Identical Entities from Heterogeneous Knowledge Graphs

    With the increasing demand for intelligent services, knowledge graph technologies have attracted much attention. Various application-specific knowledge bases have been developed in industry and academia. In particular, open knowledge bases play an important role in constructing a new knowledge base by serving as a reference data source. However, identifying the same entities among heterogeneous knowledge sources is not trivial. This study focuses on extracting and determining exact and precise entities, which is essential for merging and fusing various knowledge sources. To achieve this, several algorithms for extracting the same entities are proposed, and their performance is evaluated using real-world knowledge sources.

    End-to-End Entity Resolution for Big Data: A Survey

    One of the most important tasks for improving data quality and the reliability of data analytics results is Entity Resolution (ER). ER aims to identify different descriptions that refer to the same real-world entity, and remains a challenging problem. While previous works have studied specific aspects of ER (and mostly in traditional settings), in this survey, we provide for the first time an end-to-end view of modern ER workflows, and of the novel aspects of entity indexing and matching methods designed to cope with more than one of the Big Data characteristics simultaneously. We present the basic concepts, processing steps and execution strategies that have been proposed by different communities, i.e., database, semantic Web and machine learning, in order to cope with the loose structuredness, extreme diversity, high speed and large scale of entity descriptions used by real-world applications. Finally, we provide a synthetic discussion of the existing approaches, and conclude with a detailed presentation of open research directions.