Detection of relation assertion errors in knowledge graphs

Author: Bordes Antoine
Debattista Jeremy
Lao Ni
Nickel Maximilian
Ponzetto Heiko Paulheim Michael Simone Paolo
Socher Richard
Wang Q.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2017

Crossref

MAnnheim DOCument Server

Correcting Knowledge Base Assertions

Author: Arndt Dörthe
Auer Sören
Chen Jiaoyan
De Melo Gerard
Dimou Anastasia
Lertvittayakumjorn Piyawat
Melo André
Niklaus Christina
Omran Pouya Ghiasnezhad
Trouillon Théo
Vrandečić Denny
Zhang Wen
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2020

The usefulness and usability of knowledge bases (KBs) are often limited by quality issues. One common issue is the presence of erroneous assertions, often caused by lexical or semantic confusion. We study the problem of correcting such assertions and present a general correction framework that combines lexical matching, semantic embedding, soft constraint mining, and semantic consistency checking. The framework is evaluated using DBpedia and an enterprise medical KB.
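As an illustration of the lexical matching step named in the abstract above, the following is a minimal sketch, not the authors' framework. It assumes that replacement candidates for a suspect object can be ranked by plain string similarity between labels; the function name `lexical_candidates`, the `top_k` parameter, and the example labels are all hypothetical.

```python
from difflib import SequenceMatcher


def lexical_candidates(wrong_label, entity_labels, top_k=3):
    """Rank KB entity labels by string similarity to a suspect object's label.

    Hypothetical helper: a real system would combine this ranking with
    embeddings, mined constraints, and consistency checks.
    """
    scored = [
        (SequenceMatcher(None, wrong_label.lower(), label.lower()).ratio(), label)
        for label in entity_labels
    ]
    # Highest similarity first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [label for _, label in scored[:top_k]]


# Made-up labels: which entity might a literal "Paris (city)" refer to?
labels = ["Paris", "Paris Hilton", "Paris, Texas", "Berlin"]
candidates = lexical_candidates("Paris (city)", labels, top_k=2)
print(candidates)
```

String similarity alone cannot distinguish entities with near-identical labels, which is presumably why the abstract pairs lexical matching with semantic embedding and consistency checking.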

arXiv.org e-Print Archive

City Research Online

Crossref

NIVA Open Access Archive

NORA - Norwegian Open Research Archives

Automatic refinement of large-scale cross-domain knowledge graphs

Author: Melo André
Publication date: 01/01/2018

Knowledge graphs are a way to represent complex structured and unstructured information integrated into an ontology, with which one can reason about the existing information to deduce new information or highlight inconsistencies. Knowledge graphs are divided into the terminology box (TBox), also known as the ontology, and the assertions box (ABox). The former consists of a set of schema axioms defining classes and properties that describe the data domain, whereas the ABox consists of a set of facts describing instances in terms of the TBox vocabulary. In recent years, there have been several initiatives for creating large-scale cross-domain knowledge graphs, both free and commercial, with DBpedia, YAGO, and Wikidata being amongst the most successful free datasets. Those graphs are often constructed by extracting information from semi-structured knowledge, such as Wikipedia, or from unstructured text on the web using NLP methods. It is unlikely, in particular when heuristic methods are applied and unreliable sources are used, that the knowledge graph is fully correct or complete. There is a tradeoff between completeness and correctness, which is addressed differently in each knowledge graph’s construction approach. There is a wide variety of applications for knowledge graphs, e.g. semantic search and discovery, question answering, recommender systems, expert systems and personal assistants. The quality of a knowledge graph is crucial for its applications. In order to further increase the quality of such large-scale knowledge graphs, various automatic refinement methods have been proposed. Those methods try to infer and add missing knowledge to the graph, or to detect erroneous pieces of information. In this thesis, we investigate the problem of automatic knowledge graph refinement and propose methods that address the problem from two directions: automatic refinement of the TBox and of the ABox. In Part I we address the ABox refinement problem.
We propose a method for predicting missing type assertions using hierarchical multilabel classifiers and ingoing/outgoing links as features. We also present an approach to the detection of relation assertion errors which exploits type and path patterns in the graph. Moreover, we propose an approach to the correction of relation errors originating from confusions between entities. Also in the ABox refinement direction, we propose a knowledge graph model and process for synthesizing knowledge graphs for benchmarking ABox completion methods. In Part II we address the TBox refinement problem. We propose methods for inducing flexible relation constraints from the ABox, which are expressed using SHACL. We introduce an ILP refinement step which exploits correlations between numerical attributes and relations in order to efficiently learn Horn rules with numerical attributes. Finally, we investigate the introduction of lexical information from textual corpora into the ILP algorithm in order to improve the quality of induced class expressions.
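To make the idea of exploiting type patterns for relation error detection more concrete, here is a deliberately simplified sketch. It is not the method from the thesis: it assumes each entity has exactly one known type and flags a triple when its object type accounts for only a small share of a relation's objects. The function name `flag_rare_type_patterns`, the `min_share` threshold, and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def flag_rare_type_patterns(triples, types, min_share=0.2):
    """Flag (s, r, o) triples whose object type is rare for relation r.

    Assumption: `types` maps every object to a single type. Real knowledge
    graphs have multiple, possibly missing, types per entity.
    """
    type_counts = defaultdict(Counter)
    for _, relation, obj in triples:
        type_counts[relation][types[obj]] += 1

    flagged = []
    for subj, relation, obj in triples:
        total = sum(type_counts[relation].values())
        share = type_counts[relation][types[obj]] / total
        if share < min_share:
            flagged.append((subj, relation, obj))
    return flagged


# Toy data: five plausible birthplaces and one entity confusion.
types = {
    "Berlin": "City", "Paris": "City", "Rome": "City",
    "Vienna": "City", "Oslo": "City", "Paris_Hilton": "Person",
}
triples = [
    ("Alice", "birthPlace", "Berlin"),
    ("Bob", "birthPlace", "Paris"),
    ("Carol", "birthPlace", "Rome"),
    ("Dan", "birthPlace", "Vienna"),
    ("Eve", "birthPlace", "Oslo"),
    ("Frank", "birthPlace", "Paris_Hilton"),
]
suspects = flag_rare_type_patterns(triples, types)
print(suspects)
```

A fixed share threshold is the crudest possible choice; it stands in for the learned type and path features that the abstract describes.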

MAnnheim DOCument Server

Going Deeper with Semantics: Video Activity Interpretation using Semantic Contextualization

Author: Aakur Sathyanarayanan N.
de Souza Fillipe DM
Sarkar Sudeep
Publication date: 15/11/2018

A deeper understanding of video activities extends beyond recognition of underlying concepts such as actions and objects: constructing deep semantic representations requires reasoning about the semantic relationships among these concepts, often beyond what is directly observed in the data. To this end, we propose an energy minimization framework that leverages large-scale commonsense knowledge bases, such as ConceptNet, to provide contextual cues to establish semantic relationships among entities directly hypothesized from the video signal. We mathematically express this using the language of Grenander's canonical pattern generator theory. We show that the use of prior encoded commonsense knowledge alleviates the need for large annotated training datasets and helps tackle imbalance in training. Using three publicly available datasets (Charades, Microsoft Visual Description Corpus, and Breakfast Actions), we show that the proposed model can generate video interpretations whose quality is better than those reported by state-of-the-art approaches, which have substantial training needs. Through extensive experiments, we show that the use of commonsense knowledge from ConceptNet allows the proposed approach to handle various challenges such as training data imbalance, weak features, and complex semantic relationships and visual scenes. Comment: Accepted to WACV 201

arXiv.org e-Print Archive

Crossref

An approach to correction of erroneous links in knowledge graphs

Author: Melo André
Paulheim Heiko
Publication venue: RWTH
Publication date: 01/01/2017

MAnnheim DOCument Server

Regression-free Synthesis for Concurrency

Author: Henzinger Thomas A.
Radhakrishna Arjun
Ryzhyk Leonid
Tarrach Thorsten
Černý Pavol
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

While fixing concurrency bugs, program repair algorithms may introduce new concurrency bugs. We present an algorithm that avoids such regressions. The solution space is given by a set of program transformations we consider in the repair process. These include reordering of instructions within a thread and inserting atomic sections. The new algorithm learns a constraint on the space of candidate solutions, from both positive examples (error-free traces) and counterexamples (error traces). From each counterexample, the algorithm learns a constraint necessary to remove the errors. From each positive example, it learns a constraint that is necessary in order to prevent the repair from turning the trace into an error trace. We implemented the algorithm and evaluated it on simplified Linux device drivers with known bugs. Comment: for source code see https://github.com/thorstent/ConRepai

arXiv.org e-Print Archive

IST PubRep

IST Austria: PubRep (Institute of Science and Technology)

What is Normal, What is Strange, and What is Missing in a Knowledge Graph: Unified Characterization via Inductive Summarization

Author: Bordes Antoine
Carlson Andrew
Goebl S.
Nickel Maximilian
Rissanen J.
Tanon Thomas Pellissier
Trouillon Théo
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2020

Knowledge graphs (KGs) store highly heterogeneous information about the world in the structure of a graph, and are useful for tasks such as question answering and reasoning. However, they often contain errors and are missing information. Vibrant research in KG refinement has worked to resolve these issues, tailoring techniques to either detect specific types of errors or complete a KG. In this work, we introduce a unified solution to KG characterization by formulating the problem as unsupervised KG summarization with a set of inductive, soft rules, which describe what is normal in a KG, and thus can be used to identify what is abnormal, whether it be strange or missing. Unlike first-order logic rules, our rules are labeled, rooted graphs, i.e., patterns that describe the expected neighborhood around a (seen or unseen) node, based on its type and information in the KG. Stepping away from the traditional support/confidence-based rule mining techniques, we propose KGist, Knowledge Graph Inductive SummarizaTion, which learns a summary of inductive rules that best compress the KG according to the Minimum Description Length principle, a formulation that we are the first to use in the context of KG rule mining. We apply our rules to three large KGs (NELL, DBpedia, and Yago), and to tasks such as compression, various types of error detection, and identification of incomplete information. We show that KGist outperforms task-specific, supervised and unsupervised baselines in error detection and incompleteness identification (identifying the location of up to 93% of missing entities, over 10% more than baselines), while also being efficient for large knowledge graphs. Comment: 10 pages, plus 2 pages of references. 5 figures. Accepted at The Web Conference 202
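The Minimum Description Length idea behind rule selection can be sketched with a toy two-part code. This is an assumption-laden illustration and not KGist's actual encoding: it considers a single rule of the form "every node of some type has a given edge", charges a fixed, made-up cost `rule_bits` for stating the rule, and then pays only for naming the exceptions. All function names are hypothetical.

```python
import math


def bits_without_rule(n):
    """Baseline code: one bit per node saying whether it has the edge."""
    return float(n)


def bits_with_rule(n, exceptions, rule_bits=8.0):
    """Two-part code: the rule itself, then the nodes that violate it.

    rule_bits is an assumed flat cost for describing the rule. The
    exceptions are encoded as their count (log2(n + 1) bits) plus which
    subset of the n nodes they form (log2 C(n, exceptions) bits).
    """
    count_bits = math.log2(n + 1)
    subset_bits = math.log2(math.comb(n, exceptions))
    return rule_bits + count_bits + subset_bits


def rule_compresses(n, exceptions, rule_bits=8.0):
    """Keep the rule only if it shortens the total description."""
    return bits_with_rule(n, exceptions, rule_bits) < bits_without_rule(n)


# A rule with few exceptions pays for itself; one that fails half
# the time does not.
print(rule_compresses(100, 3))
print(rule_compresses(100, 50))
```

In this picture, the exceptions that remain after a rule is accepted are natural candidates for the "strange" or "missing" information the abstract refers to.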

arXiv.org e-Print Archive

Crossref

MPG.PuRe

The IBMAP approach for Markov networks structure learning

Author: Alejandro Edera
C Aliferis
D Koller
D Margaritis
DM Chickering
F Bromberg
Facundo Bromberg
Federico Schlüter
J Pearl
M Mitchell
MJ Wainwright
P Larrañaga
P Ravikumar
P Spirtes
R Santana
S Della Pietra
S Shakya
SL Lauritzen
TM Cover
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 25/02/2014

In this work we consider the problem of learning the structure of Markov networks from data. We present an approach for tackling this problem called IBMAP, together with an efficient instantiation of the approach: the IBMAP-HC algorithm, designed to avoid important limitations of existing independence-based algorithms. These algorithms proceed by performing statistical independence tests on data, trusting completely the outcome of each test. In practice, tests may be incorrect, resulting in potential cascading errors and a consequent reduction in the quality of the structures learned. IBMAP accounts for this uncertainty in the outcome of the tests through a probabilistic maximum-a-posteriori approach. The approach is instantiated in the IBMAP-HC algorithm, a structure selection strategy that performs a polynomial heuristic local search in the space of possible structures. We present an extensive empirical evaluation on synthetic and real data, showing that our algorithm significantly outperforms the current independence-based algorithms in terms of data efficiency and quality of learned structures, with equivalent computational complexity. We also show the performance of IBMAP-HC in a real-world application of knowledge discovery: EDAs, which are evolutionary algorithms that use structure learning on each generation for modeling the distribution of populations. The experiments show that when IBMAP-HC is used to learn the structure, EDAs improve the convergence to the optimum.

arXiv.org e-Print Archive

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

CONICET Digital