78 research outputs found

    Correcting Knowledge Base Assertions

    Get PDF
    The usefulness and usability of knowledge bases (KBs) is often limited by quality issues. One common issue is the presence of erroneous assertions, often caused by lexical or semantic confusion. We study the problem of correcting such assertions, and present a general correction framework which combines lexical matching, semantic embedding, soft constraint mining and semantic consistency checking. The framework is evaluated using DBpedia and an enterprise medical KB

    Engineering Agile Big-Data Systems

    Get PDF
    To be effective, data-intensive systems require extensive ongoing customisation to reflect changing user requirements, organisational policies, and the structure and interpretation of the data they hold. Manual customisation is expensive, time-consuming, and error-prone. In large complex systems, the value of the data can be such that exhaustive testing is necessary before any new feature can be added to the existing design. In most cases, the precise details of requirements, policies and data will change during the lifetime of the system, forcing a choice between expensive modification and continued operation with an inefficient design.Engineering Agile Big-Data Systems outlines an approach to dealing with these problems in software and data engineering, describing a methodology for aligning these processes throughout product lifecycles. It discusses tools which can be used to achieve these goals, and, in a number of case studies, shows how the tools and methodology have been used to improve a variety of academic and business systems

    Engineering Agile Big-Data Systems

    Get PDF
    To be effective, data-intensive systems require extensive ongoing customisation to reflect changing user requirements, organisational policies, and the structure and interpretation of the data they hold. Manual customisation is expensive, time-consuming, and error-prone. In large complex systems, the value of the data can be such that exhaustive testing is necessary before any new feature can be added to the existing design. In most cases, the precise details of requirements, policies and data will change during the lifetime of the system, forcing a choice between expensive modification and continued operation with an inefficient design.Engineering Agile Big-Data Systems outlines an approach to dealing with these problems in software and data engineering, describing a methodology for aligning these processes throughout product lifecycles. It discusses tools which can be used to achieve these goals, and, in a number of case studies, shows how the tools and methodology have been used to improve a variety of academic and business systems

    Fusing Automatically Extracted Annotations for the Semantic Web

    Get PDF
    This research focuses on the problem of semantic data fusion. Although various solutions have been developed in the research communities focusing on databases and formal logic, the choice of an appropriate algorithm is non-trivial because the performance of each algorithm and its optimal configuration parameters depend on the type of data, to which the algorithm is applied. In order to be reusable, the fusion system must be able to select appropriate techniques and use them in combination. Moreover, because of the varying reliability of data sources and algorithms performing fusion subtasks, uncertainty is an inherent feature of semantically annotated data and has to be taken into account by the fusion system. Finally, the issue of schema heterogeneity can have a negative impact on the fusion performance. To address these issues, we propose KnoFuss: an architecture for Semantic Web data integration based on the principles of problem-solving methods. Algorithms dealing with different fusion subtasks are represented as components of a modular architecture, and their capabilities are described formally. This allows the architecture to select appropriate methods and configure them depending on the processed data. In order to handle uncertainty, we propose a novel algorithm based on the Dempster-Shafer belief propagation. KnoFuss employs this algorithm to reason about uncertain data and method results in order to refine the fused knowledge base. Tests show that these solutions lead to improved fusion performance. Finally, we addressed the problem of data fusion in the presence of schema heterogeneity. We extended the KnoFuss framework to exploit results of automatic schema alignment tools and proposed our own schema matching algorithm aimed at facilitating data fusion in the Linked Data environment. We conducted experiments with this approach and obtained a substantial improvement in performance in comparison with public data repositories

    Automatic refinement of large-scale cross-domain knowledge graphs

    Get PDF
    Knowledge graphs are a way to represent complex structured and unstructured information integrated into an ontology, with which one can reason about the existing information to deduce new information or highlight inconsistencies. Knowledge graphs are divided into the terminology box (TBox), also known as ontology, and the assertions box (ABox). The former consists of a set of schema axioms defining classes and properties which describe the data domain. Whereas the ABox consists of a set of facts describing instances in terms of the TBox vocabulary. In the recent years, there have been several initiatives for creating large-scale cross-domain knowledge graphs, both free and commercial, with DBpedia, YAGO, and Wikidata being amongst the most successful free datasets. Those graphs are often constructed with the extraction of information from semi-structured knowledge, such as Wikipedia, or unstructured text from the web using NLP methods. It is unlikely, in particular when heuristic methods are applied and unreliable sources are used, that the knowledge graph is fully correct or complete. There is a tradeoff between completeness and correctness, which is addressed differently in each knowledge graph’s construction approach. There is a wide variety of applications for knowledge graphs, e.g. semantic search and discovery, question answering, recommender systems, expert systems and personal assistants. The quality of a knowledge graph is crucial for its applications. In order to further increase the quality of such large-scale knowledge graphs, various automatic refinement methods have been proposed. Those methods try to infer and add missing knowledge to the graph, or detect erroneous pieces of information. In this thesis, we investigate the problem of automatic knowledge graph refinement and propose methods that address the problem from two directions, automatic refinement of the TBox and of the ABox. In Part I we address the ABox refinement problem. We propose a method for predicting missing type assertions using hierarchical multilabel classifiers and ingoing/ outgoing links as features. We also present an approach to detection of relation assertion errors which exploits type and path patterns in the graph. Moreover, we propose an approach to correction of relation errors originating from confusions between entities. Also in the ABox refinement direction, we propose a knowledge graph model and process for synthesizing knowledge graphs for benchmarking ABox completion methods. In Part II we address the TBox refinement problem. We propose methods for inducing flexible relation constraints from the ABox, which are expressed using SHACL.We introduce an ILP refinement step which exploits correlations between numerical attributes and relations in order to the efficiently learn Horn rules with numerical attributes. Finally, we investigate the introduction of lexical information from textual corpora into the ILP algorithm in order to improve quality of induced class expressions

    Knowledge base ontological debugging guided by linguistic evidence

    Get PDF
    Le résumé en français n'a pas été communiqué par l'auteur.When they grow in size, knowledge bases (KBs) tend to include sets of axioms which are intuitively absurd but nonetheless logically consistent. This is particularly true of data expressed in OWL, as part of the Semantic Web framework, which favors the aggregation of set of statements from multiple sources of knowledge, with overlapping signatures.Identifying nonsense is essential if one wants to avoid undesired inferences, but the sparse usage of negation within these datasets generally prevents the detection of such cases on a strict logical basis. And even if the KB is inconsistent, identifying the axioms responsible for the nonsense remains a non trivial task. This thesis investigates the usage of automatically gathered linguistic evidence in order to detect and repair violations of common sense within such datasets. The main intuition consists in exploiting distributional similarity between named individuals of an input KB, in order to identify consequences which are unlikely to hold if the rest of the KB does. Then the repair phase consists in selecting axioms to be preferably discarded (or at least amended) in order to get rid of the nonsense. A second strategy is also presented, which consists in strengthening the input KB with a foundational ontology, in order to obtain an inconsistency, before performing a form of knowledge base debugging/revision which incorporates this linguistic input. This last step may also be applied directly to an inconsistent input KB. These propositions are evaluated with different sets of statements issued from the Linked Open Data cloud, as well as datasets of a higher quality, but which were automatically degraded for the evaluation. The results seem to indicate that distributional evidence may actually constitute a relevant common ground for deciding between conflicting axioms

    Knowledge base ontological debugging guided by linguistic evidence

    Get PDF
    Le résumé en français n'a pas été communiqué par l'auteur.When they grow in size, knowledge bases (KBs) tend to include sets of axioms which are intuitively absurd but nonetheless logically consistent. This is particularly true of data expressed in OWL, as part of the Semantic Web framework, which favors the aggregation of set of statements from multiple sources of knowledge, with overlapping signatures.Identifying nonsense is essential if one wants to avoid undesired inferences, but the sparse usage of negation within these datasets generally prevents the detection of such cases on a strict logical basis. And even if the KB is inconsistent, identifying the axioms responsible for the nonsense remains a non trivial task. This thesis investigates the usage of automatically gathered linguistic evidence in order to detect and repair violations of common sense within such datasets. The main intuition consists in exploiting distributional similarity between named individuals of an input KB, in order to identify consequences which are unlikely to hold if the rest of the KB does. Then the repair phase consists in selecting axioms to be preferably discarded (or at least amended) in order to get rid of the nonsense. A second strategy is also presented, which consists in strengthening the input KB with a foundational ontology, in order to obtain an inconsistency, before performing a form of knowledge base debugging/revision which incorporates this linguistic input. This last step may also be applied directly to an inconsistent input KB. These propositions are evaluated with different sets of statements issued from the Linked Open Data cloud, as well as datasets of a higher quality, but which were automatically degraded for the evaluation. The results seem to indicate that distributional evidence may actually constitute a relevant common ground for deciding between conflicting axioms
    • …
    corecore