8 research outputs found

    Federated knowledge base debugging in DL-Lite A

    Due to the continuously growing amount of data, the federation of different and distributed data sources has gained increasing attention. A variety of approaches has been proposed to tackle the challenge of federating heterogeneous sources. Especially in the context of the Semantic Web, Description Logics are one of the preferred methods for modelling federated knowledge based on a well-defined syntax and semantics. However, the more data are available from heterogeneous sources, the higher the risk of inconsistency, which is a serious obstacle for performing reasoning tasks and query answering over a federated knowledge base. For a single knowledge base, the process of knowledge base debugging, comprising the identification and resolution of conflicting statements, has been widely studied, whereas federated settings integrating a network of loosely coupled data sources (such as LOD sources) have mostly been neglected. In this thesis we tackle the challenging problem of debugging federated knowledge bases and focus on a lightweight Description Logic language, called DL-LiteA, that is aimed at applications requiring efficient and scalable reasoning. After introducing formal foundations such as Description Logics and Semantic Web technologies, we clarify the motivating context of this work and discuss the general problem of information integration based on Description Logics. The main part of this thesis is subdivided into three subjects. First, we discuss the specific characteristics of federated knowledge bases and provide an appropriate approach for detecting and explaining contradictory statements in a federated DL-LiteA knowledge base. Second, we study the representation of the identified conflicts and their relationships as a conflict graph and propose an approach for repair generation based on majority voting and statistical evidence. Third, in order to provide an alternative way of handling inconsistency in federated DL-LiteA knowledge bases, we propose an automated approach for assessing adequate trust values (i.e., probabilities) at different levels of granularity by leveraging probabilistic inference over a graphical model. In the last part of this thesis, we evaluate the previously developed algorithms against a set of large distributed LOD sources. The experimental results show that the proposed approaches are sufficient, efficient and scalable with respect to real-world scenarios. Moreover, because our algorithms exploit the federated structure, the number of identified wrong statements, the quality of the generated repair, and the fineness of the assessed trust values all benefit from an increasing number of integrated sources.
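
    The repair-generation step lends itself to a small illustration. Below is a minimal sketch of majority voting over conflicting assertions, assuming conflicts are pairs of mutually contradictory assertions and that each assertion is annotated with the set of federated sources stating it; the names (Assertion, resolve_by_majority) are illustrative and not taken from the thesis.

```python
# Minimal sketch: majority-voting repair over pairwise conflicts, where each
# assertion carries the set of federated sources asserting it (illustrative
# names, not the thesis' actual data structures).
from dataclasses import dataclass

@dataclass(frozen=True)
class Assertion:
    statement: str                      # e.g. "Person(alice)"
    sources: frozenset = frozenset()    # federated sources asserting it

def resolve_by_majority(conflicts):
    """For each conflicting pair, mark the assertion backed by fewer
    sources for removal; ties are left undecided for manual inspection."""
    to_remove, undecided = set(), []
    for a, b in conflicts:
        if len(a.sources) > len(b.sources):
            to_remove.add(b)
        elif len(b.sources) > len(a.sources):
            to_remove.add(a)
        else:
            undecided.append((a, b))
    return to_remove, undecided

if __name__ == "__main__":
    a = Assertion("Person(alice)", frozenset({"src1", "src2", "src3"}))
    b = Assertion("Organisation(alice)", frozenset({"src4"}))
    removed, open_conflicts = resolve_by_majority([(a, b)])
    print([x.statement for x in removed])   # ['Organisation(alice)']
```

    With more integrated sources the vote margins grow, which is one intuition behind the abstract's observation that repair quality profits from an increasing number of sources.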

    Correcting Knowledge Base Assertions

    The usefulness and usability of knowledge bases (KBs) are often limited by quality issues. One common issue is the presence of erroneous assertions, often caused by lexical or semantic confusion. We study the problem of correcting such assertions and present a general correction framework that combines lexical matching, semantic embedding, soft constraint mining and semantic consistency checking. The framework is evaluated using DBpedia and an enterprise medical KB.
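
    As an illustration of the candidate-ranking idea, the sketch below combines a lexical similarity score with a toy bag-of-words stand-in for a semantic embedding; the soft constraint mining and consistency checking stages of the framework are omitted, and all function names are hypothetical.

```python
# Minimal sketch: rank candidate replacement entities for an erroneous
# assertion object by mixing lexical and (toy) semantic similarity.
from difflib import SequenceMatcher
from collections import Counter
from math import sqrt

def lexical_sim(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def semantic_sim(a, b):
    # Stand-in for an embedding model: cosine similarity over word counts.
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = sqrt(sum(v * v for v in va.values())) * sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

def rank_candidates(wrong_object, candidates, alpha=0.5):
    """Score each candidate as a weighted mix of both signals."""
    scored = [(alpha * lexical_sim(wrong_object, c)
               + (1 - alpha) * semantic_sim(wrong_object, c), c)
              for c in candidates]
    return sorted(scored, reverse=True)

print(rank_candidates("Univeristy of Oxford",
                      ["University of Oxford", "Oxford Street", "Oxford (film)"]))
```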

    Pseudo-contractions as Gentle Repairs

    Updating a knowledge base to remove an unwanted consequence is a challenging task. Some of the original sentences must be either deleted or weakened in such a way that the sentence to be removed is no longer entailed by the resulting set. On the other hand, it is desirable that the existing knowledge be preserved as much as possible, minimising the loss of information. Several approaches to this problem can be found in the literature. In particular, when the knowledge is represented by an ontology, two different families of frameworks have been developed over the past decades, with numerous ideas in common but little interaction between the communities: applications of AGM-like Belief Change and justification-based Ontology Repair. In this paper, we investigate the relationship between pseudo-contraction operations and gentle repairs. Both aim to avoid the complete deletion of sentences when replacing them with weaker versions is enough to prevent the entailment of the unwanted formula. We show the correspondence between concepts on both sides and investigate under which conditions they are equivalent. Furthermore, we propose a unified notation for the two approaches, which might contribute to the integration of the two areas.
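
    The idea of weakening instead of deleting can be illustrated on a toy propositional base. The sketch below is a minimal, assumption-laden example rather than the paper's construction: it replaces a sentence by one of its logical consequences so that the unwanted formula is no longer entailed while some information is retained.

```python
# Minimal sketch of a pseudo-contraction / gentle repair on a toy propositional
# base; formulas are Python boolean expressions over a fixed set of atoms.
from itertools import product

ATOMS = ["p", "q", "r"]

def entails(premises, conclusion):
    """Truth-table entailment over the fixed atom set."""
    for values in product([False, True], repeat=len(ATOMS)):
        env = dict(zip(ATOMS, values))
        if all(eval(f, {}, env) for f in premises) and not eval(conclusion, {}, env):
            return False
    return True

base = ["p", "not p or q"]          # "not p or q" encodes p -> q
unwanted = "q"
print(entails(base, unwanted))      # True: the base entails the unwanted formula

# Classical contraction would delete "p"; a gentle repair instead replaces it
# with the strictly weaker sentence "p or r", which blocks the entailment of q
# while still preserving part of the original information.
repaired = ["p or r", "not p or q"]
print(entails(repaired, unwanted))  # False
print(entails(["p"], "p or r"))     # True: the replacement is a consequence of "p"
```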

    Automated Reasoning

    This volume, LNAI 13385, constitutes the refereed proceedings of the 11th International Joint Conference on Automated Reasoning, IJCAR 2022, held in Haifa, Israel, in August 2022. The 32 full research papers and 9 short papers presented together with two invited talks were carefully reviewed and selected from 85 submissions. The papers focus on the following topics: Satisfiability, SMT Solving, Arithmetic; Calculi and Orderings; Knowledge Representation and Justification; Choices, Invariance, Substitutions and Formalization; Modal Logics; Proof Systems and Proof Search; Evolution, Termination and Decision Problems. This is an open access book.

    Scalable Quality Assessment of Linked Data

    In a world where the information economy is booming, poor data quality can lead to adverse consequences, including social and economic problems such as decreased revenue. Furthermore, data-driven industries are not just relying on their own (proprietary) data silos, but are also continuously aggregating data from different sources. This aggregation could then be re-distributed back to "data lakes". However, this data (including Linked Data) is not necessarily checked for its quality prior to its use. Large volumes of data are being exchanged in a standard and interoperable format between organisations and published as Linked Data to facilitate their re-use. Some organisations, such as government institutions, take a step further and open their data. The Linked Open Data Cloud is a witness to this. However, similar to data in data lakes, it is challenging to determine the quality of this heterogeneous data, and subsequently to make this information explicit to data consumers. Despite the availability of a number of tools and frameworks to assess Linked Data quality, current solutions do not provide a holistic approach that both enables the assessment of datasets and provides consumers with quality results that can then be used to find, compare and rank datasets' fitness for use. In this thesis we investigate methods to assess the quality of (possibly large) linked datasets with the intent that data consumers can then use the assessment results to find datasets that are fit for use, that is, finding the right dataset for the task at hand. Moreover, the benefits of quality assessment are two-fold: (1) data consumers do not need to blindly rely on subjective measures to choose a dataset, but can base their choice on multiple factors such as the intrinsic structure of the dataset, thereby fostering trust and reputation between publishers and consumers on more objective foundations; and (2) data publishers can be encouraged to improve their datasets so that they can be re-used more. Furthermore, our approach scales to large datasets. In this regard, we also look into improving the efficiency of quality metrics using various approximation techniques. The trade-off is that consumers will not get the exact quality value, but a very close estimate which still provides the required guidance towards fitness for use. The central point of this thesis is not data quality improvement; nonetheless, we still need to understand what data quality means to the consumers who are searching for potential datasets. This thesis looks into the challenges faced in detecting quality problems in linked datasets and presents quality results in a standardised, machine-readable and interoperable format that agents can make sense of in order to help human consumers identify datasets that are fit for use. Our proposed approach is more consumer-centric in that it looks into (1) making the assessment of quality as easy as possible, that is, allowing stakeholders, possibly non-experts, to identify and easily define quality metrics and to initiate the assessment; and (2) making results (quality metadata and quality reports) easy for stakeholders to understand, or at least interoperable with other systems to facilitate a possible data quality pipeline. Finally, our framework is used to assess the quality of a number of heterogeneous (large) linked datasets, where each assessment returns a quality metadata graph that can be consumed by agents as Linked Data. In turn, these agents can intelligently interpret a dataset's quality with regard to multiple dimensions and observations, and thus provide further insight to consumers regarding its fitness for use.
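
    As an illustration of the approximation idea, the sketch below estimates a quality metric from a fixed-size reservoir sample of a large triple stream instead of a full scan; the metric shown (share of objects that are typed literals) and all names are illustrative, not taken from the thesis.

```python
# Minimal sketch: approximate a quality metric over a large triple stream with
# reservoir sampling, trading exactness for a close, cheaply computed estimate.
import random

def reservoir_sample(stream, k, seed=42):
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)
            if j < k:
                sample[j] = item
    return sample

def typed_literal_ratio(triples):
    """Illustrative metric: fraction of triples whose object is a typed literal."""
    triples = list(triples)
    typed = sum(1 for _, _, o in triples if "^^" in o)
    return typed / len(triples) if triples else 0.0

# Synthetic stream: 100,000 triples, 30% of them with typed literal objects.
stream = ((f"ex:s{i}", "ex:p", f'"{i}"^^xsd:integer' if i % 10 < 3 else f"ex:o{i}")
          for i in range(100_000))
estimate = typed_literal_ratio(reservoir_sample(stream, k=10_000))
print(f"estimated ratio ~ {estimate:.3f}")   # close to the true value 0.3
```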

    A framework for detecting and repairing errors in DL-LiteA knowledge bases

    Several logical formalisms have been proposed in the literature for expressing structural and semantic integrity constraints of Linked Open Data (LOD). Still, the quality of the datasets published in the LOD Cloud needs to be improved, as published linked data often violate such constraints. This lack of consistency may jeopardise the value of applications that consume linked data in an automatic way. A major challenge in this respect is to provide the curators of linked data knowledge bases (KBs) with tools that help them detect violations of integrity constraints and resolve them, in order to render the knowledge base valid and improve its data quality. In this work, we propose a novel, fully automatic framework for detecting violations of integrity constraints (diagnosis) in KBs, by executing the appropriate queries over the data, as well as for resolving those violations (repair), by removing invalid data from the KB. Our approach takes into consideration both explicit and inferred ontology knowledge, relying on the ontology language DL-LiteA for the expression of several useful types of logical constraints and for the detection of data that are inconsistent with those constraints, while maintaining good computational properties. The proposed framework is modular, allowing each component to be implemented independently of the others; this way, we are able to implement the framework using off-the-shelf, state-of-the-art tools for several features, such as reasoning and query execution. We have implemented and evaluated our framework, showing that it scales to large datasets and to the numbers of invalidities exhibited in reality by reference linked datasets, such as DBpedia. The evaluation also shows that our framework can be used over already deployed knowledge bases without any further reconfiguration.
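
    The diagnose-and-repair loop can be sketched in a few lines. The example below assumes a single disjointness constraint, checks it with a SPARQL query via rdflib, and repairs violations by removing the offending type assertions; the constraint, data and repair strategy are illustrative, not the framework's actual components.

```python
# Minimal sketch: detect violations of a disjointness constraint with a SPARQL
# query and repair them by deleting the conflicting assertions (illustrative
# data and strategy; a real repair would choose which assertion to keep).
from rdflib import Graph, RDF, URIRef

data = """
@prefix ex: <http://example.org/> .
ex:alice a ex:Person .
ex:acme  a ex:Organisation .
ex:bob   a ex:Person, ex:Organisation .   # violates the disjointness constraint
"""

g = Graph()
g.parse(data=data, format="turtle")

# Diagnosis: individuals asserted to belong to both disjoint classes.
check = """
    PREFIX ex: <http://example.org/>
    SELECT ?x WHERE { ?x a ex:Person ; a ex:Organisation . }
"""
violations = list(g.query(check))

# Repair: drop both conflicting type assertions for each violating individual.
EX = "http://example.org/"
for row in violations:
    g.remove((row.x, RDF.type, URIRef(EX + "Person")))
    g.remove((row.x, RDF.type, URIRef(EX + "Organisation")))

print(len(list(g.query(check))))   # 0 violations remain
```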