
    Combining Query Rewriting and Knowledge Graph Embeddings for Complex Query Answering

    The field of complex query answering using Knowledge Graphs (KGs) has seen substantial advancements in recent years, primarily through the utilization of Knowledge Graph Embeddings (KGEs). However, these methodologies often stumble when faced with intricate query structures that involve multiple entities and relationships. This thesis investigates the potential of integrating query rewriting techniques into the KGE query answering process to improve performance in such situations. Guided by a TBox, a Description Logics schema that describes the concepts and relationships in the data, query rewriting translates a query into a union of rewritten queries that can potentially widen the prediction scope for KGEs. The thesis uses the PerfectRef algorithm to perform query rewriting, aiming to maximize the scope of query responses and enhance prediction capabilities. Two distinct datasets were employed in the study: the Family dataset, a subset of Wikidata, and DBPedia15k, a subset of DBPedia. The effectiveness of the proposed methodology was evaluated against these datasets using different KGE models, namely TransE, DistMult, BoxE, RotatE, and CompGCN. The results demonstrate a notable improvement in complex query answering when query rewriting is used, for both the Family dataset and DBPedia15k. Furthermore, the combination of query rewriting and KGE predictions yielded a performance boost for the Family dataset. However, the same was not observed for DBPedia15k, likely due to discrepancies and errors present within DBPedia15k compared to the full DBPedia KG used for validation in our framework. This research suggests that query rewriting, as a pre-processing step for KGE prediction, can enhance the performance of complex query answering, mainly when the dataset is not fully entailed. This study provides important insights into the potential and limitations of integrating query rewriting with KGEs. It may serve as a guidepost for future research to improve complex query answering when a TBox is available. Master's thesis in informatics (INF399, MAMN-PROG, MAMN-IN).
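
    As a minimal sketch of the pipeline described in the abstract (assuming a hypothetical rewrite() helper standing in for PerfectRef and a toy TransE scorer; all names here are illustrative, not the thesis' actual code):

    import numpy as np

    def transe_score(h, r, t):
        # TransE plausibility: negative translation distance, higher is better.
        return -np.linalg.norm(h + r - t)

    def answer_query(query, tbox, entities, ent_emb, rel_emb, rewrite, k=10):
        # 1. Rewrite the query into a union of queries using the TBox
        #    (PerfectRef-style reformulation; rewrite() is hypothetical).
        rewritten = rewrite(query, tbox)
        # 2. Score every candidate entity under each rewriting, keeping the best.
        scores = {}
        for q in rewritten:  # each q assumed atomic: (subject, relation, ?x)
            h, r = ent_emb[q.subject], rel_emb[q.relation]
            for e in entities:
                s = transe_score(h, r, ent_emb[e])
                scores[e] = max(scores.get(e, -np.inf), s)
        # 3. Answers over the union = top-k entities across all rewritings.
        return sorted(scores, key=scores.get, reverse=True)[:k]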

    A tetrachotomy of ontology-mediated queries with a covering axiom

    Our concern is the problem of efficiently determining the data complexity of answering queries mediated by description logic ontologies and constructing their optimal rewritings to standard database queries. Originating in ontology-based data access and datalog optimisation, this problem is known to be computationally very complex in general, with no explicit syntactic characterisations available. In this article, aiming to understand the fundamental roots of this difficulty, we strip the problem to the bare bones and focus on Boolean conjunctive queries mediated by a simple covering axiom stating that one class is covered by the union of two other classes. We show that, on the one hand, these rudimentary ontology-mediated queries, called disjunctive sirups (or d-sirups), capture many features and difficulties of the general case. For example, answering d-sirups is Π^p_2-complete for combined complexity and can be in AC0 or L-, NL-, P-, or coNP-complete for data complexity (with the problem of recognising FO-rewritability of d-sirups being 2ExpTime-hard); some d-sirups only have exponential-size resolution proofs, some only double-exponential-size positive existential FO-rewritings and single-exponential-size nonrecursive datalog rewritings. On the other hand, we prove a few partial sufficient and necessary conditions of FO- and (symmetric/linear-) datalog rewritability of d-sirups. Our main technical result is a complete and transparent syntactic AC0 / NL / P / coNP tetrachotomy of d-sirups with disjoint covering classes and a path-shaped Boolean conjunctive query. To obtain this tetrachotomy, we develop new techniques for establishing P- and coNP-hardness of answering non-Horn ontology-mediated queries, as well as for showing that they can be answered in NL.
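
    To make the setting concrete, here is an illustrative d-sirup in LaTeX notation (our own example in the spirit of the abstract, not one taken from the article): the ontology consists of the single covering axiom and the query is a path-shaped Boolean conjunctive query,

        \mathcal{O} = \{\, A \sqsubseteq F \sqcup T \,\}, \qquad
        q = \exists x\, \exists y\, \bigl( F(x) \wedge R(x,y) \wedge T(y) \bigr),

    and answering the ontology-mediated query (\mathcal{O}, q) over a data instance D amounts to deciding whether q holds in every model of \mathcal{O} together with D.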

    Using Ontologies to Query Probabilistic Numerical Data: Extended Version

    We consider ontology-based query answering in a setting where some of the data are numerical and of a probabilistic nature, such as data obtained from uncertain sensor readings. The uncertainty of such numerical values can be represented more precisely by continuous probability distributions than by discrete probabilities for numerical facts concerning exact values. For this reason, we extend existing approaches, which use discrete probability distributions over facts, with continuous probability distributions over numerical values. We determine the exact (data and combined) complexity of query answering in extensions of the well-known description logics EL and ALC with numerical comparison operators in this probabilistic setting. This is an extended version of the article in: Proceedings of the 11th International Symposium on Frontiers of Combining Systems. This version has been revised based on the comments of the reviewers.
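
    As a toy illustration of this modelling choice (the distribution, names, and threshold are our assumptions, not from the paper): an uncertain sensor reading represented by a continuous distribution, queried with a comparison operator.

    # Uncertain reading modelled as a Gaussian rather than discrete
    # probabilities over exact values.
    from scipy.stats import norm

    temp_d1 = norm(loc=38.2, scale=0.7)    # temp of device d1 ~ N(38.2, 0.7^2)

    # Probability that the comparison atom "temp(d1) > 39.0" holds:
    p = temp_d1.sf(39.0)                   # survival function, P(X > 39.0)
    print(f"P(temp(d1) > 39.0) = {p:.3f}")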

    Automated Deduction – CADE 28

    This open access book constitutes the proceedings of the 28th International Conference on Automated Deduction, CADE 28, held virtually in July 2021. The 29 full papers and 7 system descriptions presented together with 2 invited papers were carefully reviewed and selected from 76 submissions. CADE is the major forum for the presentation of research in all aspects of automated deduction, including foundations, applications, implementations, and practical experience. The papers are organized in the following topics: logical foundations; theory and principles; implementation and application; ATP and AI; and system descriptions.

    Closed-World Semantics for Query Answering in Temporal Description Logics

    Ontology-mediated query answering is a popular paradigm for enriching answers to user queries with background knowledge. For querying the absence of information, however, there exist only a few ontology-based approaches. Moreover, these proposals conflate the closed-domain and closed-world assumptions, and are therefore not suited to deal with the anonymous objects that are common in ontological reasoning. Many real-world applications, like the processing of electronic health records (EHRs), also contain a temporal dimension and require efficient reasoning algorithms. Moreover, since medical data is not recorded on a regular basis, reasoners must deal with sparse data with potentially large temporal gaps. Our contribution consists of three main parts. Firstly, we introduce a new closed-world semantics for answering conjunctive queries with negation over ontologies formulated in the description logic ELH⊥, based on the minimal universal model. We propose a rewriting strategy for dealing with negated query atoms, which shows that query answering is possible in polynomial time in data complexity. Secondly, we introduce a new temporal variant of ELH⊥ that features a convexity operator. We extend this minimal-world semantics to answering metric temporal conjunctive queries with negation over this logic and obtain similar rewritability and complexity results. Thirdly, apart from the theoretical results, we evaluate minimal-world semantics in practice by selecting patients, based on their EHRs, that match given criteria.
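
    The gist of the minimal-world idea can be stated compactly (a hedged paraphrase under the assumption that negation is evaluated over named individuals in the minimal universal model; the precise definition is in the paper). Writing \mathcal{I}_{\mathcal{K}} for the minimal universal model of a consistent ELH⊥ knowledge base \mathcal{K} and a for a named individual,

        \mathcal{I}_{\mathcal{K}} \models \neg A(a) \quad\Longleftrightarrow\quad \mathcal{K} \not\models A(a),

    so a negated query atom over named individuals reduces to a non-entailment check, which is what makes a polynomial-time rewriting strategy for negated atoms plausible.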

    Semantically defined Analytics for Industrial Equipment Diagnostics

    In this age of digitalization, industries everywhere accumulate massive amounts of data, which have become the lifeblood of the global economy. These data may come from heterogeneous systems, equipment, components, sensors, and applications in many varieties (diversity of sources), velocities (high rate of change), and volumes (sheer data size). Despite significant advances in the ability to collect, store, manage, and filter data, the real value lies in the analytics. Raw data is meaningless unless it is properly processed into actionable (business) insights. Those who know how to harness data effectively have a decisive competitive advantage: they raise performance by making faster and smarter decisions, improve short- and long-term strategic planning, offer more user-centric products and services, and foster innovation. Two distinct paradigms can be discerned in the practice of analytics: semantic-driven (deductive) and data-driven (inductive). The former emphasizes logic as a way of representing domain knowledge encoded in rules or ontologies, which are often carefully curated and maintained; these models, however, are often highly complex and require intensive knowledge-processing capabilities. Data-driven analytics employ machine learning (ML) to learn a model directly from the data with minimal human intervention; these models, however, are tuned to the training data and context, making them difficult to adapt. Industries that want to create value from data today must master both paradigms in combination. There is thus a great need in data analytics to seamlessly combine semantic-driven and data-driven processing techniques in an efficient and scalable architecture that allows extracting actionable insights from an extreme variety of data. In this thesis, we address these needs by providing:
    • A unified representation of domain-specific and analytical semantics, in the form of ontology models called the TechOnto Ontology Stack. It is a highly expressive, platform-independent formalism that captures the conceptual semantics of industrial systems (such as technical system hierarchies and component partonomies) together with their analytical functional semantics.
    • A new ontology language, Semantically defined Analytical Language (SAL), on top of the ontology model, which extends existing DatalogMTL (a Horn fragment of Metric Temporal Logic) with analytical functions as first-class citizens (a sketch of such a rule is given after this abstract).
    • A method to generate semantic workflows using our SAL language, which helps in authoring, reusing, and maintaining complex analytical tasks and workflows in an abstract fashion.
    • A multi-layer architecture that fuses knowledge-driven and data-driven analytics into a federated and distributed solution.
    To our knowledge, the work in this thesis is among the first to introduce and investigate the use of semantically defined analytics in an ontology-based data access setting for industrial analytical applications. We focus our work and evaluation on industrial data because of (i) the adoption of semantic technology by industry in general, and (ii) the common need, in the literature and in practice, to let domain expertise drive data analytics over semantically interoperable sources while still harnessing the power of analytics to enable real-time data insights. Given the evaluation results of three use-case studies, our approach surpasses state-of-the-art approaches for most application scenarios.
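
    To give a flavour of what such a rule could look like (the rule syntax, predicate names, and evaluation below are illustrative assumptions, not the thesis' concrete SAL), consider a DatalogMTL-style rule whose body applies an analytical function over a metric temporal window, together with a minimal Python evaluation:

    # SAL-style rule (illustrative syntax):
    #   overheating(X) <- avg[now-5min, now](temperature(X)) > 90
    from datetime import timedelta

    def overheating(readings, now, window=timedelta(minutes=5), threshold=90.0):
        # readings: list of (timestamp, value) pairs for one piece of equipment.
        in_window = [v for ts, v in readings if now - window <= ts <= now]
        # The analytical function avg(...) applied over the temporal window:
        return bool(in_window) and sum(in_window) / len(in_window) > threshold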

    Abstraction in ontology-based data management

    In many aspects of our society there is growing awareness of, and consensus on, the need for data-driven approaches that are resilient, transparent, and fully accountable. But in order to fulfil the promises and benefits of a data-driven society, the data services exposed by organisations' information systems must be well documented and their semantics clearly specified. Effectively documenting data services is indeed a crucial issue for organisations, not only for governing their own data, but also for interoperation purposes. In this thesis, we propose a new approach to automatically associate formal semantic descriptions with data services, thus bringing them into compliance with the FAIR guiding principles, i.e., making data services automatically Findable, Accessible, Interoperable, and Reusable (FAIR). We base our proposal on the Ontology-based Data Management (OBDM) paradigm, where a domain ontology is used to provide a semantic layer mapped to the data sources of an organisation, thus abstracting from the technical details of the data-layer implementation. The basic idea is to characterise, or explain, the semantics of a given data service, expressed as a query over the source schema, in terms of a query over the ontology. The query over the ontology thus represents an abstraction of the given data service in terms of the domain ontology through the mapping, and, together with the elements in the vocabulary of the ontology, this abstraction forms a basis for annotating the given data service with suitable metadata expressing its semantics. We present a formal framework for the task of automatically producing a semantic characterisation of a given data service expressed as a query over the source schema. The framework is based on three semantically well-founded notions, namely perfect, sound, and complete source-to-ontology rewriting, and on two associated basic computational problems, namely verification and computation. The former verifies whether a given query over the ontology is a perfect (respectively, sound, complete) source-to-ontology rewriting of a given data service expressed as a query over the source schema, whereas the latter computes one such rewriting, provided it exists. We provide an in-depth complexity analysis of these two computational problems in a very general scenario using languages amongst the most popular considered in the literature on managing data through an ontology. Furthermore, since we also study cases where the target query language for expressing source-to-ontology rewritings allows inequality atoms, we investigate the problem of answering queries with inequalities over lightweight ontologies, a problem that has rarely been addressed. In another direction, we study and advocate the use of a non-monotonic target query language for expressing source-to-ontology rewritings. Last but not least, we give a detailed account of related work, which illustrates how the results achieved in this thesis contribute notably to new results in the Semantic Web context, in relational database theory, and in view-based query processing.
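
    The central notion can be phrased compactly in LaTeX notation (a sketch consistent with the abstract; the formal definitions are in the thesis). Given a data service q_S over the source schema, an ontology \mathcal{O}, and a mapping \mathcal{M}, a query q_O over the ontology is a perfect source-to-ontology rewriting of q_S if, for every source database D,

        q_S(D) \;=\; \mathit{cert}\bigl(q_O, \langle \mathcal{O}, \mathcal{M}, D \rangle\bigr),

    where \mathit{cert} denotes the certain answers of q_O over the OBDM system; sound and complete rewritings weaken this equality to the corresponding one-sided containment.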