Query Rewriting with Disjunctive Existential Rules and Mappings
We consider the issue of answering unions of conjunctive queries (UCQs) with
disjunctive existential rules and mappings. While this issue has already been
well studied from a chase perspective, query rewriting within UCQs has hardly
been addressed yet. We first propose a sound and complete query rewriting
operator, which has the advantage of establishing a tight relationship between
a chase step and a rewriting step. The associated breadth-first query rewriting
algorithm outputs a minimal UCQ-rewriting when one exists. Second, we show that
for any "truly disjunctive" nonrecursive rule, there exists a conjunctive
query that has no UCQ-rewriting. It follows that the notion of finite
unification sets (fus), which denotes sets of existential rules such that any
UCQ admits a UCQ-rewriting, seems to have little relevance in this setting.
Finally, turning our attention to mappings, we show that the problem of
determining whether a UCQ admits a UCQ-rewriting through a disjunctive mapping
is undecidable. We conclude with a number of open problems.
Comment: This report contains the paper accepted at KR 2023 and an appendix
with full proofs. 24 pages.
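The breadth-first saturation behind such rewriting algorithms can be sketched in miniature. The sketch below is a deliberate simplification, not the paper's operator: rules are non-disjunctive, Datalog-like pairs (body predicate, head predicate) over unary atoms, so one rewriting step simply replaces a matching head atom by the corresponding body atom; conjunctive queries are frozensets of (predicate, term) atoms.

```python
# Toy breadth-first UCQ rewriting (assumed simplification: unary atoms,
# non-disjunctive single-atom rules encoded as (body_pred, head_pred)).

def rewrite_step(cq, rules):
    """All one-step rewritings of a single CQ: replace an atom whose
    predicate matches a rule head by the rule's body atom."""
    out = set()
    for atom in cq:
        pred, term = atom
        for body_pred, head_pred in rules:
            if head_pred == pred:
                out.add((cq - {atom}) | {(body_pred, term)})
    return out

def ucq_rewriting(cq, rules, max_depth=10):
    """Breadth-first saturation: the set of CQs (a UCQ) reachable by
    rewriting, if it stabilises within max_depth levels."""
    ucq = {frozenset(cq)}
    frontier = set(ucq)
    for _ in range(max_depth):
        new = set()
        for q in frontier:
            new |= rewrite_step(q, rules) - ucq
        if not new:
            return ucq      # fixpoint: a finite UCQ-rewriting exists
        ucq |= new
        frontier = new
    return None             # no UCQ-rewriting found within the bound

# Rule r(x) -> p(x), encoded as ("r", "p"); query: p(a) AND q(a)
rules = [("r", "p")]
result = ucq_rewriting({("p", "a"), ("q", "a")}, rules)
# result holds both {p(a), q(a)} and {r(a), q(a)}
```

If the loop reaches a fixpoint, the accumulated set of CQs is the UCQ-rewriting; the depth bound stands in for the cases where, as the paper shows for truly disjunctive rules, no finite UCQ-rewriting exists.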
Deciding FO-rewritability of Regular Languages and Ontology-Mediated Queries in Linear Temporal Logic
Our concern is the problem of determining the data complexity of answering an ontology-mediated query (OMQ) formulated in linear temporal logic LTL over (Z,<), and deciding whether it is rewritable to an FO(<)-query, possibly with some extra predicates. First, we observe that, in line with the circuit complexity and FO-definability of regular languages, OMQ answering in AC0, ACC0 and NC1 coincides with FO(<,≡)-rewritability using unary predicates x ≡ 0 (mod n), FO(<,MOD)-rewritability, and FO(RPR)-rewritability using relational primitive recursion, respectively. We prove that, similarly to the known PSPACE-completeness of recognising FO(<)-definability of regular languages, deciding FO(<,≡)- and FO(<,MOD)-definability is also PSPACE-complete (unless ACC0 = NC1). We then use this result to show that deciding FO(<)-, FO(<,≡)- and FO(<,MOD)-rewritability of LTL OMQs is EXPSPACE-complete, and that these problems become PSPACE-complete for OMQs with a linear Horn ontology and an atomic query, and also with a positive query in the cases of FO(<)- and FO(<,≡)-rewritability. Further, we consider FO(<)-rewritability of OMQs with a binary-clause ontology and identify OMQ classes for which deciding it is PSPACE-, Πp2- and coNP-complete.
Semiring Provenance for Lightweight Description Logics
We investigate semiring provenance--a successful framework originally defined
in the relational database setting--for description logics. In this context,
the ontology axioms are annotated with elements of a commutative semiring and
these annotations are propagated to the ontology consequences in a way that
reflects how they are derived. We define a provenance semantics for a language
that encompasses several lightweight description logics and show its
relationships with semantics that have been defined for ontologies annotated
with a specific kind of annotation (such as fuzzy degrees). We show that under
some restrictions on the semiring, the semantics satisfies desirable properties
(such as extending the semiring provenance defined for databases). We then
focus on the well-known why-provenance, which makes it possible to compute the semiring
provenance for every additively and multiplicatively idempotent commutative
semiring, and for which we study the complexity of problems related to the
provenance of an axiom or a conjunctive query answer. Finally, we consider two
more restricted cases which correspond to the so-called positive Boolean
provenance and lineage in the database setting. For these cases, we exhibit
relationships with well-known notions related to explanations in description
logics and complete our complexity analysis. As a side contribution, we provide
conditions on an ELHI_bot ontology that guarantee tractable reasoning.
Comment: Paper currently under review. 102 pages.
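The propagation of annotations described above can be illustrated generically. The sketch below is illustrative only, working over an explicit set of derivations rather than the paper's semantics: a consequence's provenance is the semiring sum, over its derivations, of the semiring product of the annotations of the axioms each derivation uses. Instantiating it with the why-provenance semiring (sets of sets of axiom identifiers, union as +, pairwise union as *) gives a semiring whose two operations are indeed both idempotent, matching the restriction in the abstract.

```python
# Generic semiring provenance over an explicit list of derivations
# (illustrative sketch, not the paper's provenance semantics).

def provenance(derivations, annot, plus, times, one):
    """Sum (+) over derivations of the product (*) of axiom annotations."""
    total = None
    for deriv in derivations:
        p = one
        for axiom in deriv:
            p = times(p, annot[axiom])
        total = p if total is None else plus(total, p)
    return total

# Why-provenance semiring: elements are sets of sets of axiom identifiers;
# + is union, * is pairwise union of the inner sets (both idempotent).
def w_plus(a, b):
    return a | b

def w_times(a, b):
    return frozenset(x | y for x in a for y in b)

annot = {ax: frozenset({frozenset({ax})}) for ax in ["a1", "a2", "a3"]}
# Two alternative derivations of one consequence: via {a1, a2} or via {a3}
prov = provenance([["a1", "a2"], ["a3"]], annot, w_plus, w_times,
                  frozenset({frozenset()}))
# prov == { {a1, a2}, {a3} }: each inner set is a "why" for the consequence
```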
Knowledge extraction from unstructured data
Data availability is becoming increasingly important given the current growth of web-based data. Data available on the web are represented as unstructured, semi-structured, or structured data. To make web-based data available for Natural Language Processing or Data Mining tasks, the data need to be presented as machine-readable data in a structured format. Thus, techniques are needed for capturing knowledge from unstructured data sources. Research communities address this problem with knowledge extraction methods: methods able to capture knowledge in natural language text and map the extracted knowledge to knowledge presented in existing knowledge graphs (KGs). These methods include named-entity recognition, named-entity disambiguation, relation recognition, and relation linking. This thesis addresses the problem of extracting knowledge from unstructured data and discovering patterns in the extracted knowledge. We devise a rule-based approach for entity and relation recognition and linking. The approach effectively maps entities and relations within a text to their resources in a target KG. It overcomes the challenges of recognizing and linking entities and relations to a specific KG by employing devised catalogs of linguistic and domain-specific rules, which state the criteria for recognizing entities in a sentence of a particular language, together with a deductive database that encodes knowledge in community-maintained KGs. Moreover, we define a neuro-symbolic approach for knowledge extraction in encyclopedic and domain-specific domains; it combines symbolic and sub-symbolic components to overcome the challenges of entity recognition and linking and the limited availability of training data, while maintaining the accuracy of recognizing and linking entities.
Additionally, we present a context-aware framework for unveiling semantically related posts in a corpus; it is a knowledge-driven framework that retrieves associated posts effectively. We cast the problem of unveiling semantically related posts in a corpus as the Vertex Coloring Problem. We evaluate the performance of our techniques on several benchmarks related to various domains for knowledge extraction tasks. Furthermore, we apply these methods in real-world scenarios from national and international projects. The outcomes show that our techniques can effectively extract knowledge encoded in unstructured data and discover patterns over the extracted knowledge presented as machine-readable data. More importantly, the evaluation results provide evidence of the effectiveness of combining the reasoning capacity of symbolic frameworks with the pattern recognition and classification power of sub-symbolic models.
OWL Reasoners still useable in 2023
In a systematic literature and software review, over 100 OWL reasoners/systems
were analyzed to determine whether they are still usable in 2023. This has
never been done at this scale. OWL reasoners still play an important role in
knowledge organisation and management, but the last comprehensive
surveys/studies are more than 8 years old. The result of this work is a
comprehensive list of 95 standalone OWL reasoners and systems using an OWL
reasoner. For each item, information on project pages, source code
repositories and related documentation was gathered. The raw research data is
provided in a GitHub repository for anyone to use.
Meta-ontology fault detection
Ontology engineering is the field within knowledge representation concerned with using logic-based formalisms to represent knowledge, typically in moderately sized knowledge bases called ontologies. The question of how best to develop, use and maintain these ontologies has produced relatively large bodies of formal, theoretical and methodological research.
One subfield of ontology engineering is ontology debugging, which is concerned with preventing, detecting and repairing errors (or, more generally, pitfalls, bad practices or faults) in ontologies. Due to the logical nature of ontologies and, in particular, entailment, these faults are often hard to prevent and detect, and they can have far-reaching consequences. This makes ontology debugging one of the principal challenges to more widespread adoption of ontologies in applications.
Moreover, another important subfield of ontology engineering is ontology alignment: combining multiple ontologies to produce more powerful results than the simple sum of the parts. Ontology alignment compounds the issues, difficulties and challenges of ontology debugging by introducing, propagating and exacerbating faults in ontologies.
A relevant aspect of the field of ontology debugging is that, due to the challenges and difficulties, research within it is usually notably constrained in its scope, focusing on particular aspects of the problem or on the application to only certain subdomains or under specific methodologies. Similarly, the approaches are often ad hoc and only related to other approaches at a conceptual level. There are no well established and widely used formalisms, definitions or benchmarks that form a foundation of the field of ontology debugging.
In this thesis, I tackle the problem of ontology debugging from a more abstract point of view than usual, surveying the existing literature in the field, attempting to extract common ideas, and especially focusing on formulating them in a common language and under a common approach. Meta-ontology fault detection is a framework for detecting faults in ontologies that uses semantic fault patterns to express, in a systematic way, schematic entailments that typically indicate faults. The formalism that I developed to represent these patterns is called existential second-order query logic (abbreviated ESQ logic). I further reformulated a large proportion of the ideas present in existing research into this framework, as patterns in ESQ logic, providing a pattern catalogue.
Most of the work during my PhD has been spent designing and implementing an algorithm to automatically detect arbitrary ESQ patterns in arbitrary ontologies. The result is what we call minimal commitment resolution for ESQ logic: an extension of first-order resolution that draws on important ideas from higher-order unification and implements a novel approach to unification problems using dependency graphs. I have proven important theoretical properties of this algorithm, such as its soundness, its termination (in a certain sense and under certain conditions), and its fairness, or completeness in the enumeration of infinite spaces of solutions.
Moreover, I have produced an implementation of minimal commitment resolution for ESQ logic in Haskell that has passed all unit tests and produces non-trivial results on small examples. However, attempts to apply this algorithm to examples of a more realistic size have proven unsuccessful, with computation times that exceed our tolerance levels.
In this thesis, I provide details of the challenges faced in this regard, as well as other, successful forms of qualitative evaluation of the meta-ontology fault detection approach. I discuss what I believe are the main causes of the computational feasibility problems, ideas on how to overcome them, and other directions of future work that could use the results in the thesis to contribute foundational formalisms, ideas and approaches to ontology debugging that can properly combine existing constrained research. It is unclear to me whether minimal commitment resolution for ESQ logic can, in its current shape, be implemented efficiently, but I believe that, at the very least, the theoretical and conceptual underpinnings presented in this thesis will be useful for producing more foundational results in the field.
Scalable Query Answering Under Uncertainty to Neuroscientific Ontological Knowledge: The NeuroLang Approach
Researchers in neuroscience have a growing number of datasets available to study the brain, made possible by recent technological advances. Given the extent to which the brain has been studied, there is also ontological knowledge available encoding the current state of the art regarding its different areas, activation patterns, keywords associated with studies, etc. Furthermore, there is inherent uncertainty associated with brain scans arising from the mapping between voxels—3D pixels—and actual points in different individual brains. Unfortunately, there is currently no unifying framework for accessing such collections of rich heterogeneous data under uncertainty, making it necessary for researchers to rely on ad hoc tools. In particular, one major weakness of current tools that attempt to address this task is that only very limited propositional query languages have been developed. In this paper we present NeuroLang, a probabilistic language based on first-order logic with existential rules, probabilistic uncertainty, ontology integration under the open-world assumption, and built-in mechanisms to guarantee tractable query answering over very large datasets. NeuroLang's primary objective is to provide a unified framework to seamlessly integrate heterogeneous data, such as ontologies, and map fine-grained cognitive domains to brain regions through a set of formal criteria, promoting shareable and highly reproducible research. After presenting the language and its general query answering architecture, we discuss real-world use cases showing how NeuroLang can be applied to practical scenarios.
Authors: Gaston E. Zanitti; Yamil Osvaldo Omar Soto; Valentin Iovene; Maria Vanina Martinez; Ricardo Oscar Rodriguez; Gerardo Simari; Demian Wassermann.
Better Together: Unifying Datalog and Equality Saturation
We present egglog, a fixpoint reasoning system that unifies Datalog and
equality saturation (EqSat). Like Datalog, it supports efficient incremental
execution, cooperating analyses, and lattice-based reasoning. Like EqSat, it
supports term rewriting, efficient congruence closure, and extraction of
optimized terms.
We identify two recent applications--a unification-based pointer analysis in
Datalog and an EqSat-based floating-point term rewriter--that have been
hampered by features missing from Datalog but found in EqSat or vice-versa. We
evaluate egglog by reimplementing those projects in egglog. The resulting
systems in egglog are faster, simpler, and fix bugs found in the original
systems.
Comment: PLDI 2023.
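The Datalog half of the combination can be illustrated with a toy fixpoint computation (plain Python, not egglog's actual language or implementation): transitive closure via the rules path(x,y) :- edge(x,y) and path(x,z) :- path(x,y), edge(y,z), applied until no new facts appear.

```python
# Naive Datalog-style fixpoint: keep applying the rules until the set of
# derived facts stops growing (a toy analogue of egglog's fixpoint loop).

def transitive_closure(edges):
    path = set(edges)                       # path(x,y) :- edge(x,y).
    while True:
        # path(x,z) :- path(x,y), edge(y,z).
        new = {(x, z) for (x, y) in path for (y2, z) in edges if y == y2}
        if new <= path:
            return path                     # fixpoint reached
        path |= new

edges = {(1, 2), (2, 3), (3, 4)}
paths = transitive_closure(edges)
# paths contains (1, 4), derived via the chain 1 -> 2 -> 3 -> 4
```

EqSat adds to this picture a congruence closure over terms, which the naive loop above does not model; the point of egglog is precisely that both kinds of fixpoint reasoning live in one system.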
Saturation-based Boolean conjunctive query answering and rewriting for the guarded quantification fragments
Query answering is an important problem in AI, databases and knowledge
representation. In this paper, we develop saturation-based Boolean conjunctive
query answering and rewriting procedures for the guarded, the loosely guarded
and the clique-guarded fragments. Our query answering procedure improves on
existing resolution-based decision procedures for the guarded and the loosely
guarded fragments, and it solves Boolean conjunctive query answering for the
guarded, the loosely guarded and the clique-guarded fragments.
Based on this query answering procedure, we also introduce a novel
saturation-based query rewriting procedure for these guarded fragments. Unlike
mainstream query answering and rewriting methods, our procedures derive a
compact and reusable saturation, namely a closure of formulas, to handle the
challenge of querying for distributed datasets. This paper lays the theoretical
foundations for the first automated deduction decision procedures for Boolean
conjunctive query answering and the first saturation-based Boolean conjunctive
query rewriting in the guarded, the loosely guarded and the clique-guarded
fragments.
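The idea of deriving a reusable closure by saturation can be seen in a drastically simplified propositional toy; the paper's procedures work on first-order clauses in the guarded fragments, so this sketch only shows the shape of saturation: resolve clauses until nothing new is produced, then answer entailment questions against the saturated set.

```python
# Toy propositional resolution saturation. Literals are nonzero ints,
# with -lit as negation; clauses are frozensets of literals.

def saturate(clauses):
    """Close a clause set under binary resolution; the input is
    unsatisfiable iff the closure contains the empty clause."""
    clauses = {frozenset(c) for c in clauses}
    while True:
        new = set()
        for c1 in clauses:
            for c2 in clauses:
                for lit in c1:
                    if -lit in c2:
                        new.add((c1 - {lit}) | (c2 - {-lit}))
        new -= clauses
        if not new:
            return clauses      # saturation: a reusable closure
        clauses |= new

# Clauses p, (p -> q), not-q encoded as {1}, {-1, 2}, {-2}: unsatisfiable,
# so the saturated closure contains the empty clause.
closure = saturate([{1}, {-1, 2}, {-2}])
```

The closure is computed once and can then be consulted repeatedly, mirroring the paper's motivation of a compact, reusable saturation when querying distributed datasets.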