Scallop: A Language for Neurosymbolic Programming
We present Scallop, a language which combines the benefits of deep learning
and logical reasoning. Scallop enables users to write a wide range of
neurosymbolic applications and train them in a data- and compute-efficient
manner. It achieves these goals through three key features: 1) a flexible
symbolic representation that is based on the relational data model; 2) a
declarative logic programming language that is based on Datalog and supports
recursion, aggregation, and negation; and 3) a framework for automatic and
efficient differentiable reasoning that is based on the theory of provenance
semirings. We evaluate Scallop on a suite of eight neurosymbolic applications
from the literature. Our evaluation demonstrates that Scallop is capable of
expressing algorithmic reasoning in diverse and challenging AI tasks, provides
a succinct interface for machine learning programmers to integrate logical
domain knowledge, and yields solutions that are comparable or superior to
state-of-the-art models in terms of accuracy. Furthermore, Scallop's solutions
outperform these models in aspects such as runtime and data efficiency,
interpretability, and generalizability.
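To make the provenance-semiring idea concrete, here is a minimal Python sketch (ours, not Scallop's actual API or syntax) of weighted Datalog evaluation under the Viterbi "max-times" semiring, one of the semirings such systems use: conjunction within a derivation multiplies weights, and alternative derivations are combined by max.

```python
# Minimal sketch (not Scallop's API): transitive closure path(x, y) where each
# derived fact carries the probability of its best derivation under the
# Viterbi ("max-times") provenance semiring.

def viterbi_closure(edges):
    """edges: dict mapping (x, y) -> probability of the fact edge(x, y)."""
    path = dict(edges)  # (x, y) -> weight of the best proof found so far
    changed = True
    while changed:
        changed = False
        for (x, y), p in list(path.items()):
            for (y2, z), q in edges.items():
                if y2 != y:
                    continue
                w = p * q  # product: conjunction inside one derivation
                if w > path.get((x, z), 0.0):  # max: best over alternative proofs
                    path[(x, z)] = w
                    changed = True
    return path

edges = {("a", "b"): 0.9, ("b", "c"): 0.8, ("a", "c"): 0.5}
paths = viterbi_closure(edges)
print(paths[("a", "c")])  # best proof a -> b -> c: 0.9 * 0.8 = 0.72
```

Because the semiring operations are differentiable in the fact weights, gradients can flow from derived facts back to input probabilities, which is the mechanism that lets a logic layer sit inside a trained model.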
Scalable Query Answering Under Uncertainty to Neuroscientific Ontological Knowledge: The NeuroLang Approach
Researchers in neuroscience have a growing number of datasets available to study the brain, which is made possible by recent technological advances. Given the extent to which the brain has been studied, there is also available ontological knowledge encoding the current state of the art regarding its different areas, activation patterns, keywords associated with studies, etc. Furthermore, there is inherent uncertainty associated with brain scans arising from the mapping between voxels—3D pixels—and actual points in different individual brains. Unfortunately, there is currently no unifying framework for accessing such collections of rich heterogeneous data under uncertainty, making it necessary for researchers to rely on ad hoc tools. In particular, one major weakness of current tools that attempt to address this task is that only very limited propositional query languages have been developed. In this paper we present NeuroLang, a probabilistic language based on first-order logic with existential rules, probabilistic uncertainty, ontology integration under the open world assumption, and built-in mechanisms to guarantee tractable query answering over very large datasets. NeuroLang's primary objective is to provide a unified framework to seamlessly integrate heterogeneous data, such as ontologies, and map fine-grained cognitive domains to brain regions through a set of formal criteria, promoting shareable and highly reproducible research. After presenting the language and its general query answering architecture, we discuss real-world use cases showing how NeuroLang can be applied to practical scenarios.
Authors: Gaston E. Zanitti, Yamil Osvaldo Omar Soto, Valentin Iovene, Maria Vanina Martinez, Ricardo Oscar Rodriguez, Gerardo Simari, Demian Wassermann (CONICET; Universidad Nacional del Sur; Universidad de Buenos Aires; Argentina).
The Stable Model Semantics of Datalog with Metric Temporal Operators
We introduce negation under the stable model semantics in DatalogMTL - a
temporal extension of Datalog with metric temporal operators. As a result, we
obtain a rule language which combines the power of answer set programming with
the temporal dimension provided by metric operators. We show that, in this
setting, reasoning becomes undecidable over the rational timeline, and
decidable in EXPSPACE in data complexity over the integer timeline. We also
show that, if we restrict our attention to forward-propagating programs,
reasoning over the integer timeline becomes PSPACE-complete in data complexity,
and hence, no harder than over positive programs; however, reasoning over the
rational timeline in this fragment remains undecidable. Under consideration in
Theory and Practice of Logic Programming (TPLP).
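The forward-propagating fragment over the integer timeline can be illustrated with a small Python sketch. The rule below is a hypothetical example in the spirit of DatalogMTL (not its actual syntax): Alert holds at time t whenever HighTemp held at every point of the window [t-2, t], a "box-minus over [0,2]" style condition.

```python
# Illustrative sketch of evaluating a forward-propagating metric-temporal rule
# over the integer timeline. Rule (hypothetical, DatalogMTL-style):
#   Alert holds at t  if  HighTemp held at every point in [t-2, t].

def derive_alert(high_temp, horizon):
    """high_temp: set of integer time points at which HighTemp holds."""
    alert = set()
    for t in range(horizon + 1):
        # the body looks only backwards in time, so one left-to-right pass suffices
        if all((t - d) in high_temp for d in range(3)):
            alert.add(t)
    return alert

high_temp = {1, 2, 3, 4, 7}
print(sorted(derive_alert(high_temp, 10)))  # [3, 4]: the only t with [t-2, t] in HighTemp
```

Because all body atoms refer to the past, derived facts never constrain earlier time points; this is the intuition behind the PSPACE data-complexity bound for forward-propagating programs over the integers.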
On the Correspondence Between Monotonic Max-Sum GNNs and Datalog
Although there has been significant interest in applying machine learning
techniques to structured data, the expressivity (i.e., a description of what
can be learned) of such techniques is still poorly understood. In this paper,
we study data transformations based on graph neural networks (GNNs). First, we
note that the choice of how a dataset is encoded into a numeric form
processable by a GNN can obscure the characterisation of a model's
expressivity, and we argue that a canonical encoding provides an appropriate
basis. Second, we study the expressivity of monotonic max-sum GNNs, which cover
a subclass of GNNs with max and sum aggregation functions. We show that, for
each such GNN, one can compute a Datalog program such that applying the GNN to
any dataset produces the same facts as a single round of application of the
program's rules to the dataset. Monotonic max-sum GNNs can sum an unbounded
number of feature vectors which can result in arbitrarily large feature values,
whereas rule application requires only a bounded number of constants. Hence,
our result shows that the unbounded summation of monotonic max-sum GNNs does
not increase their expressive power. Third, we sharpen our result to the
subclass of monotonic max GNNs, which use only the max aggregation function,
and identify a corresponding class of Datalog programs.
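A toy Python illustration (our construction, not the paper's) of the correspondence: over 0/1 features, a monotonic max-aggregation layer with identity weights computes exactly one round of applying the rule B(x) :- edge(x, y), A(y).

```python
# Toy correspondence: a monotonic max-GNN layer on 0/1 features vs. one round
# of applying the Datalog rule  B(x) :- edge(x, y), A(y).

def max_gnn_layer(A, edges, nodes):
    # B(x) = max over out-neighbours y of A(y); with 0/1 inputs and identity
    # weights the result is already 0/1, so no activation step is needed.
    # The max over an empty neighbourhood is 0 (no derivation).
    return {x: max([A[y] for (u, y) in edges if u == x] or [0]) for x in nodes}

def rule_round(A_facts, edges):
    # one round of immediate consequences of  B(x) :- edge(x, y), A(y)
    return {x for (x, y) in edges if y in A_facts}

nodes = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c")]
A = {"a": 0, "b": 0, "c": 1}

gnn_B = {x for x, v in max_gnn_layer(A, edges, nodes).items() if v == 1}
print(sorted(gnn_B), sorted(rule_round({"c"}, edges)))  # ['b'] ['b']
```

The point of the paper's result is that this agreement is not an artifact of the toy setting: for every monotonic max-sum GNN, some Datalog program reproduces its derived facts, even though sum aggregation can produce unboundedly large feature values.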
Semantically-guided goal-sensitive reasoning: decision procedures and the Koala prover
The main topic of this article is SGGS decision procedures for fragments of first-order logic without equality. SGGS (Semantically-Guided Goal-Sensitive reasoning) is an attractive basis for decision procedures because it generalizes to first-order logic the Conflict-Driven Clause Learning (CDCL) procedure for propositional satisfiability. As SGGS is both refutationally complete and model-complete in the limit, SGGS decision procedures are model-constructing. We investigate the termination of SGGS with both positive and negative results: for example, SGGS decides Datalog and the stratified fragment (including Effectively Propositional logic, EPR), which are relevant to many applications. Then we discover several new decidable fragments by showing that SGGS decides them. These fragments have the small model property, as the cardinality of their SGGS-generated models can be upper bounded, and for most of them termination tools can be applied to test a set of clauses for membership. We also present the first implementation of SGGS, the Koala theorem prover, and we report on experiments with Koala.
Breaking the Negative Cycle: Exploring the Design Space of Stratification for First-Class Datalog Constraints
The λ_Dat calculus brings together the power of functional and declarative logic programming in one language. In λ_Dat, Datalog constraints are first-class values that can be constructed, passed around as arguments, returned, composed with other constraints, and solved.
A significant part of the expressive power of Datalog comes from the use of negation. Stratified negation is a particularly simple and practical form of negation accessible to ordinary programmers. Stratification requires that Datalog programs must not use recursion through negation.
For a Datalog program, this requirement is straightforward to check, but for a λ_Dat program, it is not so simple: a λ_Dat program constructs, composes, and solves Datalog programs at runtime. Hence stratification cannot readily be determined at compile-time.
In this paper, we explore the design space of stratification for λ_Dat. We investigate strategies to ensure, at compile-time, that programs constructed at runtime are guaranteed to be stratified, and we argue that previous design choices in the Flix programming language have been suboptimal.
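The stratification condition itself is easy to state as a graph property: a Datalog program is stratified iff its predicate dependency graph has no cycle passing through a negated body atom. A minimal Python sketch of this check (ours, independent of Flix's implementation):

```python
# Minimal stratification check: a program is stratified iff no cycle in the
# predicate dependency graph goes through a negative edge.

def reachable(graph, src, dst):
    """Depth-first search: is dst reachable from src?"""
    seen, stack = set(), [src]
    while stack:
        p = stack.pop()
        if p == dst:
            return True
        if p in seen:
            continue
        seen.add(p)
        stack.extend(graph.get(p, []))
    return False

def is_stratified(pos_edges, neg_edges):
    """Edges point from a rule's head predicate to a body predicate."""
    graph = {}
    for h, b in pos_edges + neg_edges:
        graph.setdefault(h, []).append(b)
    # a negative edge h -> b lying on a cycle means recursion through negation
    return not any(reachable(graph, b, h) for h, b in neg_edges)

# p(x) :- q(x), not r(x).   r(x) :- p(x).   => recursion through negation
print(is_stratified(pos_edges=[("p", "q"), ("r", "p")], neg_edges=[("p", "r")]))  # False
```

The difficulty the paper addresses is not this check but when to run it: once Datalog programs are first-class values composed at runtime, the dependency graph is only fully known after composition.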
Restrictable Variants: A Simple and Practical Alternative to Extensible Variants
We propose restrictable variants as a simple and practical alternative to extensible variants. Restrictable variants combine nominal and structural typing: a restrictable variant is an algebraic data type indexed by a type-level set formula that captures its set of active labels. We introduce new pattern-matching constructs that allow programmers to write functions that only match on a subset of variants, i.e., pattern-matches may be non-exhaustive. We then present a type system for restrictable variants which ensures that such non-exhaustive matches cannot get stuck at runtime.
An essential feature of restrictable variants is that the type system can capture structure-preserving transformations: specifically the introduction and elimination of variants. This property is important for writing reusable functions, yet many row-based extensible variant systems lack it.
In this paper, we present a calculus with restrictable variants, two partial pattern-matching constructs, and a type system that ensures progress and preservation. The type system extends Hindley-Milner with restrictable variants and supports type inference with an extension of Algorithm W with Boolean unification. We implement restrictable variants as an extension of the Flix programming language and conduct a few case studies to illustrate their practical usefulness.
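Since Python has no type-level label sets, the following is only a dynamic approximation of the idea (a runtime sketch, not the paper's static type system): a variant value is a tagged pair, a partial match handles only a subset of labels, and a structure-preserving transformation reintroduces the same labels on the way out, so composing such functions keeps the active-label set stable.

```python
# Dynamic sketch of restrictable variants. In the paper this safety is checked
# statically by the type system; here the label-set check happens at runtime.

def partial_match(value, handlers):
    """Match a tagged value against a subset of labels; safe exactly when the
    value's label is among the handled ones."""
    tag, payload = value
    if tag not in handlers:
        raise TypeError(f"non-exhaustive match: unhandled label {tag!r}")
    return handlers[tag](payload)

# Structure-preserving transformation: handles only {Inc, Dec} and reintroduces
# the same labels, so the active-label set of the result equals that of the input.
def step(value):
    return partial_match(value, {
        "Inc": lambda n: ("Inc", n + 1),
        "Dec": lambda n: ("Dec", n - 1),
    })

print(step(("Inc", 41)))  # ('Inc', 42)
```

The paper's contribution is precisely to rule out the `TypeError` branch at compile time while still typing functions like `step` reusably.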
A tetrachotomy of ontology-mediated queries with a covering axiom
Our concern is the problem of efficiently determining the data complexity of answering queries mediated by description logic ontologies and constructing their optimal rewritings to standard database queries. Originating in ontology-based data access and datalog optimisation, this problem is known to be computationally very complex in general, with no explicit syntactic characterisations available. In this article, aiming to understand the fundamental roots of this difficulty, we strip the problem to the bare bones and focus on Boolean conjunctive queries mediated by a simple covering axiom stating that one class is covered by the union of two other classes. We show that, on the one hand, these rudimentary ontology-mediated queries, called disjunctive sirups (or d-sirups), capture many features and difficulties of the general case. For example, answering d-sirups is Π^p_2-complete for combined complexity and can be in AC0 or L-, NL-, P-, or coNP-complete for data complexity (with the problem of recognising FO-rewritability of d-sirups being 2ExpTime-hard); some d-sirups only have exponential-size resolution proofs, some only double-exponential-size positive existential FO-rewritings and single-exponential-size nonrecursive datalog rewritings. On the other hand, we prove a few partial sufficient and necessary conditions of FO- and (symmetric/linear-) datalog rewritability of d-sirups. Our main technical result is a complete and transparent syntactic AC0/NL/P/coNP tetrachotomy of d-sirups with disjoint covering classes and a path-shaped Boolean conjunctive query. To obtain this tetrachotomy, we develop new techniques for establishing P- and coNP-hardness of answering non-Horn ontology-mediated queries, as well as showing that they can be answered in NL.
Efficient instance and hypothesis space revision in Meta-Interpretive Learning
Inductive Logic Programming (ILP) is a form of Machine Learning. The goal of ILP is to induce hypotheses, as logic programs, that generalise training examples. ILP is characterised by a high expressivity, generalisation ability and interpretability. Meta-Interpretive Learning (MIL) is a state-of-the-art sub-field of ILP. However, current MIL approaches have limited efficiency: the sample and learning complexity respectively are polynomial and exponential in the number of clauses. My thesis is that improvements over the sample and learning complexity can be achieved in MIL through instance and hypothesis space revision. Specifically, we investigate 1) methods that revise the instance space, 2) methods that revise the hypothesis space and 3) methods that revise both the instance and the hypothesis spaces for achieving more efficient MIL.
First, we introduce a method for building training sets with active learning in Bayesian MIL. Instances are selected maximising the entropy. We demonstrate this method can reduce the sample complexity and supports efficient learning of agent strategies. Second, we introduce a new method for revising the MIL hypothesis space with predicate invention. Our method generates predicates bottom-up from the background knowledge related to the training examples. We demonstrate this method is complete and can reduce the learning and sample complexity. Finally, we introduce a new MIL system called MIGO for learning optimal two-player game strategies. MIGO learns from playing: its training sets are built from the sequence of actions it chooses. Moreover, MIGO revises its hypothesis space with Dependent Learning: it first solves simpler tasks and can reuse any learned solution for solving more complex tasks. We demonstrate MIGO significantly outperforms both classical and deep reinforcement learning. The methods presented in this thesis open exciting perspectives for efficiently learning theories with MIL in a wide range of applications including robotics, modelling of agent strategies and game playing.