268 research outputs found

    Scallop: A Language for Neurosymbolic Programming

    We present Scallop, a language which combines the benefits of deep learning and logical reasoning. Scallop enables users to write a wide range of neurosymbolic applications and train them in a data- and compute-efficient manner. It achieves these goals through three key features: 1) a flexible symbolic representation that is based on the relational data model; 2) a declarative logic programming language that is based on Datalog and supports recursion, aggregation, and negation; and 3) a framework for automatic and efficient differentiable reasoning that is based on the theory of provenance semirings. We evaluate Scallop on a suite of eight neurosymbolic applications from the literature. Our evaluation demonstrates that Scallop is capable of expressing algorithmic reasoning in diverse and challenging AI tasks, provides a succinct interface for machine learning programmers to integrate logical domain knowledge, and yields solutions that are comparable or superior to state-of-the-art models in terms of accuracy. Furthermore, Scallop's solutions outperform these models in aspects such as runtime and data efficiency, interpretability, and generalizability.
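
    The provenance-semiring idea can be made concrete in a few lines. The sketch below is an illustration only, not Scallop's implementation or syntax: Datalog-style reachability where each fact carries a probability tag, evaluated under the max-product (Viterbi) semiring, one standard provenance-semiring instance.

```python
# A minimal sketch of semiring-based evaluation (illustrative only; Scallop's
# engine supports several semirings and differentiates through them).
# Rules: path(x, y) :- edge(x, y).
#        path(x, z) :- path(x, y), edge(y, z).

def conj(p, q):
    return p * q        # semiring "times": combine premises of one derivation

def disj(p, q):
    return max(p, q)    # semiring "plus": keep the most likely derivation

# Probabilistic edge facts, e.g. tags produced by a neural network.
edge = {(0, 1): 0.9, (1, 2): 0.8, (0, 2): 0.1}

path = dict(edge)
changed = True
while changed:          # naive fixpoint; max-product is absorptive, so it converges
    changed = False
    for (x, y), p in list(path.items()):
        for (u, z), q in edge.items():
            if u == y:
                new = disj(path.get((x, z), 0.0), conj(p, q))
                if new != path.get((x, z), 0.0):
                    path[(x, z)] = new
                    changed = True

print(path[(0, 2)])     # 0.72: the two-hop proof beats the direct 0.1 edge
```

    Swapping in a different semiring (for example, probabilities combined with noisy-or, or top-k proofs) changes what the tags mean without changing the rules, which is the flexibility the framework is built on.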

    Scalable Query Answering Under Uncertainty to Neuroscientific Ontological Knowledge: The NeuroLang Approach

    Researchers in neuroscience have a growing number of datasets available to study the brain, which is made possible by recent technological advances. Given the extent to which the brain has been studied, there is also available ontological knowledge encoding the current state of the art regarding its different areas, activation patterns, keywords associated with studies, etc. Furthermore, there is inherent uncertainty associated with brain scans arising from the mapping between voxels (3D pixels) and actual points in different individual brains. Unfortunately, there is currently no unifying framework for accessing such collections of rich heterogeneous data under uncertainty, making it necessary for researchers to rely on ad hoc tools. In particular, one major weakness of current tools that attempt to address this task is that only very limited propositional query languages have been developed. In this paper we present NeuroLang, a probabilistic language based on first-order logic with existential rules, probabilistic uncertainty, ontology integration under the open world assumption, and built-in mechanisms to guarantee tractable query answering over very large datasets. NeuroLang's primary objective is to provide a unified framework to seamlessly integrate heterogeneous data, such as ontologies, and map fine-grained cognitive domains to brain regions through a set of formal criteria, promoting shareable and highly reproducible research. After presenting the language and its general query answering architecture, we discuss real-world use cases showing how NeuroLang can be applied to practical scenarios.
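
    To give the flavour of such queries, here is a small hypothetical sketch, not NeuroLang's actual API (all relation and entity names are invented): a probabilistic fact table models the uncertain voxel-to-region mapping, and a marginal query treats the mappings as independent.

```python
# Hypothetical data: P(voxel v lies in region r) models the uncertain
# mapping between voxels and points in individual brains.
located_in = {("v1", "amygdala"): 0.7, ("v2", "amygdala"): 0.4}

# Deterministic facts: studies report activations at voxels.
reports = {("study42", "v1"), ("study42", "v2")}

def prob_activates(study, region):
    # activates(s, r) :- reports(s, v), located_in(v, r).
    # Marginal probability via noisy-or, assuming independent mappings.
    p = 0.0
    for (s, v) in reports:
        if s == study:
            q = located_in.get((v, region), 0.0)
            p = p + q - p * q
    return p

print(prob_activates("study42", "amygdala"))  # 1 - 0.3 * 0.6 = 0.82
```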

    The Stable Model Semantics of Datalog with Metric Temporal Operators

    We introduce negation under the stable model semantics in DatalogMTL - a temporal extension of Datalog with metric temporal operators. As a result, we obtain a rule language which combines the power of answer set programming with the temporal dimension provided by metric operators. We show that, in this setting, reasoning becomes undecidable over the rational timeline, and decidable in EXPSPACE in data complexity over the integer timeline. We also show that, if we restrict our attention to forward-propagating programs, reasoning over the integer timeline becomes PSPACE-complete in data complexity, and hence, no harder than over positive programs; however, reasoning over the rational timeline in this fragment remains undecidable. Under consideration in Theory and Practice of Logic Programming (TPLP).
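
    The forward-propagating restriction is easy to illustrate. The sketch below is a toy of our own, not a DatalogMTL reasoner: it applies one rule with a metric "sometime in the past" operator over the integer timeline, written informally as alarm(x) :- diamond-minus[1,2] overheat(x), meaning alarm holds at time t whenever overheat held at some time in [t-2, t-1].

```python
# Toy forward propagation over the integer timeline (illustrative only).
overheat = {("boiler", 3), ("boiler", 7)}   # (entity, integer time point)

def apply_rule(facts, horizon):
    # alarm(x) holds at t if overheat(x) held at some time in [t-2, t-1],
    # so each fact at time t propagates forward to t+1 and t+2.
    derived = set()
    for (x, t) in facts:
        for d in (1, 2):                    # the metric interval [1, 2]
            if t + d <= horizon:
                derived.add((x, t + d))
    return derived

alarm = apply_rule(overheat, horizon=10)
print(sorted(alarm))  # boiler alarms at times 4, 5, 8, 9
```

    Because information only flows towards later time points, evaluation can sweep the timeline forward with bounded memory, which is the intuition behind the better data complexity of this fragment over the integers.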

    On the Correspondence Between Monotonic Max-Sum GNNs and Datalog

    Although there has been significant interest in applying machine learning techniques to structured data, the expressivity (i.e., a description of what can be learned) of such techniques is still poorly understood. In this paper, we study data transformations based on graph neural networks (GNNs). First, we note that the choice of how a dataset is encoded into a numeric form processable by a GNN can obscure the characterisation of a model's expressivity, and we argue that a canonical encoding provides an appropriate basis. Second, we study the expressivity of monotonic max-sum GNNs, which cover a subclass of GNNs with max and sum aggregation functions. We show that, for each such GNN, one can compute a Datalog program such that applying the GNN to any dataset produces the same facts as a single round of application of the program's rules to the dataset. Monotonic max-sum GNNs can sum an unbounded number of feature vectors which can result in arbitrarily large feature values, whereas rule application requires only a bounded number of constants. Hence, our result shows that the unbounded summation of monotonic max-sum GNNs does not increase their expressive power. Third, we sharpen our result to the subclass of monotonic max GNNs, which use only the max aggregation function, and identify a corresponding class of Datalog programs.
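
    The correspondence is easiest to see on a toy instance. The sketch below is a simplification of the paper's setting, with hand-picked weights: a single max-aggregation layer over 0/1 features behaves exactly like one round of applying the rule B(x) :- edge(x, y), A(y).

```python
edges = {(0, 1), (1, 2)}   # edge(x, y) facts of a directed graph
A = {1}                    # A(y) facts, encoded as a 0/1 feature

def max_gnn_layer(x):
    # One layer with max aggregation, weight 1, and a step activation.
    neigh = [1.0 if y in A else 0.0 for (u, y) in edges if u == x]
    agg = max(neigh, default=0.0)
    return agg >= 1.0      # the threshold plays the role of rule applicability

B = {x for x in (0, 1, 2) if max_gnn_layer(x)}
print(B)  # {0}: the rule B(x) :- edge(x, y), A(y) fires exactly at vertex 0
```

    Monotonicity (nonnegative weights and a thresholded output) is what makes this work: adding facts to the input can only add facts to the output, just as for Datalog rules.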

    Semantically-guided goal-sensitive reasoning: decision procedures and the Koala prover

    The main topic of this article is SGGS decision procedures for fragments of first-order logic without equality. SGGS (Semantically-Guided Goal-Sensitive reasoning) is an attractive basis for decision procedures, because it generalizes to first-order logic the Conflict-Driven Clause Learning (CDCL) procedure for propositional satisfiability. As SGGS is both refutationally complete and model-complete in the limit, SGGS decision procedures are model-constructing. We investigate the termination of SGGS with both positive and negative results: for example, SGGS decides Datalog and the stratified fragment (including Effectively PRopositional logic) that are relevant to many applications. Then we discover several new decidable fragments by showing that SGGS decides them. These fragments have the small model property, as the cardinality of their SGGS-generated models can be upper bounded, and for most of them termination tools can be applied to test a set of clauses for membership. We also present the first implementation of SGGS, the Koala theorem prover, and we report on experiments with Koala.
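
    As a concrete taste of the fragments involved, the snippet below (our illustration, unrelated to Koala's code) tests a clause set for membership in EPR, where argument terms may be variables or constants but no proper function symbols appear.

```python
# Terms are strings (variables or constants) or tuples ("f", arg1, ...) for
# function applications; EPR clause sets must not contain the latter.
def is_epr_literal(lit):
    _pred, *args = lit
    return all(isinstance(a, str) for a in args)

def is_epr(clauses):
    # clauses: list of clauses, each a list of literals (pred, args...).
    return all(is_epr_literal(lit) for c in clauses for lit in c)

clauses_ok = [[("P", "x", "c")], [("Q", "y")]]
clauses_bad = [[("Q", ("f", "x"))]]          # f(x) is a proper function term
print(is_epr(clauses_ok), is_epr(clauses_bad))  # True False
```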

    Breaking the Negative Cycle: Exploring the Design Space of Stratification for First-Class Datalog Constraints

    The λ_Dat calculus brings together the power of functional and declarative logic programming in one language. In λ_Dat, Datalog constraints are first-class values that can be constructed, passed around as arguments, returned, composed with other constraints, and solved. A significant part of the expressive power of Datalog comes from the use of negation. Stratified negation is a particularly simple and practical form of negation accessible to ordinary programmers. Stratification requires that Datalog programs do not use recursion through negation. For a Datalog program, this requirement is straightforward to check, but for a λ_Dat program it is not so simple: a λ_Dat program constructs, composes, and solves Datalog programs at runtime, hence stratification cannot readily be determined at compile time. In this paper, we explore the design space of stratification for λ_Dat. We investigate strategies to ensure, at compile time, that programs constructed at runtime are guaranteed to be stratified, and we argue that previous design choices in the Flix programming language have been suboptimal.
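
    The underlying graph condition is simple once a complete Datalog program is in hand; the difficulty λ_Dat faces is that programs are only assembled at runtime. Below is a minimal sketch of the classical check itself, our illustration in Python rather than Flix: a program is stratified iff no cycle in the predicate dependency graph passes through a negated dependency.

```python
def is_stratified(rules):
    # rules: list of (head, [(body_pred, negated), ...]); for example
    # win(x) :- move(x, y), not win(y)  ==>  ("win", [("move", False), ("win", True)])
    edges = {(h, b, neg) for h, body in rules for (b, neg) in body}

    def reachable(src, dst):
        stack, seen = [src], set()
        while stack:
            n = stack.pop()
            if n == dst:
                return True
            if n in seen:
                continue
            seen.add(n)
            stack.extend(b for (h, b, _neg) in edges if h == n)
        return False

    # Stratified iff no negated dependency edge lies on a cycle.
    return not any(neg and reachable(b, h) for (h, b, neg) in edges)

print(is_stratified([("win", [("move", False), ("win", True)])]))    # False
print(is_stratified([("p", [("q", True)]), ("q", [("r", False)])]))  # True
```

    The design question the paper explores is how to establish this guarantee at compile time for constraint values that are composed and solved only at runtime.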

    Restrictable Variants: A Simple and Practical Alternative to Extensible Variants

    We propose restrictable variants as a simple and practical alternative to extensible variants. Restrictable variants combine nominal and structural typing: a restrictable variant is an algebraic data type indexed by a type-level set formula that captures its set of active labels. We introduce new pattern-matching constructs that allow programmers to write functions that match on only a subset of variants, i.e., pattern matches may be non-exhaustive. We then present a type system for restrictable variants which ensures that such non-exhaustive matches cannot get stuck at runtime. An essential feature of restrictable variants is that the type system can capture structure-preserving transformations, specifically the introduction and elimination of variants. This property is important for writing reusable functions, yet many row-based extensible variant systems lack it. In this paper, we present a calculus with restrictable variants, two partial pattern-matching constructs, and a type system that ensures progress and preservation. The type system extends Hindley-Milner with restrictable variants and supports type inference with an extension of Algorithm W with Boolean unification. We implement restrictable variants as an extension of the Flix programming language and conduct a few case studies to illustrate their practical usefulness.
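
    Restrictable variants are a static-typing feature, so Python can only gesture at the idea; the sketch below is a loose analogy with invented names, not the paper's system, using Literal unions as the "set of active labels" that a checker such as mypy can verify.

```python
from typing import Literal

Color = Literal["Red", "Green", "Blue"]
Warm = Literal["Red"]            # restricted index: Green and Blue are inactive

def describe(c: Warm) -> str:
    # Non-exhaustive over Color, yet safe: the index rules the other labels out.
    if c == "Red":
        return "warm"
    raise AssertionError("unreachable when the type index is respected")

print(describe("Red"))  # a type checker rejects describe("Blue") statically
```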

    A tetrachotomy of ontology-mediated queries with a covering axiom

    Our concern is the problem of efficiently determining the data complexity of answering queries mediated by description logic ontologies and constructing their optimal rewritings to standard database queries. Originating in ontology-based data access and datalog optimisation, this problem is known to be computationally very complex in general, with no explicit syntactic characterisations available. In this article, aiming to understand the fundamental roots of this difficulty, we strip the problem to the bare bones and focus on Boolean conjunctive queries mediated by a simple covering axiom stating that one class is covered by the union of two other classes. We show that, on the one hand, these rudimentary ontology-mediated queries, called disjunctive sirups (or d-sirups), capture many features and difficulties of the general case. For example, answering d-sirups is Π^p_2-complete for combined complexity and can be in AC0 or L-, NL-, P-, or coNP-complete for data complexity (with the problem of recognising FO-rewritability of d-sirups being 2ExpTime-hard); some d-sirups only have exponential-size resolution proofs, some only double-exponential-size positive existential FO-rewritings and single-exponential-size nonrecursive datalog rewritings. On the other hand, we prove a few partial sufficient and necessary conditions of FO- and (symmetric/linear-) datalog rewritability of d-sirups. Our main technical result is a complete and transparent syntactic AC0/NL/P/coNP tetrachotomy of d-sirups with disjoint covering classes and a path-shaped Boolean conjunctive query. To obtain this tetrachotomy, we develop new techniques for establishing P- and coNP-hardness of answering non-Horn ontology-mediated queries as well as showing that they can be answered in NL.
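
    The coNP upper bound has a transparent brute-force reading, sketched below with a toy checker of our own, not the paper's algorithms: a Boolean query is certainly true iff it holds in every completion of the data, and with disjoint covering classes each A-element independently goes to F or to T.

```python
from itertools import product

# Data: F(a), R(a, b), A(b), R(b, c), T(c); covering axiom A ⊑ F ⊔ T.
A = {"b"}
R = {("a", "b"), ("b", "c")}
F0, T0 = {"a"}, {"c"}

def q(F, T):
    # Path-shaped Boolean CQ: exists x, y . F(x) ∧ R(x, y) ∧ T(y)
    return any(x in F and y in T for (x, y) in R)

# Certain answer: q must hold however the A-elements are covered.
elems = sorted(A)
certain = all(
    q(F0 | {e for e, c in zip(elems, ch) if c == "F"},
      T0 | {e for e, c in zip(elems, ch) if c == "T"})
    for ch in product("FT", repeat=len(elems))
)
print(certain)  # True: if b is T the match is (a, b); if b is F it is (b, c)
```

    For path-shaped queries like this one the brute force is often unnecessary; the tetrachotomy pins down exactly when answering drops to AC0, NL, or P instead of coNP.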

    Efficient instance and hypothesis space revision in Meta-Interpretive Learning

    Inductive Logic Programming (ILP) is a form of Machine Learning. The goal of ILP is to induce hypotheses, as logic programs, that generalise training examples. ILP is characterised by a high expressivity, generalisation ability and interpretability. Meta-Interpretive Learning (MIL) is a state-of-the-art sub-field of ILP. However, current MIL approaches have limited efficiency: the sample and learning complexity respectively are polynomial and exponential in the number of clauses. My thesis is that improvements over the sample and learning complexity can be achieved in MIL through instance and hypothesis space revision. Specifically, we investigate 1) methods that revise the instance space, 2) methods that revise the hypothesis space and 3) methods that revise both the instance and the hypothesis spaces for achieving more efficient MIL. First, we introduce a method for building training sets with active learning in Bayesian MIL. Instances are selected maximising the entropy. We demonstrate this method can reduce the sample complexity and supports efficient learning of agent strategies. Second, we introduce a new method for revising the MIL hypothesis space with predicate invention. Our method generates predicates bottom-up from the background knowledge related to the training examples. We demonstrate this method is complete and can reduce the learning and sample complexity. Finally, we introduce a new MIL system called MIGO for learning optimal two-player game strategies. MIGO learns from playing: its training sets are built from the sequence of actions it chooses. Moreover, MIGO revises its hypothesis space with Dependent Learning: it first solves simpler tasks and can reuse any learned solution for solving more complex tasks. We demonstrate MIGO significantly outperforms both classical and deep reinforcement learning. The methods presented in this thesis open exciting perspectives for efficiently learning theories with MIL in a wide range of applications including robotics, modelling of agent strategies and game playing.
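
    The entropy-maximising selection behind the first contribution can be sketched in a few lines. This is our illustration with made-up numbers, not the thesis's system: given a posterior over candidate hypotheses, pick the instance whose predicted label is most uncertain.

```python
import math

# Posterior over hypotheses and each hypothesis's prediction on candidates.
posterior = {"h1": 0.5, "h2": 0.3, "h3": 0.2}
predicts = {
    "h1": {"e1": True, "e2": True,  "e3": False},
    "h2": {"e1": True, "e2": False, "e3": False},
    "h3": {"e1": True, "e2": False, "e3": True},
}

def entropy(example):
    # Entropy of the predicted label under the hypothesis posterior.
    p_true = sum(w for h, w in posterior.items() if predicts[h][example])
    if p_true in (0.0, 1.0):
        return 0.0
    return -(p_true * math.log2(p_true) + (1 - p_true) * math.log2(1 - p_true))

best = max(predicts["h1"], key=entropy)
print(best)  # e2: P(True) = 0.5, maximal uncertainty, most informative to label
```

    Labelling the chosen instance prunes roughly half the posterior mass per query, which is where the sample-complexity savings come from.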