407 research outputs found

    Fast and correct variational inference for probabilistic programming: Differentiability, reparameterisation and smoothing

    Probabilistic programming is an innovative programming paradigm for posing and automatically solving Bayesian inference problems. In this thesis, we study the foundations of fast yet correct inference for probabilistic programming. Many of the most successful inference techniques (e.g. Hamiltonian Monte Carlo or Stochastic Variational Inference) harness gradients of the so-called density function, which therefore needs to be differentiable at least almost everywhere. We resolve a question posed by Hongseok Yang by demonstrating the following: densities of almost surely terminating programs are differentiable almost everywhere. Having established this property necessary for the correctness of gradient-based inference algorithms, we investigate variational inference, which frames posterior inference as an optimisation problem, in more detail. The dominant approach for stochastic optimisation in practice is stochastic gradient descent. In particular, a variant using the so-called reparameterisation gradient estimator exhibits low variance, resulting in fast convergence in a traditional statistics setting. Unfortunately, although having measure 0, discontinuities can compromise the correctness of this approach. Therefore, we propose a smoothed interpretation parameterised by an accuracy coefficient and present type systems establishing technical pre-conditions. Thus, we can prove stochastic gradient descent with the reparameterisation gradient estimator to be correct when applied to the smoothed problem. Besides, via a uniform convergence result, we can solve the original problem up to any error tolerance by choosing an accuracy coefficient suitably. 
Furthermore, rather than fixing an accuracy coefficient in advance (limiting the quality of the final solution), we propose a novel variant of stochastic gradient descent, Diagonalisation Stochastic Gradient Descent, which progressively enhances the accuracy of the smoothed approximation during optimisation, and we prove convergence to stationary points of the unsmoothed (original) objective. An empirical evaluation reveals benefits of our approaches over the state of the art: our approaches are simple, fast and attain orders of magnitude reduction in work-normalised variance. Besides, Diagonalisation Stochastic Gradient Descent is more stable than standard stochastic gradient descent for a fixed-accuracy smoothing. Finally, we show unbiasedness of the reparameterisation gradient estimator for continuous but non-differentiable models, and we propose a method based on higher-order logic to establish continuity in the presence of conditionals. We provide a sound and complete reduction for verifying continuity of models to a satisfiability problem, and we propose novel efficient randomised decision procedures.
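The smoothing idea in this abstract can be illustrated with a minimal sketch; it is not the thesis's implementation. It assumes a toy objective E[1[z > 0]] with z ~ N(theta, 1), a logistic sigmoid as the smoothing function, and `eta` standing in for the accuracy coefficient; every name below is hypothetical. After reparameterising z = theta + eps, the indicator has zero derivative almost everywhere, so the naive gradient estimate is 0, whereas the true gradient is the standard normal density at theta. Replacing the indicator by sigmoid(z / eta) gives an estimator that approaches the true gradient as eta shrinks.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def smoothed_reparam_grad(theta, eta, n_samples, rng):
    """Monte Carlo reparameterisation gradient of E[sigmoid(z / eta)], z = theta + eps.

    Assumption: sigmoid(z / eta) is the smoothed stand-in for the indicator 1[z > 0].
    """
    total = 0.0
    for _ in range(n_samples):
        z = theta + rng.gauss(0.0, 1.0)  # reparameterised sample
        s = sigmoid(z / eta)
        total += s * (1.0 - s) / eta     # d/dtheta sigmoid((theta + eps) / eta)
    return total / n_samples


def exact_grad(theta):
    """Gradient of P(theta + eps > 0): the standard normal density at theta."""
    return math.exp(-0.5 * theta * theta) / math.sqrt(2.0 * math.pi)


rng = random.Random(0)
estimate = smoothed_reparam_grad(theta=0.0, eta=0.1, n_samples=20000, rng=rng)
print(estimate, exact_grad(0.0))
```

A smaller `eta` reduces the smoothing bias but increases the variance of the estimate, which is the trade-off that motivates refining the accuracy during optimisation rather than fixing it up front.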

    Learning with Logical Constraints but without Shortcut Satisfaction

    Recent studies in neuro-symbolic learning have explored the integration of logical knowledge into deep learning via encoding logical constraints as an additional loss function. However, existing approaches tend to vacuously satisfy logical constraints through shortcuts, failing to fully exploit the knowledge. In this paper, we present a new framework for learning with logical constraints. Specifically, we address the shortcut satisfaction issue by introducing dual variables for logical connectives, encoding how the constraint is satisfied. We further propose a variational framework where the encoded logical constraint is expressed as a distributional loss that is compatible with the model's original training loss. The theoretical analysis shows that the proposed approach bears salient properties, and the experimental evaluations demonstrate its superior performance in both model generalizability and constraint satisfaction. Comment: Published as a conference paper at ICLR 2023, and code is available at https://github.com/SoftWiser-group/NeSy-without-Shortcut

    Graph Neural Networks Meet Neural-Symbolic Computing: A Survey and Perspective

    Neural-symbolic computing has now become the subject of interest of both academic and industry research laboratories. Graph Neural Networks (GNNs) have been widely used in relational and symbolic domains, with widespread application in combinatorial optimization, constraint satisfaction, relational reasoning and other scientific domains. The need for improved explainability, interpretability and trust of AI systems in general demands principled methodologies, as suggested by neural-symbolic computing. In this paper, we review the state of the art in the use of GNNs as a model of neural-symbolic computing. This includes the application of GNNs in several domains as well as their relationship to current developments in neural-symbolic computing. Comment: Updated version, draft of accepted IJCAI2020 Survey Paper.

    A Formal Methods Approach to Pattern Synthesis in Reaction Diffusion Systems

    We propose a technique to detect and generate patterns in a network of locally interacting dynamical systems. Central to our approach is a novel spatial superposition logic, whose semantics is defined over the quad-tree of a partitioned image. We show that formulas in this logic can be efficiently learned from positive and negative examples of several types of patterns. We also demonstrate that pattern detection, which is implemented as a model checking algorithm, performs very well for test data sets different from the learning sets. We define a quantitative semantics for the logic and integrate the model checking algorithm with particle swarm optimization in a computational framework for synthesis of parameters leading to desired patterns in reaction-diffusion systems.

    Scallop: A Language for Neurosymbolic Programming

    We present Scallop, a language which combines the benefits of deep learning and logical reasoning. Scallop enables users to write a wide range of neurosymbolic applications and train them in a data- and compute-efficient manner. It achieves these goals through three key features: 1) a flexible symbolic representation that is based on the relational data model; 2) a declarative logic programming language that is based on Datalog and supports recursion, aggregation, and negation; and 3) a framework for automatic and efficient differentiable reasoning that is based on the theory of provenance semirings. We evaluate Scallop on a suite of eight neurosymbolic applications from the literature. Our evaluation demonstrates that Scallop is capable of expressing algorithmic reasoning in diverse and challenging AI tasks, provides a succinct interface for machine learning programmers to integrate logical domain knowledge, and yields solutions that are comparable or superior to state-of-the-art models in terms of accuracy. Furthermore, Scallop's solutions outperform these models in aspects such as runtime and data efficiency, interpretability, and generalizability.
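As a rough illustration of what reasoning over a provenance semiring means, the sketch below evaluates the recursive Datalog rule path(a, c) :- path(a, b), edge(b, c) in plain Python. It is not Scallop syntax and not Scallop's implementation. It assumes the (max, ×) semiring, in which each derived fact carries the probability of its single most likely derivation; the function name and the example facts are hypothetical.

```python
def path_closure(edges):
    """Fixpoint of path(a, c) :- path(a, b), edge(b, c), tagged in the (max, *) semiring.

    edges maps (src, dst) to a probability in [0, 1]. Conjunction multiplies tags;
    alternative derivations of the same fact are combined with max.
    """
    path = dict(edges)  # base rule: path(a, b) :- edge(a, b)
    changed = True
    while changed:
        changed = False
        for (a, b), p_ab in list(path.items()):
            for (b2, c), p_bc in edges.items():
                if b2 != b:
                    continue
                candidate = p_ab * p_bc  # semiring "times" for the rule body
                if candidate > path.get((a, c), 0.0) + 1e-12:
                    path[(a, c)] = candidate  # semiring "plus" is max
                    changed = True
    return path


edges = {("a", "b"): 0.9, ("b", "c"): 0.8, ("a", "c"): 0.5}
tags = path_closure(edges)
print(tags[("a", "c")])
```

Here the two-step derivation through `b` (0.9 × 0.8) beats the direct edge (0.5), so the tag on path(a, c) comes from the longer route. Swapping in a different semiring changes what the tags mean without changing the rule.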

    On Differentiable Interpreters

    Neural networks have transformed the fields of Machine Learning and Artificial Intelligence with the ability to model complex features and behaviours from raw data. They quickly became instrumental models, achieving numerous state-of-the-art performances across many tasks and domains. Yet the successes of these models often rely on large amounts of data. When data is scarce, resourceful ways of using background knowledge often help. However, though different types of background knowledge can be used to bias the model, it is not clear how one can use algorithmic knowledge to that extent. In this thesis, we present differentiable interpreters as an effective framework for utilising algorithmic background knowledge as architectural inductive biases of neural networks. By continuously approximating discrete elements of traditional program interpreters, we create differentiable interpreters that, due to the continuous nature of their execution, are amenable to optimisation with gradient descent methods. This enables us to write code mixed with parametric functions, where the code strongly biases the behaviour of the model while enabling the training of parameters and/or input representations from data. We investigate two such differentiable interpreters and their use cases in this thesis. First, we present a detailed construction of ∂4, a differentiable interpreter for the programming language FORTH. We demonstrate the ability of ∂4 to strongly bias neural models with incomplete programs of variable complexity while learning missing pieces of the program with parametrised neural networks. Such models can learn to solve tasks and strongly generalise to out-of-distribution data from small datasets. Second, we present greedy Neural Theorem Provers (gNTPs), a significant improvement of a differentiable Datalog interpreter NTP. 
gNTPs ameliorate the large computational cost of recursive differentiable interpretation, achieving drastic time and memory speedups while introducing soft reasoning over logic knowledge and natural language.

    Towards Fast Computation of Certified Robustness for ReLU Networks

    Verifying the robustness property of a general Rectified Linear Unit (ReLU) network is an NP-complete problem [Katz, Barrett, Dill, Julian and Kochenderfer CAV17]. Although finding the exact minimum adversarial distortion is hard, giving a certified lower bound of the minimum distortion is possible. Currently available methods of computing such a bound are either time-consuming or deliver low-quality bounds that are too loose to be useful. In this paper, we exploit the special structure of ReLU networks and provide two computationally efficient algorithms, Fast-Lin and Fast-Lip, that are able to certify non-trivial lower bounds of minimum distortions, by bounding the ReLU units with appropriate linear functions (Fast-Lin), or by bounding the local Lipschitz constant (Fast-Lip). Experiments show that (1) our proposed methods deliver bounds close to (the gap is 2-3X) the exact minimum distortion found by Reluplex in small MNIST networks while our algorithms are more than 10,000 times faster; (2) our methods deliver similar quality of bounds (the gap is within 35% and usually around 10%; sometimes our bounds are even better) for larger networks compared to the methods based on solving linear programming problems but our algorithms are 33-14,000 times faster; (3) our method is capable of solving large MNIST and CIFAR networks up to 7 layers with more than 10,000 neurons within tens of seconds on a single CPU core. In addition, we show that, in fact, there is no polynomial time algorithm that can approximately find the minimum $\ell_1$ adversarial distortion of a ReLU network with a $0.99 \ln n$ approximation ratio unless $\mathsf{NP} = \mathsf{P}$, where $n$ is the number of neurons in the network. Comment: Tsui-Wei Weng and Huan Zhang contributed equally.
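The phrase "bounding the ReLU units with appropriate linear functions" can be made concrete with a minimal sketch of a linear relaxation for a single unit. It assumes known pre-activation bounds `l <= x <= u` and, for the unstable case `l < 0 < u`, two parallel lines with slope `u / (u - l)`; this particular form is an assumption made for illustration, and the function names are hypothetical rather than taken from the paper's code.

```python
def relu(x):
    return max(0.0, x)


def relu_linear_bounds(l, u):
    """Return (slope, lower_intercept, upper_intercept) such that, for all x in [l, u],
    slope * x + lower_intercept <= relu(x) <= slope * x + upper_intercept.
    """
    if l > u:
        raise ValueError("expected l <= u")
    if u <= 0.0:
        return 0.0, 0.0, 0.0  # unit is always inactive: relu(x) = 0
    if l >= 0.0:
        return 1.0, 0.0, 0.0  # unit is always active: relu(x) = x
    slope = u / (u - l)       # chord through (l, 0) and (u, u)
    return slope, 0.0, -slope * l


slope, lo_b, up_b = relu_linear_bounds(-1.0, 3.0)
for i in range(41):
    x = -1.0 + 4.0 * i / 40.0
    assert slope * x + lo_b <= relu(x) + 1e-12
    assert relu(x) <= slope * x + up_b + 1e-12
```

Because both bounds are linear in `x`, they can be composed layer by layer with the network's weight matrices, which is what makes this style of bound cheap compared with solving a linear program per neuron.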