407 research outputs found
Fast and correct variational inference for probabilistic programming: Differentiability, reparameterisation and smoothing
Probabilistic programming is an innovative programming paradigm for posing and automatically solving Bayesian inference problems. In this thesis, we study the foundations of fast yet correct inference for probabilistic programming.
Many of the most successful inference techniques (e.g. Hamiltonian Monte Carlo or Stochastic Variational Inference) harness gradients of the so-called density function, which therefore needs to be differentiable at least almost everywhere. We resolve a question posed by Hongseok Yang by demonstrating the following: densities of almost surely terminating programs are differentiable almost everywhere.
Having established this property necessary for the correctness of gradient-based inference algorithms, we investigate variational inference, which frames posterior inference as an optimisation problem, in more detail. The dominant approach for stochastic optimisation in practice is stochastic gradient descent. In particular, a variant using the so-called reparameterisation gradient estimator exhibits low variance, resulting in fast convergence in a traditional statistics setting. Unfortunately, although having measure 0, discontinuities can compromise the correctness of this approach.
Therefore, we propose a smoothed interpretation parameterised by an accuracy coefficient and present type systems establishing technical pre-conditions. Thus, we can prove stochastic gradient descent with the reparameterisation gradient estimator to be correct when applied to the smoothed problem. Besides, via a uniform convergence result, we can solve the original problem up to any error tolerance by choosing an accuracy coefficient suitably.
Furthermore, rather than fixing an accuracy coefficient in advance (limiting the quality of the final solution), we propose a novel variant of stochastic gradient descent, Diagonalisation Stochastic Gradient Descent, which progressively enhances the accuracy of the smoothed approximation during optimisation, and we prove convergence to stationary points of the unsmoothed (original) objective.
An empirical evaluation reveals benefits of our approaches over the state of the art: our approaches are simple, fast and attain orders of magnitude reduction in work- normalised variance. Besides, Diagonalisation Stochastic Gradient Descent is more stable than standard stochastic gradient descent for a fixed-accuracy smoothing.
Finally, we show unbiasedness of the reparameterisation gradient estimator for continuous but non-differentiable models, and we propose a method based on higher-order logic to establish continuity in the presence of conditionals. We provide a sound and complete reduction for verifying continuity of models to a satisfiability problem, and we propose novel efficient randomised decision procedures
Learning with Logical Constraints but without Shortcut Satisfaction
Recent studies in neuro-symbolic learning have explored the integration of
logical knowledge into deep learning via encoding logical constraints as an
additional loss function. However, existing approaches tend to vacuously
satisfy logical constraints through shortcuts, failing to fully exploit the
knowledge. In this paper, we present a new framework for learning with logical
constraints. Specifically, we address the shortcut satisfaction issue by
introducing dual variables for logical connectives, encoding how the constraint
is satisfied. We further propose a variational framework where the encoded
logical constraint is expressed as a distributional loss that is compatible
with the model's original training loss. The theoretical analysis shows that
the proposed approach bears salient properties, and the experimental
evaluations demonstrate its superior performance in both model generalizability
and constraint satisfaction.Comment: Published as a conference paper at ICLR 2023, and code is available
at https://github.com/SoftWiser-group/NeSy-without-Shortcut
Graph Neural Networks Meet Neural-Symbolic Computing: A Survey and Perspective
Neural-symbolic computing has now become the subject of interest of both
academic and industry research laboratories. Graph Neural Networks (GNN) have
been widely used in relational and symbolic domains, with widespread
application of GNNs in combinatorial optimization, constraint satisfaction,
relational reasoning and other scientific domains. The need for improved
explainability, interpretability and trust of AI systems in general demands
principled methodologies, as suggested by neural-symbolic computing. In this
paper, we review the state-of-the-art on the use of GNNs as a model of
neural-symbolic computing. This includes the application of GNNs in several
domains as well as its relationship to current developments in neural-symbolic
computing.Comment: Updated version, draft of accepted IJCAI2020 Survey Pape
A Formal Methods Approach to Pattern Synthesis in Reaction Diffusion Systems
We propose a technique to detect and generate patterns in a network of
locally interacting dynamical systems. Central to our approach is a novel
spatial superposition logic, whose semantics is defined over the quad-tree of a
partitioned image. We show that formulas in this logic can be efficiently
learned from positive and negative examples of several types of patterns. We
also demonstrate that pattern detection, which is implemented as a model
checking algorithm, performs very well for test data sets different from the
learning sets. We define a quantitative semantics for the logic and integrate
the model checking algorithm with particle swarm optimization in a
computational framework for synthesis of parameters leading to desired patterns
in reaction-diffusion systems
Scallop: A Language for Neurosymbolic Programming
We present Scallop, a language which combines the benefits of deep learning
and logical reasoning. Scallop enables users to write a wide range of
neurosymbolic applications and train them in a data- and compute-efficient
manner. It achieves these goals through three key features: 1) a flexible
symbolic representation that is based on the relational data model; 2) a
declarative logic programming language that is based on Datalog and supports
recursion, aggregation, and negation; and 3) a framework for automatic and
efficient differentiable reasoning that is based on the theory of provenance
semirings. We evaluate Scallop on a suite of eight neurosymbolic applications
from the literature. Our evaluation demonstrates that Scallop is capable of
expressing algorithmic reasoning in diverse and challenging AI tasks, provides
a succinct interface for machine learning programmers to integrate logical
domain knowledge, and yields solutions that are comparable or superior to
state-of-the-art models in terms of accuracy. Furthermore, Scallop's solutions
outperform these models in aspects such as runtime and data efficiency,
interpretability, and generalizability
On Differentiable Interpreters
Neural networks have transformed the fields of Machine Learning and Artificial Intelligence with the ability to model complex features and behaviours from raw data. They quickly became instrumental models, achieving numerous state-of-the-art performances across many tasks and domains. Yet the successes of these models often rely on large amounts of data. When data is scarce, resourceful ways of using background knowledge often help. However, though different types of background knowledge can be used to bias the model, it is not clear how one can use algorithmic knowledge to that extent. In this thesis, we present differentiable interpreters as an effective framework for utilising algorithmic background knowledge as architectural inductive biases of neural networks. By continuously approximating discrete elements of traditional program interpreters, we create differentiable interpreters that, due to the continuous nature of their execution, are amenable to optimisation with gradient descent methods. This enables us to write code mixed with parametric functions, where the code strongly biases the behaviour of the model while enabling the training of parameters and/or input representations from data. We investigate two such differentiable interpreters and their use cases in this thesis. First, we present a detailed construction of ∂4, a differentiable interpreter for the programming language FORTH. We demonstrate the ability of ∂4 to strongly bias neural models with incomplete programs of variable complexity while learning missing pieces of the program with parametrised neural networks. Such models can learn to solve tasks and strongly generalise to out-of-distribution data from small datasets. Second, we present greedy Neural Theorem Provers (gNTPs), a significant improvement of a differentiable Datalog interpreter NTP. gNTPs ameliorate the large computational cost of recursive differentiable interpretation, achieving drastic time and memory speedups while introducing soft reasoning over logic knowledge and natural language
Towards Fast Computation of Certified Robustness for ReLU Networks
Verifying the robustness property of a general Rectified Linear Unit (ReLU)
network is an NP-complete problem [Katz, Barrett, Dill, Julian and Kochenderfer
CAV17]. Although finding the exact minimum adversarial distortion is hard,
giving a certified lower bound of the minimum distortion is possible. Current
available methods of computing such a bound are either time-consuming or
delivering low quality bounds that are too loose to be useful. In this paper,
we exploit the special structure of ReLU networks and provide two
computationally efficient algorithms Fast-Lin and Fast-Lip that are able to
certify non-trivial lower bounds of minimum distortions, by bounding the ReLU
units with appropriate linear functions Fast-Lin, or by bounding the local
Lipschitz constant Fast-Lip. Experiments show that (1) our proposed methods
deliver bounds close to (the gap is 2-3X) exact minimum distortion found by
Reluplex in small MNIST networks while our algorithms are more than 10,000
times faster; (2) our methods deliver similar quality of bounds (the gap is
within 35% and usually around 10%; sometimes our bounds are even better) for
larger networks compared to the methods based on solving linear programming
problems but our algorithms are 33-14,000 times faster; (3) our method is
capable of solving large MNIST and CIFAR networks up to 7 layers with more than
10,000 neurons within tens of seconds on a single CPU core.
In addition, we show that, in fact, there is no polynomial time algorithm
that can approximately find the minimum adversarial distortion of a
ReLU network with a approximation ratio unless
=, where is the number of neurons in the network.Comment: Tsui-Wei Weng and Huan Zhang contributed equall
- …