Probabilistic Reasoning as Information Compression by Multiple Alignment, Unification and Search: An Introduction and Overview
This article introduces the idea that probabilistic reasoning (PR) may be
understood as "information compression by multiple alignment, unification and
search" (ICMAUS). In this context, multiple alignment has a meaning which is
similar to but distinct from its meaning in bio-informatics, while unification
means a simple merging of matching patterns, a meaning which is related to but
simpler than the meaning of that term in logic.
A software model, SP61, has been developed for the discovery and formation of
'good' multiple alignments, evaluated in terms of information compression. The
model is described in outline.
Using examples from the SP61 model, this article describes in outline how the
ICMAUS framework can model various kinds of PR including: PR in best-match
pattern recognition and information retrieval; one-step 'deductive' and
'abductive' PR; inheritance of attributes in a class hierarchy; chains of
reasoning (probabilistic decision networks and decision trees, and PR with
'rules'); geometric analogy problems; nonmonotonic reasoning and reasoning with
default values; and modelling the function of a Bayesian network.
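The idea of unification as "a simple merging of matching patterns" can be illustrated with a short sketch (a hypothetical illustration of the general notion, not the SP61 implementation; the wildcard symbol is our own assumption):

```python
def unify(p1, p2):
    """Merge two patterns if every paired symbol matches.

    A pattern is a tuple of symbols; '?' acts as a wildcard that
    unifies with any symbol. Returns the merged pattern, or None
    when the patterns fail to match.
    """
    if len(p1) != len(p2):
        return None
    merged = []
    for a, b in zip(p1, p2):
        if a == b or b == "?":
            merged.append(a)
        elif a == "?":
            merged.append(b)
        else:
            return None  # mismatch: no unification
    return tuple(merged)

# Two patterns whose symbols match merge into a single pattern:
print(unify(("t", "h", "?", "s"), ("t", "h", "i", "?")))  # ('t', 'h', 'i', 's')
```

In the ICMAUS framework, repeatedly merging matching patterns in this way is what yields compression: shared structure is stored once rather than twice.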
Spectral Sparse Representation for Clustering: Evolved from PCA, K-means, Laplacian Eigenmap, and Ratio Cut
Dimensionality reduction, cluster analysis, and sparse representation are
basic components in machine learning. However, their relationships have not yet
been fully investigated. In this paper, we find that spectral graph theory
underlies a series of these elementary methods and can unify them into a
complete framework. The methods include PCA, K-means, Laplacian eigenmap (LE),
ratio cut (Rcut), and a new sparse representation method developed by us,
called spectral sparse representation (SSR). Further, extended relations to
conventional over-complete sparse representations (e.g., method of optimal
directions, KSVD), manifold learning (e.g., kernel PCA, multidimensional
scaling, Isomap, locally linear embedding), and subspace clustering (e.g.,
sparse subspace clustering, low-rank representation) are incorporated. We show
that, under an ideal condition from spectral graph theory, PCA, K-means,
LE, and Rcut are unified. When the condition is relaxed, the unification
evolves to SSR, which lies intermediate between PCA/LE and
K-means/Rcut. An efficient algorithm, NSCrt, is developed to solve the sparse
codes of SSR. SSR combines the merits of both sides: its sparse codes reduce the
dimensionality of the data while revealing cluster structure. Owing to its
inherent relation to cluster analysis, the codes of SSR can be used directly for
clustering. Scut, a clustering approach derived from SSR, reaches
state-of-the-art performance in the spectral clustering family. The one-shot
solution obtained by Scut is comparable to the optimal result of K-means
run many times. Experiments on various data sets demonstrate the properties
and strengths of SSR, NSCrt, and Scut.
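The Laplacian-eigenmap side of the unification can be sketched in a few lines of numpy (a minimal illustration of the standard spectral embedding that LE/Rcut rest on, not the NSCrt or Scut algorithms; the toy affinity matrix is our own assumption):

```python
import numpy as np

def laplacian_eigenmap(W, k):
    """Embed a graph with symmetric affinity matrix W into k dimensions
    using the eigenvectors of the normalized Laplacian that correspond
    to the k smallest eigenvalues."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L = np.eye(len(W)) - D_inv_sqrt @ W @ D_inv_sqrt
    vals, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return vecs[:, :k]

# Toy graph with two obvious clusters, {0,1} and {2,3}, strongly
# connected internally and weakly across.
W = np.array([[0.00, 1.00, 0.01, 0.01],
              [1.00, 0.00, 0.01, 0.01],
              [0.01, 0.01, 0.00, 1.00],
              [0.01, 0.01, 1.00, 0.00]])
Y = laplacian_eigenmap(W, 2)
# Points in the same cluster land close together in the embedding:
print(np.linalg.norm(Y[0] - Y[1]) < np.linalg.norm(Y[0] - Y[2]))  # True
```

Running K-means on the rows of `Y` is exactly the classical LE-then-cluster pipeline; SSR replaces this two-stage scheme with sparse codes that serve both roles at once.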
Parameter and Insertion Function Co-synthesis for Opacity Enhancement in Parametric Stochastic Discrete Event Systems
Opacity is a property that characterizes the system's capability to keep its
"secret" from being inferred by an intruder that partially observes the
system's behavior. In this paper, we are concerned with enhancing the opacity
using insertion functions, while at the same time, enforcing the task
specification in a parametric stochastic discrete event system. We first obtain
the parametric Markov decision process that encodes all the possible
insertions. Based on this process, we convert the parameter and insertion
function co-synthesis problem into a nonlinear program. We prove that if the
output of this program satisfies all the constraints, it is a valid solution to
our problem. Therefore, security and the capability of enforcing the task
specification can be guaranteed simultaneously.
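The role of an insertion function can be illustrated with a toy sketch (a hypothetical illustration of the general mechanism only; the rule table and events are our own assumptions, not the paper's synthesis procedure):

```python
def make_insertion_function(rules):
    """Return an insertion function that, given the observation prefix
    seen so far and the next real observable event, outputs the string
    the intruder actually observes.

    `rules` maps an (observed-prefix, event) pair to a string of
    fictitious events to insert *before* the real event.
    """
    def f(prefix, event):
        inserted = rules.get((prefix, event), "")
        return inserted + event
    return f

# Toy system: after observing "a", the real event "s" would reveal the
# secret, so a fictitious "b" is inserted first; the intruder sees
# "abs", which is also producible by non-secret runs.
f = make_insertion_function({("a", "s"): "b"})
print(f("a", "s"))  # 'bs'
print(f("", "a"))   # 'a'
```

In the parametric stochastic setting, both the system parameters and such a rule table are chosen jointly, which is what the nonlinear program in the paper optimizes over.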
Sequence to Logic with Copy and Cache
Generating logical-form equivalents of natural language is a fresh application
of neural architectures in which long short-term memory units effectively
capture dependencies in both the encoder and the decoder.
The logical form of the sequence usually preserves information from the
natural language side in the form of similar tokens, and recently a copying
mechanism has been proposed that increases the probability of outputting
tokens from the source input during decoding.
In this paper we propose a caching mechanism as a more general form of the
copying mechanism which also weighs all the words from the source vocabulary
according to their relation to the current decoding context.
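The generate-versus-copy interplay that the caching mechanism generalizes can be sketched as a mixture of two distributions (a minimal numpy illustration of a generic copy mechanism, not the paper's architecture; the mixing weight `p_copy` and toy inputs are our own assumptions):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def output_distribution(gen_logits, copy_scores, src_token_ids, p_copy):
    """Mix a generation distribution over the target vocabulary with a
    copy distribution over source positions.

    gen_logits:    logits over the target vocabulary
    copy_scores:   attention-style scores over source positions
    src_token_ids: vocabulary id of each source token
    p_copy:        probability of copying instead of generating
    """
    p_gen = softmax(gen_logits)
    p_att = softmax(copy_scores)
    out = (1.0 - p_copy) * p_gen
    # Scatter the copy mass onto the vocabulary ids of the source
    # tokens; a cache instead weighs *all* source-vocabulary words by
    # their relation to the current decoding context.
    for pos, tok in enumerate(src_token_ids):
        out[tok] += p_copy * p_att[pos]
    return out

p = output_distribution(np.zeros(5), np.array([2.0, 0.0]),
                        src_token_ids=[3, 1], p_copy=0.5)
print(p.argmax())  # token 3 receives the bulk of the copy mass
```

The cache extends this by scoring every source-vocabulary word against the decoding context, rather than only the tokens literally present in the input.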
Our results confirm that the proposed method achieves improvements in
sequence/token-level accuracy on sequence to logical form tasks. Further
experiments on cross-domain adversarial attacks show substantial improvements
when using the most influential examples of other domains for training.
On Differentiable Interpreters
Neural networks have transformed the fields of Machine Learning and Artificial Intelligence with the ability to model complex features and behaviours from raw data. They quickly became instrumental models, achieving numerous state-of-the-art performances across many tasks and domains. Yet the successes of these models often rely on large amounts of data. When data is scarce, resourceful ways of using background knowledge often help. However, though different types of background knowledge can be used to bias the model, it is not clear how one can use algorithmic knowledge to that end. In this thesis, we present differentiable interpreters as an effective framework for utilising algorithmic background knowledge as architectural inductive biases of neural networks. By continuously approximating discrete elements of traditional program interpreters, we create differentiable interpreters that, due to the continuous nature of their execution, are amenable to optimisation with gradient descent methods. This enables us to write code mixed with parametric functions, where the code strongly biases the behaviour of the model while enabling the training of parameters and/or input representations from data. We investigate two such differentiable interpreters and their use cases in this thesis. First, we present a detailed construction of ∂4, a differentiable interpreter for the programming language FORTH. We demonstrate the ability of ∂4 to strongly bias neural models with incomplete programs of variable complexity while learning missing pieces of the program with parametrised neural networks. Such models can learn to solve tasks and strongly generalise to out-of-distribution data from small datasets. Second, we present greedy Neural Theorem Provers (gNTPs), a significant improvement of the differentiable Datalog interpreter NTP.
gNTPs ameliorate the large computational cost of recursive differentiable interpretation, achieving drastic time and memory speedups while introducing soft reasoning over logic knowledge and natural language.
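The core trick of differentiable interpretation, replacing a discrete choice of instruction with a softmax-weighted mixture so that gradients flow through execution, can be sketched as follows (a minimal illustration of the general idea, not the ∂4 construction; the toy operation set is our own assumption):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Discrete machine operations acting on a scalar "register".
OPS = [lambda r: r + 1,   # INC
       lambda r: r - 1,   # DEC
       lambda r: r * 2]   # DBL

def soft_step(register, op_logits):
    """Execute one soft step: instead of picking a single operation,
    take the softmax-weighted average of every operation's result.
    The logits are continuous parameters, so gradient descent can
    recover a discrete program (a one-hot softmax) from data."""
    w = softmax(op_logits)
    return sum(wi * op(register) for wi, op in zip(w, OPS))

# With near-one-hot logits the soft step approximates the hard one:
print(soft_step(3.0, np.array([10.0, 0.0, 0.0])))  # ≈ 4.0 (INC)
```

Chaining many such soft steps is what makes recursive differentiable interpretation expensive, which is the cost that gNTPs ameliorate.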
AI Reasoning Systems: PAC and Applied Methods
Learning and logic are distinct and remarkable approaches to prediction.
Machine learning has experienced a surge in popularity because it is robust to
noise and achieves high performance; however, ML experiences many issues with
knowledge transfer and extrapolation. In contrast, logic is easily interpreted,
and logical rules are easy to chain and transfer between systems; however,
inductive logic is brittle to noise. We explore the premise of combining
learning with inductive logic into AI Reasoning Systems. Specifically, we
summarize findings from PAC learning (conceptual graphs, robust logics,
knowledge infusion) and deep learning (DSRL, ILP, DeepLogic) by
reproducing proofs of tractability, presenting algorithms in pseudocode,
highlighting results, and synthesizing between fields. We conclude with
suggestions for integrated models by combining the modules listed above and
with a list of unsolved (likely intractable) problems.