A Theory of Formal Synthesis via Inductive Learning
Formal synthesis is the process of generating a program satisfying a
high-level formal specification. In recent times, effective formal synthesis
methods have been proposed based on the use of inductive learning. We refer to
this class of methods that learn programs from examples as formal inductive
synthesis. In this paper, we present a theoretical framework for formal
inductive synthesis. We discuss how formal inductive synthesis differs from
traditional machine learning. We then describe oracle-guided inductive
synthesis (OGIS), a framework that captures a family of synthesizers that
operate by iteratively querying an oracle. An instance of OGIS that has had
much practical impact is counterexample-guided inductive synthesis (CEGIS). We
present a theoretical characterization of CEGIS for learning any program that
computes a recursive language. In particular, we analyze the relative power of
CEGIS variants where the types of counterexamples generated by the oracle
vary. We also consider the impact of bounded versus unbounded memory
available to the learning algorithm. In the special case where the universe of
candidate programs is finite, we relate the speed of convergence to the notion
of teaching dimension studied in machine learning theory. Altogether, the
results of the paper take a first step towards a theoretical foundation for the
emerging field of formal inductive synthesis.
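The CEGIS loop described in this abstract can be illustrated with a minimal sketch over a finite universe of candidate programs. Everything here is a toy assumption made for illustration and not taken from the paper: the candidate list, the `learner` and `verifier` helpers, the specification `x * x`, and the small integer domain the oracle searches.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Program = Callable[[int], int]
Example = Tuple[int, int]  # (input, expected output)


def learner(candidates: List[Tuple[str, Program]],
            examples: List[Example]) -> Optional[Tuple[str, Program]]:
    """Inductive step: return the first candidate consistent with all examples."""
    for name, prog in candidates:
        if all(prog(x) == y for x, y in examples):
            return name, prog
    return None


def verifier(prog: Program, spec: Program,
             domain: Iterable[int]) -> Optional[Example]:
    """Oracle: return a counterexample (input, expected output), or None."""
    for x in domain:
        if prog(x) != spec(x):
            return x, spec(x)
    return None


def cegis(candidates: List[Tuple[str, Program]], spec: Program,
          domain: Iterable[int]) -> Tuple[Optional[str], List[Example]]:
    """Alternate learner and oracle until a candidate passes or none remain."""
    examples: List[Example] = []
    while True:
        found = learner(candidates, examples)
        if found is None:
            return None, examples      # candidate space exhausted
        name, prog = found
        cex = verifier(prog, spec, domain)
        if cex is None:
            return name, examples      # no counterexample: candidate accepted
        examples.append(cex)           # each counterexample rules out this candidate


# Hypothetical candidate space and specification (assumptions of this sketch).
CANDIDATES: List[Tuple[str, Program]] = [
    ("x", lambda x: x),
    ("x + 1", lambda x: x + 1),
    ("2 * x", lambda x: 2 * x),
    ("x * x", lambda x: x * x),
]

result, seen = cegis(CANDIDATES, lambda x: x * x, range(5))
print(result, seen)
```

Because every counterexample eliminates at least the current candidate, the loop terminates on a finite candidate space. The number of counterexamples needed is the quantity the abstract relates to teaching dimension.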
A bagging SVM to learn from positive and unlabeled examples
We consider the problem of learning a binary classifier from a training set
of positive and unlabeled examples, both in the inductive and in the
transductive setting. This problem, often referred to as PU learning,
differs from the standard supervised classification problem by the lack of
negative examples in the training set. It corresponds to a ubiquitous
situation in many applications such as information retrieval or gene ranking,
when we have identified a set of data of interest sharing a particular
property, and we wish to automatically retrieve additional data sharing the
same property among a large and easily available pool of unlabeled data. We
propose a conceptually simple method, akin to bagging, to approach both
inductive and transductive PU learning problems, by converting them into a series
of supervised binary classification problems discriminating the known positive
examples from random subsamples of the unlabeled set. We empirically
demonstrate the relevance of the method on simulated and real data, where it
performs at least as well as existing methods while being faster.
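The bagging idea in this abstract (train many binary classifiers, each separating the known positives from a random subsample of the unlabeled set, then aggregate) can be sketched as follows. To stay dependency-free, this sketch swaps the SVM for a nearest-centroid base classifier, and it scores each unlabeled point only in rounds where it was not drawn. Both choices, along with the function names and the toy data, are assumptions of the sketch and not the authors' implementation.

```python
import random
from typing import List, Optional, Sequence

Point = Sequence[float]


def centroid(points: Sequence[Point]) -> List[float]:
    """Component-wise mean of a non-empty list of points."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def bagging_pu_scores(positives: Sequence[Point], unlabeled: Sequence[Point],
                      n_rounds: int = 50, sample_size: Optional[int] = None,
                      seed: int = 0) -> List[float]:
    """Average out-of-bag scores; higher means 'more likely positive'."""
    rng = random.Random(seed)
    k = sample_size or len(positives)
    pos_c = centroid(positives)
    totals = [0.0] * len(unlabeled)
    counts = [0] * len(unlabeled)
    for _ in range(n_rounds):
        # Treat a random subsample of the unlabeled set as provisional negatives.
        drawn = set(rng.sample(range(len(unlabeled)), k))
        neg_c = centroid([unlabeled[i] for i in drawn])
        for i, u in enumerate(unlabeled):
            if i in drawn:
                continue  # score only points left out of this round's subsample
            # Base classifier (stand-in for the SVM): closer to positives scores higher.
            totals[i] += sq_dist(u, neg_c) - sq_dist(u, pos_c)
            counts[i] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


# Toy data: the first two unlabeled points are hidden positives.
POS = [(2.0, 2.0), (2.2, 1.9), (1.8, 2.1)]
UNL = [(2.1, 2.0), (1.9, 2.2), (-2.0, -2.0), (-1.8, -2.2), (-2.1, -1.9), (-2.2, -2.1)]
scores = bagging_pu_scores(POS, UNL)
print([round(s, 2) for s in scores])
```

Ranking the unlabeled points by `scores` gives the transductive output. For the inductive setting, the per-round classifiers would be kept and averaged on new points instead.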
Sciduction: Combining Induction, Deduction, and Structure for Verification and Synthesis
Even with impressive advances in automated formal methods, certain problems
in system verification and synthesis remain challenging. Examples include the
verification of quantitative properties of software involving constraints on
timing and energy consumption, and the automatic synthesis of systems from
specifications. The major challenges include environment modeling,
incompleteness in specifications, and the complexity of underlying decision
problems.
This position paper proposes sciduction, an approach to tackle these
challenges by integrating inductive inference, deductive reasoning, and
structure hypotheses. Deductive reasoning, which leads from general rules or
concepts to conclusions about specific problem instances, includes techniques
such as logical inference and constraint solving. Inductive inference, which
generalizes from specific instances to yield a concept, includes algorithmic
learning from examples. Structure hypotheses are used to define the class of
artifacts, such as invariants or program fragments, generated during
verification or synthesis. Sciduction constrains inductive and deductive
reasoning using structure hypotheses, and actively combines inductive and
deductive reasoning: for instance, deductive techniques generate examples for
learning, and inductive reasoning is used to guide the deductive engines.
We illustrate this approach with three applications: (i) timing analysis of
software; (ii) synthesis of loop-free programs; and (iii) controller synthesis
for hybrid systems. Some future applications are also discussed.
Auto-Encoding Scene Graphs for Image Captioning
We propose Scene Graph Auto-Encoder (SGAE) that incorporates the language
inductive bias into the encoder-decoder image captioning framework for more
human-like captions. Intuitively, we humans use this inductive bias to compose
collocations and to make contextual inferences in discourse. For example, when we see
the relation 'person on bike', it is natural to replace 'on' with 'ride' and
infer 'person riding bike on a road' even if the 'road' is not evident. Therefore,
exploiting such bias as a language prior is expected to make conventional
encoder-decoder models less likely to overfit to the dataset bias and to help them focus on
reasoning. Specifically, we use the scene graph, a directed graph
($\mathcal{G}$) where an object node is connected by adjective nodes and
relationship nodes, to represent the complex structural layout of both image
($\mathcal{I}$) and sentence ($\mathcal{S}$). In the textual domain, we use
SGAE to learn a dictionary ($\mathcal{D}$) that helps to reconstruct sentences
in the $\mathcal{S} \rightarrow \mathcal{G} \rightarrow \mathcal{D} \rightarrow \mathcal{S}$ pipeline, where $\mathcal{D}$ encodes the desired language prior;
in the vision-language domain, we use the shared $\mathcal{D}$ to guide the
encoder-decoder in the $\mathcal{I} \rightarrow \mathcal{G} \rightarrow \mathcal{D} \rightarrow \mathcal{S}$ pipeline. Thanks to the scene graph
representation and shared dictionary, the inductive bias is transferred across
domains in principle. We validate the effectiveness of SGAE on the challenging
MS-COCO image captioning benchmark, e.g., our SGAE-based single-model achieves
a new state-of-the-art CIDEr-D on the Karpathy split, and a competitive
CIDEr-D (c40) on the official server even compared to other ensemble
models.
Becoming the Expert - Interactive Multi-Class Machine Teaching
Compared to machines, humans are extremely good at classifying images into
categories, especially when they possess prior knowledge of the categories at
hand. If this prior information is not available, supervision in the form of
teaching images is required. To learn categories more quickly, people should
see important and representative images first, followed by less important
images later - or not at all. However, image-importance is individual-specific,
i.e. a teaching image is important to a student if it changes their overall
ability to discriminate between classes. Further, students keep learning, so
while image-importance depends on their current knowledge, it also varies with
time.
In this work we propose an Interactive Machine Teaching algorithm that
enables a computer to teach challenging visual concepts to a human. Our
adaptive algorithm chooses, online, which labeled images from a teaching set
should be shown to the student as they learn. We show that a teaching strategy
that probabilistically models the student's ability and progress, based on
their correct and incorrect answers, produces better 'experts'. We present
results using real human participants across several varied and challenging
real-world datasets.
Comment: CVPR 201
Universal Language Model Fine-tuning for Text Classification
Inductive transfer learning has greatly impacted computer vision, but
existing approaches in NLP still require task-specific modifications and
training from scratch. We propose Universal Language Model Fine-tuning
(ULMFiT), an effective transfer learning method that can be applied to any task
in NLP, and introduce techniques that are key for fine-tuning a language model.
Our method significantly outperforms the state-of-the-art on six text
classification tasks, reducing the error by 18-24% on the majority of datasets.
Furthermore, with only 100 labeled examples, it matches the performance of
training from scratch on 100x more data. We open-source our pretrained models
and code.
Comment: ACL 2018, fixed denominator in Equation 3, line
What can generic neural networks learn from a child's visual experience?
Young children develop sophisticated internal models of the world based on
their egocentric visual experience. How much of this is driven by innate
constraints and how much is driven by their experience? To investigate these
questions, we train state-of-the-art neural networks on a realistic proxy of a
child's visual experience without any explicit supervision or domain-specific
inductive biases. Specifically, we train both embedding models and generative
models on 200 hours of headcam video from a single child collected over two
years. We train a total of 72 different models, exploring a range of model
architectures and self-supervised learning algorithms, and comprehensively
evaluate their performance in downstream tasks. The best embedding models
perform at 70% of a highly performant ImageNet-trained model on average. They
also learn broad semantic categories without any labeled examples and learn to
localize semantic categories in an image without any location supervision.
However, these models are less object-centric and more background-sensitive
than comparable ImageNet-trained models. Generative models trained with the
same data successfully extrapolate simple properties of partially masked
objects, such as their texture, color, orientation, and rough outline, but
struggle with finer object details. We replicate our experiments with two other
children and find very similar results. Broadly useful high-level visual
representations are thus robustly learnable from a representative sample of a
child's visual experience without strong inductive biases.
Comment: 26 pages, 14 figures, 3 tables; code & all pretrained models
available from https://github.com/eminorhan/silicon-menageri
Inductive learning of answer set programs
The goal of Inductive Logic Programming (ILP) is to find a hypothesis that
explains a set of examples in the context of some pre-existing background
knowledge. Until recently, most research on ILP targeted learning definite
logic programs. This thesis constitutes the first comprehensive work on
learning answer set programs, introducing new learning frameworks, theoretical
results on the complexity and generality of these frameworks, algorithms for
learning ASP programs, and an extensive evaluation of these algorithms.
Although there is previous work on learning ASP programs, existing learning
frameworks are either brave (where examples should be explained by at
least one answer set) or cautious (where examples should be explained
by all answer sets). There are cases where brave induction is too weak and
cautious induction is too strong. Our proposed frameworks combine brave and
cautious learning and can learn ASP programs containing choice rules and
constraints. Many applications of ASP use weak constraints to express a
preference ordering over the answer sets of a program. Learning weak
constraints corresponds to preference learning, which we achieve by
introducing ordering examples. We then explore the generality of our
frameworks, investigating what it means for a framework to be general enough to
distinguish one hypothesis from another. We show that our frameworks are more
general than both brave and cautious induction.
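The distinction between brave and cautious explanation can be made concrete with a small sketch. It assumes each example is a single atom and that the answer sets have already been computed. Here they are written out by hand for the hypothetical program `1 { p ; q } 1. r.`; no ASP solver is invoked, and the function names are illustrative.

```python
from typing import FrozenSet, List

AnswerSet = FrozenSet[str]


def bravely_entailed(atom: str, answer_sets: List[AnswerSet]) -> bool:
    """Brave: the atom holds in at least one answer set."""
    return any(atom in s for s in answer_sets)


def cautiously_entailed(atom: str, answer_sets: List[AnswerSet]) -> bool:
    """Cautious: the atom holds in every answer set (vacuously true if there are none)."""
    return all(atom in s for s in answer_sets)


# Answer sets of the hypothetical program `1 { p ; q } 1. r.`, listed by hand:
# the choice rule picks exactly one of p and q, and r is a fact.
ANSWER_SETS: List[AnswerSet] = [frozenset({"p", "r"}), frozenset({"q", "r"})]

for atom in ("p", "r", "s"):
    print(atom, bravely_entailed(atom, ANSWER_SETS), cautiously_entailed(atom, ANSWER_SETS))
```

The atom `p` is explained bravely but not cautiously, while `r` is explained under both. A framework that combines the two, as the abstract proposes, can state requirements of both kinds within a single learning task.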
We also present a new family of algorithms, called ILASP (Inductive Learning of
Answer Set Programs), which we prove to be sound and complete. This work
concerns learning from both non-noisy and noisy examples. In the latter case,
ILASP returns a hypothesis that maximises the coverage of examples while
minimising the length of the hypothesis. In our evaluation, we show that ILASP
scales to tasks with large numbers of examples, finding accurate hypotheses
even in the presence of high proportions of noisy examples.
Learning programs by learning from failures
We describe an inductive logic programming (ILP) approach called learning
from failures. In this approach, an ILP system (the learner) decomposes the
learning problem into three separate stages: generate, test, and constrain. In
the generate stage, the learner generates a hypothesis (a logic program) that
satisfies a set of hypothesis constraints (constraints on the syntactic form of
hypotheses). In the test stage, the learner tests the hypothesis against
training examples. A hypothesis fails when it does not entail all the positive
examples or entails a negative example. If a hypothesis fails, then, in the
constrain stage, the learner learns constraints from the failed hypothesis to
prune the hypothesis space, i.e. to constrain subsequent hypothesis generation.
For instance, if a hypothesis is too general (entails a negative example), the
constraints prune generalisations of the hypothesis. If a hypothesis is too
specific (does not entail all the positive examples), the constraints prune
specialisations of the hypothesis. This loop repeats until either (i) the
learner finds a hypothesis that entails all the positive and none of the
negative examples, or (ii) there are no more hypotheses to test. We introduce
Popper, an ILP system that implements this approach by combining answer set
programming and Prolog. Popper supports infinite problem domains, reasoning
about lists and numbers, learning textually minimal programs, and learning
recursive programs. Our experimental results on three domains (toy game
problems, robot strategies, and list transformations) show that (i) constraints
drastically improve learning performance, and (ii) Popper can outperform
existing ILP systems, both in terms of predictive accuracies and learning
times.
Comment: Accepted for the machine learning journa
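The generate, test, and constrain loop described in this abstract can be sketched on a toy hypothesis space. This is not Popper: a hypothesis here is just a conjunction of named predicates over integers (an assumption made so the sketch needs no ASP or Prolog). Dropping a predicate generalises a hypothesis and adding one specialises it. The predicate names, examples, and `learn` function are all hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

# Hypothetical background predicates (assumptions of this sketch).
PREDICATES: Dict[str, Callable[[int], bool]] = {
    "even": lambda x: x % 2 == 0,
    "positive": lambda x: x > 0,
    "small": lambda x: x < 10,
}

Hypothesis = FrozenSet[str]  # conjunction of predicate names


def covers(h: Hypothesis, x: int) -> bool:
    """A hypothesis covers x when every predicate in it holds for x."""
    return all(PREDICATES[name](x) for name in h)


def learn(pos: List[int], neg: List[int]) -> Tuple[Optional[Hypothesis], int]:
    """Return (solution or None, number of hypotheses actually tested)."""
    names = sorted(PREDICATES)
    too_general: List[Hypothesis] = []   # failures that covered a negative example
    too_specific: List[Hypothesis] = []  # failures that missed a positive example
    tested = 0
    # Generate: enumerate smallest hypotheses first.
    for size in range(len(names) + 1):
        for combo in combinations(names, size):
            h = frozenset(combo)
            # Constrain: skip generalisations (subsets) of too-general failures
            # and specialisations (supersets) of too-specific failures.
            if any(h <= g for g in too_general) or any(h >= s for s in too_specific):
                continue
            # Test against the training examples.
            tested += 1
            misses_pos = not all(covers(h, x) for x in pos)
            hits_neg = any(covers(h, x) for x in neg)
            if not misses_pos and not hits_neg:
                return h, tested
            if hits_neg:
                too_general.append(h)
            if misses_pos:
                too_specific.append(h)
    return None, tested


solution, n_tested = learn(pos=[1, 2, 3], neg=[-1, 20])
print(sorted(solution) if solution is not None else None, n_tested)
```

In this run the hypothesis `even` misses the positive example 1, so both two-predicate hypotheses containing `even` are skipped without being tested. With smallest-first enumeration the generalisation constraints are recorded but never fire, because every subset has already been visited. They would matter under a different generation order.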