The Regular Expression Inference Challenge
We propose \emph{regular expression inference (REI)} as a challenge for
code/language modelling, and the wider machine learning community. REI is a
supervised machine learning (ML) and program synthesis task, and poses the
problem of finding minimal regular expressions from examples: given two finite
sets of strings $P$ and $N$ and a cost function $\mathit{cost}$, the task
is to generate an expression $r$ that accepts all strings in $P$ and rejects
all strings in $N$, while no other such expression $r'$ exists with
$\mathit{cost}(r') < \mathit{cost}(r)$.
REI has advantages as a challenge problem: (i) regular expressions are
well-known, widely used, and a natural idealisation of code; (ii) REI's
asymptotic worst-case complexity is well understood; (iii) REI has a small
number of easy-to-understand parameters (e.g.~the cardinality of the example
sets, the string lengths of examples, or the cost function); this lets us
easily fine-tune REI-hardness; (iv) REI is an unsolved problem for deep
learning based ML.
REI-hardness; (iv) REI is an unsolved problem for deep learning based ML.
Recently, an REI solver was implemented on GPUs, using program synthesis
techniques. This enabled, for the first time, fast generation of minimal
expressions for complex REI instances. Building on this advance, we generate
and publish the first large-scale datasets for REI, and devise and evaluate
several initial heuristic and machine learning baselines.
We invite the community to participate and explore ML methods that learn to
solve REI problems. We believe that progress in REI directly translates to
code/language modelling.

Comment: 7 pages, 3-page appendix, 6 tables
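The correctness side of the REI task (accept every positive example, reject every negative one) can be sketched as a small checker. This is an illustrative sketch, not the challenge's reference implementation; the set names and the length-based cost function below are assumptions for the example.

```python
import re

def is_valid_solution(regex: str, pos: set[str], neg: set[str]) -> bool:
    """Check the correctness half of an REI solution: the expression must
    accept all positive examples and reject all negative ones.
    (Minimality with respect to the cost function is the hard part of
    the challenge and is not checked here.)"""
    r = re.compile(regex)
    return (all(r.fullmatch(s) for s in pos)
            and not any(r.fullmatch(s) for s in neg))

def cost(regex: str) -> int:
    # Illustrative cost: expression length. The challenge permits other
    # cost functions, e.g. weighted counts of regex operators.
    return len(regex)

pos = {"ab", "aab", "aaab"}
neg = {"b", "ba", ""}
print(is_valid_solution("a+b", pos, neg))  # True: accepts pos, rejects neg
print(is_valid_solution("a*b", pos, neg))  # False: also matches "b" in neg
```

A full REI solver must additionally certify that no cheaper correct expression exists, which is what makes the problem hard.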
Learning Concise Models from Long Execution Traces
Abstract models of system-level behaviour have applications in design
exploration, analysis, testing and verification. We describe a new algorithm
for automatically extracting useful models, as automata, from execution traces
of a HW/SW system driven by software exercising a use-case of interest. Our
algorithm leverages modern program synthesis techniques to generate predicates
on automaton edges, succinctly describing system behaviour. It employs trace
segmentation to tackle complexity for long traces. We learn concise models
capturing transaction-level, system-wide behaviour, experimentally
demonstrating the approach on traces from a variety of sources, including
the x86 QEMU virtual platform and the Real-Time Linux kernel.
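The idea of an automaton whose edges carry predicates over trace events can be illustrated with a toy example. This is not the paper's algorithm; the event fields, predicates, and states below are invented for illustration.

```python
# A tiny automaton over trace events; each edge is a (predicate, next-state)
# pair, so one edge can succinctly cover many concrete events.
automaton = {
    "idle": [(lambda e: e["op"] == "req", "busy")],
    "busy": [(lambda e: e["op"] == "ack", "idle"),
             (lambda e: e["op"] == "req", "busy")],  # queued request
}

def run(trace, state="idle"):
    """Replay a trace through the automaton, following the first edge
    whose predicate matches the current event."""
    for event in trace:
        for pred, nxt in automaton[state]:
            if pred(event):
                state = nxt
                break
        else:
            raise ValueError(f"no edge matches {event} in state {state}")
    return state

trace = [{"op": "req"}, {"op": "req"}, {"op": "ack"}]
print(run(trace))  # "idle"
```

In the paper's setting, the predicates on edges are not written by hand but generated by program synthesis from long execution traces, with segmentation keeping that synthesis tractable.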
Abstract Learning Frameworks for Synthesis
We develop abstract learning frameworks (ALFs) for synthesis that embody the
principles of CEGIS (counterexample-guided inductive synthesis) strategies,
which have become widely applied in recent years. An ALF defines a general
scheme of iterative learning, based on a hypothesis space that
captures the synthesized objects, a sample space that forms the space on which
induction is performed, and a concept space that abstractly defines the
semantics of the learning process. We show that a variety of synthesis
algorithms in current literature can be embedded in this general framework.
While studying these embeddings, we also generalize some of the synthesis
problems that these instances address, yielding new ways of viewing synthesis
problems through the lens of learning. We also investigate convergence of the general
framework, and exhibit three recipes for convergence in finite time. The first
two recipes generalize current techniques for convergence used by existing
synthesis engines. The third is a more involved technique with no existing
instantiation that we know of, and we instantiate it on concrete synthesis
problems.
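The CEGIS-style iterative learning loop that ALFs abstract can be sketched on a toy problem. This is a minimal illustration, not the paper's framework: the hypothesis space is integer thresholds, the "verifier" is exhaustive checking (real engines typically use SMT solvers), and all names are invented.

```python
# Toy CEGIS loop: synthesize a threshold t so that f(x) = (x >= t)
# agrees with a hidden specification on the domain 0..99.

def spec(x: int) -> bool:
    """Hidden target behaviour the learner must match."""
    return x >= 42

def learner(samples):
    """Propose the smallest threshold consistent with all samples
    seen so far (the sample space of the framework)."""
    for t in range(101):
        if all((x >= t) == y for x, y in samples):
            return t
    return None

def verifier(t):
    """Return a counterexample (input, expected output), or None if
    the hypothesis matches the spec on the whole domain."""
    for x in range(100):
        if (x >= t) != spec(x):
            return (x, spec(x))
    return None

samples = []
t = learner(samples)
while (cex := verifier(t)) is not None:
    samples.append(cex)   # counterexample grows the sample set
    t = learner(samples)
print(t)  # 42
```

The ALF view separates the three spaces at play here: hypotheses (thresholds), samples (labelled counterexamples), and concepts (the functions the thresholds denote), which is what lets many concrete synthesis algorithms be embedded in one scheme.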