The Regular Expression Inference Challenge

Authors: Martin Berger, Philip John Gorinski, Ignacio Iacobacci, Mojtaba Valizadeh
Publication date: 15/08/2023

We propose regular expression inference (REI) as a challenge for code/language modelling, and the wider machine learning community. REI is a supervised machine learning (ML) and program synthesis task, and poses the problem of finding minimal regular expressions from examples: Given two finite sets of strings $P$ and $N$ and a cost function $\text{cost}(\cdot)$, the task is to generate an expression $r$ that accepts all strings in $P$ and rejects all strings in $N$, while no other such expression $r'$ exists with $\text{cost}(r')<\text{cost}(r)$. REI has advantages as a challenge problem: (i) regular expressions are well-known, widely used, and a natural idealisation of code; (ii) REI's asymptotic worst-case complexity is well understood; (iii) REI has a small number of easy to understand parameters (e.g. $P$ or $N$ cardinality, string lengths of examples, or the cost function); this lets us easily fine-tune REI-hardness; (iv) REI is an unsolved problem for deep learning based ML. Recently, an REI solver was implemented on GPUs, using program synthesis techniques. This enabled, for the first time, fast generation of minimal expressions for complex REI instances. Building on this advance, we generate and publish the first large-scale datasets for REI, and devise and evaluate several initial heuristic and machine learning baselines. We invite the community to participate and explore ML methods that learn to solve REI problems. We believe that progress in REI directly translates to code/language modelling.

Comment: 7 pages, 3 pages appendix, 6 tables
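The task definition in this abstract can be illustrated with a small brute-force search. The sketch below is not the GPU solver the abstract mentions; it only enumerates expressions in order of increasing cost and returns the first one consistent with the examples, which is therefore minimal under the chosen cost. The grammar (literals, concatenation, union, star), the uniform cost of 1 per constructor, and the names `enumerate_by_cost` and `solve_rei` are assumptions made for illustration.

```python
import re
from itertools import product


def enumerate_by_cost(alphabet, max_cost):
    """Yield (cost, regex) pairs in non-decreasing cost order.

    Assumed cost function: every literal, concatenation, union and star
    costs 1. This is an illustrative choice, not the paper's.
    """
    by_cost = {1: [re.escape(c) for c in alphabet]}
    for cost in range(1, max_cost + 1):
        if cost > 1:
            exprs = []
            # Star over expressions that are one unit cheaper. Starring a
            # star is skipped: (r*)* accepts the same language as r*.
            for r in by_cost[cost - 1]:
                if not r.endswith(")*"):
                    exprs.append(f"(?:{r})*")
            # Binary constructors: split the remaining cost between operands.
            for left_cost in range(1, cost - 1):
                right_cost = cost - 1 - left_cost
                for l, r in product(by_cost[left_cost], by_cost[right_cost]):
                    exprs.append(f"(?:{l})(?:{r})")   # concatenation
                    exprs.append(f"(?:{l})|(?:{r})")  # union
            by_cost[cost] = exprs
        for r in by_cost[cost]:
            yield cost, r


def solve_rei(positives, negatives, alphabet, max_cost=7):
    """Return (cost, regex) for a cheapest consistent regex, or None."""
    for cost, r in enumerate_by_cost(alphabet, max_cost):
        pattern = re.compile(r)
        accepts_all = all(pattern.fullmatch(s) for s in positives)
        rejects_all = not any(pattern.fullmatch(s) for s in negatives)
        if accepts_all and rejects_all:
            return cost, r
    return None


if __name__ == "__main__":
    print(solve_rei({"a", "aa", "aaa"}, {"", "b", "ab"}, "ab"))
```

The number of candidates grows exponentially with the cost bound, so this approach is only practical for tiny instances.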

arXiv.org e-Print Archive

Learning Concise Models from Long Execution Traces

Authors: Natasha Yogananda Jeppu, Daniel Kroening, Tom Melham, John O'Leary
Publication date: 01/01/2020

Abstract models of system-level behaviour have applications in design exploration, analysis, testing and verification. We describe a new algorithm for automatically extracting useful models, as automata, from execution traces of a HW/SW system driven by software exercising a use-case of interest. Our algorithm leverages modern program synthesis techniques to generate predicates on automaton edges, succinctly describing system behaviour. It employs trace segmentation to tackle complexity for long traces. We learn concise models capturing transaction-level, system-wide behaviour, and experimentally demonstrate the approach using traces from a variety of sources, including the x86 QEMU virtual platform and the Real-Time Linux kernel.

arXiv.org e-Print Archive

Oxford University Research Archive

Abstract Learning Frameworks for Synthesis

Authors: A Blum, A Cheung, A Lal, A Vardhan, Armando Solar-Lezama, C Flanagan, E Kitzelmann, E Kneuss, G Higman, M Barnett, MJ Kearns, P Garg, P Černý, PM Domingos, R Sharma, S Saha, V Kuncak, Z Manna
Publication date: 01/01/2016

We develop abstract learning frameworks (ALFs) for synthesis that embody the principles of CEGIS (counter-example based inductive synthesis) strategies that have become widely applicable in recent years. Our framework defines a general abstract framework of iterative learning, based on a hypothesis space that captures the synthesized objects, a sample space that forms the space on which induction is performed, and a concept space that abstractly defines the semantics of the learning process. We show that a variety of synthesis algorithms in current literature can be embedded in this general framework. While studying these embeddings, we also generalize some of the synthesis problems that these algorithms solve, resulting in new ways of looking at synthesis problems using learning. We also investigate convergence issues for the general framework, and exhibit three recipes for convergence in finite time. The first two recipes generalize current techniques for convergence used by existing synthesis engines. The third is a more involved technique for which we know of no existing instantiation, and we instantiate it to concrete synthesis problems.
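The iterative learning loop described in this abstract can be sketched generically: a learner proposes a hypothesis consistent with the samples seen so far, and a teacher either accepts it or returns a counterexample that becomes a new sample. The code below is a minimal illustration of that loop, not the paper's formal framework; the function `cegis`, its parameters, and the toy threshold problem are all hypothetical.

```python
def cegis(hypotheses, consistent, find_counterexample, max_rounds=1000):
    """Generic counterexample-driven loop over a finite hypothesis list.

    hypotheses          -- candidates, tried in order (assumed finite list)
    consistent(h, s)    -- True if hypothesis h agrees with sample s
    find_counterexample -- returns a sample refuting h, or None if h is correct
    """
    samples = []
    for _ in range(max_rounds):
        # Learner: pick the first hypothesis consistent with all samples.
        h = next(
            (h for h in hypotheses if all(consistent(h, s) for s in samples)),
            None,
        )
        if h is None:
            return None  # hypothesis space exhausted
        # Teacher: verify the hypothesis or produce a counterexample.
        cex = find_counterexample(h)
        if cex is None:
            return h
        samples.append(cex)
    return None


# Toy instance (hypothetical): learn the threshold c in "x >= c"
# matching a hidden specification over the domain 0..10.
DOMAIN = range(0, 11)


def spec(x):
    return x >= 5


def consistent(c, sample):
    x, label = sample
    return (x >= c) == label


def find_counterexample(c):
    for x in DOMAIN:
        if (x >= c) != spec(x):
            return (x, spec(x))
    return None


if __name__ == "__main__":
    print(cegis(list(DOMAIN), consistent, find_counterexample))
```

In this toy setting termination is immediate from finiteness: each counterexample rules out at least the current hypothesis, so the loop needs at most as many rounds as there are hypotheses.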

arXiv.org e-Print Archive