51 research outputs found
Overfitting in Synthesis: Theory and Practice (Extended Version)
In syntax-guided synthesis (SyGuS), a synthesizer's goal is to automatically
generate a program belonging to a grammar of possible implementations that
meets a logical specification. We investigate a common limitation across
state-of-the-art SyGuS tools that perform counterexample-guided inductive
synthesis (CEGIS). We empirically observe that as the expressiveness of the
provided grammar increases, the performance of these tools degrades
significantly.
We claim that this degradation is not only due to a larger search space, but
also due to overfitting. We formally define this phenomenon and prove
no-free-lunch theorems for SyGuS, which reveal a fundamental tradeoff between
synthesizer performance and grammar expressiveness.
A standard approach to mitigate overfitting in machine learning is to run
multiple learners with varying expressiveness in parallel. We demonstrate that
this insight can immediately benefit existing SyGuS tools. We also propose a
novel single-threaded technique called hybrid enumeration that interleaves
different grammars and outperforms the winner of the 2018 SyGuS competition
(Inv track), solving more problems and achieving a mean speedup.Comment: 24 pages (5 pages of appendices), 7 figures, includes proofs of
theorem
On Counterexample Guided Quantifier Instantiation for Synthesis in CVC4
We introduce the first program synthesis engine implemented inside an SMT
solver. We present an approach that extracts solution functions from
unsatisfiability proofs of the negated form of synthesis conjectures. We also
discuss novel counterexample-guided techniques for quantifier instantiation
that we use to make finding such proofs practically feasible. A particularly
important class of specifications are single-invocation properties, for which
we present a dedicated algorithm. To support syntax restrictions on generated
solutions, our approach can transform a solution found without restrictions
into the desired syntactic form. As an alternative, we show how to use
evaluation function axioms to embed syntactic restrictions into constraints
over algebraic datatypes, and then use an algebraic datatype decision procedure
to drive synthesis. Our experimental evaluation on syntax-guided synthesis
benchmarks shows that our implementation in the CVC4 SMT solver is competitive
with state-of-the-art tools for synthesis
Abstract Learning Frameworks for Synthesis
We develop abstract learning frameworks (ALFs) for synthesis that embody the
principles of CEGIS (counter-example based inductive synthesis) strategies that
have become widely applicable in recent years. Our framework defines a general
abstract framework of iterative learning, based on a hypothesis space that
captures the synthesized objects, a sample space that forms the space on which
induction is performed, and a concept space that abstractly defines the
semantics of the learning process. We show that a variety of synthesis
algorithms in current literature can be embedded in this general framework.
While studying these embeddings, we also generalize some of the synthesis
problems these instances are of, resulting in new ways of looking at synthesis
problems using learning. We also investigate convergence issues for the general
framework, and exhibit three recipes for convergence in finite time. The first
two recipes generalize current techniques for convergence used by existing
synthesis engines. The third technique is a more involved technique of which we
know of no existing instantiation, and we instantiate it to concrete synthesis
problems
Grammar Filtering For Syntax-Guided Synthesis
Programming-by-example (PBE) is a synthesis paradigm that allows users to
generate functions by simply providing input-output examples. While a promising
interaction paradigm, synthesis is still too slow for realtime interaction and
more widespread adoption. Existing approaches to PBE synthesis have used
automated reasoning tools, such as SMT solvers, as well as works applying
machine learning techniques. At its core, the automated reasoning approach
relies on highly domain specific knowledge of programming languages. On the
other hand, the machine learning approaches utilize the fact that when working
with program code, it is possible to generate arbitrarily large training
datasets. In this work, we propose a system for using machine learning in
tandem with automated reasoning techniques to solve Syntax Guided Synthesis
(SyGuS) style PBE problems. By preprocessing SyGuS PBE problems with a neural
network, we can use a data driven approach to reduce the size of the search
space, then allow automated reasoning-based solvers to more quickly find a
solution analytically. Our system is able to run atop existing SyGuS PBE
synthesis tools, decreasing the runtime of the winner of the 2019 SyGuS
Competition for the PBE Strings track by 47.65% to outperform all of the
competing tools
Program Synthesis for Program Analysis
In this article, we propose a unified framework for designing static analysers based on program synthesis. For this purpose, we identify a fragment of second-order logic with restricted quantification that is expressive enough to model numerous static analysis problems (e.g., safety proving, bug finding, termination and non-termination proving, refactoring). As our focus is on programs that use bit-vectors, we build a decision procedure for this fragment over finite domains in the form of a program synthesiser. We provide instantiations of our framework for solving a diverse range of program verification tasks such as termination, non-termination, safety and bug finding, superoptimisation, and refactoring. Our experimental results show that our program synthesiser compares positively with specialised tools in each area as well as with general-purpose synthesisers
Automated Approaches for Program Verification and Repair
Formal methods techniques, such as verification, analysis, and synthesis,allow programmers to prove properties of their programs, or automatically derive programs from specifications. Making such techniques usable requires care: they must provide useful debugging information, be scalable, and enable automation. This dissertation presents automated analysis and synthesis techniques to ease the debugging of modular verification systems and allow easy access to constraint solvers from functional code. Further, it introduces machine learning based techniques to improve the scalability of off-the-shelf syntax-guided synthesis solvers and techniques to reduce the burden of network administrators writing and analyzing firewalls. We describe the design and implementationof a symbolic execution engine, G2, for non-strict functional languages such as Haskell. We extend G2 to both debug and automate the process of modular verification, and give Haskell programmers easy access to constraints solvers via a library named G2Q. Modular verifiers, such as LiquidHaskell, Dafny, and ESC/Java,allow programmers to write and prove specifications of their code. When a modular verifier fails to verify a program, it is not necessarily because of an actual bug in the program. This is because when verifying a function f, modular verifiers consider only the specification of a called function g, not the actual definition of g. Thus, a modular verifier may fail to prove a true specification of f if the specification of g is too weak. We present a technique, counterfactual symbolic execution, to aid in the debugging of modular verification failures. The approach uses symbolic execution to find concrete counterexamples, in the case of an actual inconsistency between a program and a specification; and abstract counterexamples, in the case that a function specification is too weak. Further, a counterexample-guided inductive synthesis (CEGIS) loop based technique is introduced to fully automate the process of modular verification, by using found counterexamples to automatically infer needed function specifications. The counterfactual symbolic execution and automated specification inference techniques are implemented in G2, and evaluated on existing LiquidHaskell errors and programs. We also leveraged G2 to build a library, G2Q, which allows writing constraint solving problemsdirectly as Haskell code. Users of G2Q can embed specially marked Haskell constraints (Boolean expressions) into their normal Haskell code, while marking some of the variables in the constraint as symbolic. Then, at runtime, G2Q automatically derives values for the symbolic variables that satisfy the constraint, and returns those values to the outside code. Unlike other constraint solving solutions, such as directly calling an SMT solver, G2Q uses symbolic execution to unroll recursive function definitions, and guarantees that the use of G2Q constraints will preserve type correctness. We further consider the problem of synthesizing functions viaa class of tools known as syntax-guided synthesis (SyGuS) solvers. We introduce a machine learning based technique to preprocess SyGuS problems, and reduce the space that the solver must search for a solution in. We demonstrate that the technique speeds up an existing SyGuS solver, CVC4, on a set of SyGuS solver benchmarks. Finally, we describe techniques to ease analysis and repair of firewalls.Firewalls are widely deployed to manage network security. However, firewall systems provide only a primitive interface, in which the specification is given as an ordered list of rules. This makes it hard to manually track and maintain the behavior of a firewall. We introduce a formal semantics for iptables firewall rules via a translation to first-order logic with uninterpreted functions and linear integer arithmetic, which allows encoding of firewalls into a decidable logic. We then describe techniques to automate the analysis and repair of firewalls using SMT solvers, based on user provided specifications of the desired behavior. We evaluate this approach with real world case studies collected from StackOverflow users
- …