71 research outputs found
Synthesis of Data Completion Scripts using Finite Tree Automata
In application domains that store data in a tabular format, a common task is
to fill the values of some cells using values stored in other cells. For
instance, such data completion tasks arise in the context of missing value
imputation in data science and derived data computation in spreadsheets and
relational databases. Unfortunately, end-users and data scientists typically
struggle with many data completion tasks that require non-trivial programming
expertise. This paper presents a synthesis technique for automating data
completion tasks using programming-by-example (PBE) and a very lightweight
sketching approach. Given a formula sketch (e.g., AVG(?1, ?2)) and a few
input-output examples for each hole, our technique synthesizes a program to
automate the desired data completion task. Towards this goal, we propose a
domain-specific language (DSL) that combines spatial and relational reasoning
over tabular data and a novel synthesis algorithm that can generate DSL
programs that are consistent with the input-output examples. The key technical
novelty of our approach is a new version space learning algorithm that is based
on finite tree automata (FTA). The use of FTAs in the learning algorithm leads
to a more compact representation that allows more sharing between programs that
are consistent with the examples. We have implemented the proposed approach in
a tool called DACE and evaluate it on 84 benchmarks taken from online help
forums. We also illustrate the advantages of our approach by comparing our
technique against two existing synthesizers, namely PROSE and SKETCH.
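As a rough illustration of the kind of task described above, the following Python sketch fills each missing cell in a column with the average of the nearest non-missing values above and below it, in the spirit of a sketch like AVG(?1, ?2) whose two holes locate cells. The function names and the sample column are hypothetical. This is a hand-written target program, not DACE's DSL or its FTA-based learning algorithm.

```python
from typing import List, Optional

Column = List[Optional[float]]


def nearest_filled(col: Column, start: int, step: int) -> Optional[float]:
    """Walk from `start` in direction `step`; return the first non-missing value."""
    i = start + step
    while 0 <= i < len(col):
        if col[i] is not None:
            return col[i]
        i += step
    return None


def complete(col: Column) -> Column:
    """Fill each missing cell with the average of its nearest filled neighbours."""
    out = list(col)
    for i, value in enumerate(col):
        if value is None:
            above = nearest_filled(col, i, -1)  # plays the role of hole ?1
            below = nearest_filled(col, i, +1)  # plays the role of hole ?2
            found = [v for v in (above, below) if v is not None]
            if found:
                out[i] = sum(found) / len(found)
    return out


print(complete([1.0, None, 3.0, None, None, 9.0]))
# → [1.0, 2.0, 3.0, 6.0, 6.0, 9.0]
```

In a programming-by-example setting, the user would supply only a few filled-in cells as examples, and the synthesizer would have to discover the two cell-locating programs.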
Program Synthesis using Abstraction Refinement
We present a new approach to example-guided program synthesis based on
counterexample-guided abstraction refinement. Our method uses the abstract
semantics of the underlying DSL to find a program whose abstract behavior
satisfies the examples. However, since this program may be spurious with respect
to the concrete semantics, our approach iteratively refines the abstraction
until we either find a program that satisfies the examples or prove that no
such DSL program exists. Because many programs have the same input-output
behavior in terms of their abstract semantics, this synthesis methodology
significantly reduces the search space compared to existing techniques that use
purely concrete semantics. While synthesis using abstraction refinement
(SYNGAR) could be implemented in different settings, we propose a
refinement-based synthesis algorithm that uses abstract finite tree automata
(AFTA). Our technique uses a coarse initial program abstraction to construct an
initial AFTA, which is iteratively refined by constructing a proof of
incorrectness of any spurious program. In addition to ruling out the spurious
program accepted by the previous AFTA, proofs of incorrectness are also useful
for ruling out many other spurious programs. We implement these ideas in a
framework called BLAZE. We have used the BLAZE framework to build synthesizers
for string and matrix transformations, and we compare BLAZE with existing
techniques. Our results for the string domain show that BLAZE compares
favorably with FlashFill, a domain-specific synthesizer that is now deployed in
Microsoft PowerShell. In the context of matrix manipulations, we compare BLAZE
against Prose, a state-of-the-art general-purpose VSA-based synthesizer, and
show that BLAZE results in a 90x speed-up over Prose.
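The refinement loop described above can be sketched on a toy integer DSL. In this minimal Python sketch, the abstraction is "value modulo m" (an assumption chosen because it is sound for addition and multiplication). The search starts with the coarsest abstraction (m = 1) and refines m whenever the abstractly consistent candidate turns out to be spurious. The DSL, the abstract domain, and all names are hypothetical. The sketch enumerates programs instead of building abstract finite tree automata, so it illustrates only the general idea, not BLAZE itself.

```python
from itertools import count, product
from math import gcd

# Toy DSL: a program is a sequence of unary integer operations.
OPS = {
    "inc": lambda v: v + 1,
    "dbl": lambda v: v * 2,
    "sqr": lambda v: v * v,
}


def run(prog, x, m=None):
    """Run `prog` on x concretely, or abstractly modulo m when m is given."""
    v = x if m is None else x % m
    for name in prog:
        v = OPS[name](v)
        if m is not None:
            v %= m
    return v


def programs(max_len):
    """Enumerate all programs of up to `max_len` operations, shortest first."""
    for n in range(max_len + 1):
        yield from product(OPS, repeat=n)


def synthesize(examples, max_len=4):
    m = 1  # coarsest abstraction: every value abstracts to 0
    while True:
        candidate = next(
            (p for p in programs(max_len)
             if all(run(p, x, m) == y % m for x, y in examples)),
            None,
        )
        if candidate is None:
            return None  # no abstractly consistent program, so no concrete one
        bad = [(x, y) for x, y in examples if run(candidate, x) != y]
        if not bad:
            return candidate  # consistent under the concrete semantics
        # Spurious candidate: refine with a modulus k that separates the
        # actual output from the expected one, keeping earlier refinements.
        x, y = bad[0]
        diff = run(candidate, x) - y
        k = next(k for k in count(2) if diff % k != 0)
        m = m * k // gcd(m, k)


print(synthesize([(1, 4), (2, 9)]))  # → ('inc', 'sqr')
```

Each refinement rules out the spurious candidate and, typically, many other programs that share its abstract behavior, which is the effect the abstract attributes to proofs of incorrectness.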
Automated Migration of Hierarchical Data to Relational Tables using Programming-by-Example
While many applications export data in hierarchical formats like XML and
JSON, it is often necessary to convert such hierarchical documents to a
relational representation. This paper presents a novel programming-by-example
approach, and its implementation in a tool called Mitra, for automatically
migrating tree-structured documents to relational tables. We have evaluated the
proposed technique using two sets of experiments. In the first experiment, we
used Mitra to automate 98 data transformation tasks collected from
StackOverflow. Our method can generate the desired program for 94% of these
benchmarks with an average synthesis time of 3.8 seconds. In the second
experiment, we used Mitra to generate programs that can convert real-world XML
and JSON datasets to full-fledged relational databases. Our evaluation shows
that Mitra can automate the desired transformation for all datasets.
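To make the task concrete, here is a small hand-written Python example of the kind of hierarchical-to-relational transformation such a tool would need to produce. The JSON document, the field names, and the `to_rows` function are all hypothetical. The code shows a possible target program, not Mitra's synthesis procedure.

```python
import json

# Hypothetical hierarchical input: authors, each with a nested list of books.
doc = json.loads("""
{"authors": [
  {"name": "A. Author", "books": [{"title": "First Book", "year": 2001}]},
  {"name": "B. Writer", "books": [{"title": "Second Book", "year": 2005},
                                  {"title": "Third Book", "year": 2010}]}
]}
""")


def to_rows(document):
    """Flatten the tree into (author, title, year) rows, one per nested book."""
    rows = []
    for author in document["authors"]:
        for book in author["books"]:
            rows.append((author["name"], book["title"], book["year"]))
    return rows


for row in to_rows(doc):
    print(row)
```

In a programming-by-example workflow, the user would provide a few of these output rows for a small input document instead of writing the nested loops by hand.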
Relational Program Synthesis
This paper proposes relational program synthesis, a new problem that concerns
synthesizing one or more programs that collectively satisfy a relational
specification. As a dual of relational program verification, relational program
synthesis is an important problem that has many practical applications, such as
automated program inversion and automatic generation of comparators. However,
this relational synthesis problem introduces new challenges over its
non-relational counterpart due to the combinatorially larger search space. As a
first step towards solving this problem, this paper presents a synthesis
technique that combines the counterexample-guided inductive synthesis framework
with a novel inductive synthesis algorithm that is based on relational version
space learning. We have implemented the proposed technique in a framework
called Relish, which can be instantiated to different application domains by
providing a suitable domain-specific language and the relevant relational
specification. We have used the Relish framework to build relational
synthesizers to automatically generate string encoders/decoders as well as
comparators, and we evaluate our tool on several benchmarks taken from prior
work and online forums. Our experimental results show that the proposed
technique can solve almost all of these benchmarks and that it significantly
outperforms EUSolver, a generic synthesis framework that won the general track
of the most recent SyGuS competition.
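A relational specification constrains several programs jointly instead of constraining each one in isolation. The Python sketch below checks one such specification, decode(encode(x)) == x, for a hex encoder/decoder pair and returns a counterexample if the pair violates it. The checker and all function names are hypothetical. It corresponds only to the checking step of a counterexample-guided loop, under the assumption that testing on a finite input set stands in for verification. It is not Relish's algorithm.

```python
def encode(s: str) -> str:
    """Encode a string as the hex representation of its UTF-8 bytes."""
    return s.encode("utf-8").hex()


def decode(t: str) -> str:
    """Inverse of `encode`: parse hex back into a UTF-8 string."""
    return bytes.fromhex(t).decode("utf-8")


def find_counterexample(enc, dec, inputs):
    """Return the first x with dec(enc(x)) != x, or None if the spec holds on `inputs`."""
    for x in inputs:
        if dec(enc(x)) != x:
            return x
    return None


samples = ["", "abc", "héllo", "line\nbreak"]
print(find_counterexample(encode, decode, samples))          # → None
print(find_counterexample(encode, lambda t: t, samples))     # → abc
```

A relational synthesizer would have to produce both `encode` and `decode` so that the pair passes this check, which is why the search space is combinatorially larger than in the single-program case.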
LambdaNet: Probabilistic Type Inference using Graph Neural Networks
As gradual typing becomes increasingly popular in languages like Python and
TypeScript, there is a growing need to infer type annotations automatically.
While type annotations help with tasks like code completion and static error
catching, these annotations cannot be fully determined by compilers and are
tedious to annotate by hand. This paper proposes a probabilistic type inference
scheme for TypeScript based on a graph neural network. Our approach first uses
lightweight source code analysis to generate a program abstraction called a
type dependency graph, which links type variables with logical constraints as
well as name and usage information. Given this program abstraction, we then use
a graph neural network to propagate information between related type variables
and eventually make type predictions. Our neural architecture can predict both
standard types, like number or string, as well as user-defined types that have
not been encountered during training. Our experimental results show that our
approach outperforms prior work in this space by 14% (absolute) on library
types, while having the ability to make type predictions that are out of scope
for existing techniques.
Comment: Accepted as a poster at ICLR 2020
Synthesizing Database Programs for Schema Refactoring
Many programs that interact with a database need to undergo schema
refactoring several times during their life cycle. Since this process typically
requires making significant changes to the program's implementation, schema
refactoring is often non-trivial and error-prone. Motivated by this problem, we
propose a new technique for automatically synthesizing a new version of a
database program given its original version and the source and target schemas.
Our method does not require manual user guidance and ensures that the
synthesized program is equivalent to the original one. Furthermore, our method
is quite efficient and can synthesize new versions of database programs
(containing up to 263 functions) that are extracted from real-world web
applications with an average synthesis time of 69.4 seconds.
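The following Python sketch illustrates what a schema refactoring looks like at the program level, using the standard `sqlite3` module. The two schemas, the table and column names, and both program versions are hypothetical and written by hand. The final check compares the two versions on only one short operation sequence, which is far weaker than the equivalence guarantee described above.

```python
import sqlite3


def setup_v1():
    """Source schema: a single table holding name and email."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
    return db


def add_user_v1(db, uid, name, email):
    db.execute("INSERT INTO users VALUES (?, ?, ?)", (uid, name, email))


def get_email_v1(db, uid):
    row = db.execute("SELECT email FROM users WHERE id = ?", (uid,)).fetchone()
    return row[0] if row else None


def setup_v2():
    """Target schema: email moved into a separate contacts table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    db.execute("CREATE TABLE contacts (user_id INTEGER, email TEXT)")
    return db


def add_user_v2(db, uid, name, email):
    db.execute("INSERT INTO users VALUES (?, ?)", (uid, name))
    db.execute("INSERT INTO contacts VALUES (?, ?)", (uid, email))


def get_email_v2(db, uid):
    row = db.execute("SELECT email FROM contacts WHERE user_id = ?", (uid,)).fetchone()
    return row[0] if row else None


# Run the same operation sequence against both versions and compare results.
db1, db2 = setup_v1(), setup_v2()
add_user_v1(db1, 1, "alice", "alice@example.com")
add_user_v2(db2, 1, "alice", "alice@example.com")
print(get_email_v1(db1, 1) == get_email_v2(db2, 1))  # → True
print(get_email_v1(db1, 2) == get_email_v2(db2, 2))  # → True
```

The synthesis problem is to derive the `_v2` functions automatically from the `_v1` functions and the two schemas.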
Failure-Directed Program Trimming (Extended Version)
This paper describes a new program simplification technique called program
trimming that aims to improve the scalability and precision of safety checking
tools. Given a program P, program trimming generates a new program P' such
that P and P' are equi-safe (i.e., P has a bug if and only if P' has a bug),
but P' has fewer execution paths than P. Since many
program analyzers are sensitive to the number of execution paths, program
trimming has the potential to improve the effectiveness of safety checking
tools.
In addition to introducing the concept of program trimming, this paper also
presents a lightweight static analysis that can be used as a pre-processing
step to remove program paths while retaining equi-safety. We have implemented
the proposed technique in a tool called Trimmer and evaluate it in the context
of two program analysis techniques, namely abstract interpretation and dynamic
symbolic execution. Our experiments show that program trimming significantly
improves the effectiveness of both techniques.
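A small Python sketch can illustrate equi-safety. Here the trimmed program prunes a branch that can never reach the assertion by placing an `assume` at its entry. Modelling pruning with an `assume` helper and a `Pruned` exception is an assumption made for this illustration, and both programs are hypothetical. The code does not reproduce Trimmer's static analysis, which would have to discover the pruning condition automatically.

```python
class Pruned(Exception):
    """Raised when execution enters a path that trimming has removed."""


def assume(cond):
    if not cond:
        raise Pruned()


def original(x, y):
    if x > 0:
        # Many execution paths, but none of them can violate the assertion.
        total = 0
        for i in range(x):
            total += i if i % 2 else -i
        return total
    assert y != 0, "division by zero"
    return 10 // y


def trimmed(x, y):
    assume(x <= 0)  # paths with x > 0 are safe, so they are cut away
    assert y != 0, "division by zero"
    return 10 // y


def has_bug(prog, inputs):
    """Return True if any input drives `prog` into an assertion failure."""
    for args in inputs:
        try:
            prog(*args)
        except AssertionError:
            return True
        except Pruned:
            continue
    return False


grid = [(x, y) for x in range(-2, 3) for y in range(-2, 3)]
print(has_bug(original, grid), has_bug(trimmed, grid))  # → True True
```

Both versions report the same bug, but an analyzer exploring `trimmed` never has to unroll the loop in the safe branch.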
Symbolic Reasoning for Automatic Signal Placement (Extended Version)
Explicit signaling between threads is a perennial cause of bugs in concurrent
programs. While there are several run-time techniques to automatically notify
threads upon the availability of some shared resource, such techniques are not
widely adopted due to their run-time overhead. This paper proposes a new
solution based on static analysis for automatically generating a performant
explicit-signal program from its corresponding implicit-signal implementation.
The key idea is to generate verification conditions that allow us to minimize
the number of required signals and unnecessary context switches, while
guaranteeing semantic equivalence between the source and target programs. We
have implemented our method in a tool called Expresso and evaluate it on
challenging benchmarks from prior papers and open-source software.
Expresso-generated code significantly outperforms past automatic signaling
mechanisms (avg. 1.56x speedup) and closely matches the performance of
hand-optimized explicit-signal code.
Verifying Equivalence of Database-Driven Applications
This paper addresses the problem of verifying equivalence between a pair of
programs that operate over databases with different schemas. This problem is
particularly important in the context of web applications, which typically
undergo database refactoring either for performance or maintainability reasons.
While web applications should have the same externally observable behavior
before and after schema migration, there are no existing tools for proving
equivalence of such programs. This paper takes a first step towards solving
this problem by formalizing the equivalence and refinement checking problems
for database-driven applications. We also propose a proof methodology based on
the notion of bisimulation invariants over relational algebra with updates and
describe a technique for synthesizing such bisimulation invariants. We have
implemented the proposed technique in a tool called Mediator for verifying
equivalence between database-driven applications written in our intermediate
language and evaluate our tool on 21 benchmarks extracted from textbooks and
real-world web applications. Our results show that the proposed methodology can
successfully verify 20 of these benchmarks.
Verifying Semantic Conflict-Freedom in Three-Way Program Merges
Even though many programmers rely on 3-way merge tools to integrate changes
from different branches, such tools can introduce subtle bugs in the
integration process. This paper aims to mitigate this problem by defining a
semantic notion of conflict-freedom, which ensures that the merged program does
not introduce new unwanted behaviors. We also show how to verify this property
using a novel, compositional algorithm that combines lightweight dependence
analysis for shared program fragments and precise relational reasoning for the
modifications. We evaluate our tool called SafeMerge on 52 real-world merge
scenarios obtained from GitHub and compare the results against a textual merge
tool. The experimental results demonstrate the benefits of our approach over
syntactic conflict-freedom and indicate that SafeMerge is both precise and
practical.
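One plausible reading of semantic conflict-freedom can be checked by testing. For every output, if a variant changes the base program's value, the merge must agree with that variant, and if neither variant changes it, the merge must agree with the base. The Python sketch below applies this reading to toy programs that return a tuple of outputs. The exact formulation, the function names, and the example programs are assumptions made for illustration. Unlike the verification approach described above, testing on sample inputs proves nothing about unseen inputs.

```python
def conflict_free_on(base, a, b, merged, inputs):
    """Check the assumed conflict-freedom condition on each input and output."""
    for x in inputs:
        for o, oa, ob, om in zip(base(x), a(x), b(x), merged(x)):
            if oa != o and om != oa:
                return False  # the merge lost a change made by variant A
            if ob != o and om != ob:
                return False  # the merge lost a change made by variant B
            if oa == o and ob == o and om != o:
                return False  # the merge introduced behavior nobody asked for
    return True


def base(x):
    return (x + 1, x * 2)


def variant_a(x):
    return (x + 2, x * 2)   # changes the first output only


def variant_b(x):
    return (x + 1, x * 3)   # changes the second output only


def good_merge(x):
    return (x + 2, x * 3)   # keeps both changes


def bad_merge(x):
    return (x + 2, x * 2)   # silently drops variant B's change


xs = range(-3, 4)
print(conflict_free_on(base, variant_a, variant_b, good_merge, xs))  # → True
print(conflict_free_on(base, variant_a, variant_b, bad_merge, xs))   # → False
```

A textual merge tool could accept `bad_merge` without complaint if the edits touched different lines, which is the gap between syntactic and semantic conflict-freedom.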