Search CORE

71 research outputs found

Synthesis of Data Completion Scripts using Finite Tree Automata

Author: Dillig Isil
Singh Rishabh
Wang Xinyu
Publication venue
Publication date: 05/07/2017
Field of study

In application domains that store data in a tabular format, a common task is to fill the values of some cells using values stored in other cells. For instance, such data completion tasks arise in the context of missing value imputation in data science and derived data computation in spreadsheets and relational databases. Unfortunately, end-users and data scientists typically struggle with many data completion tasks that require non-trivial programming expertise. This paper presents a synthesis technique for automating data completion tasks using programming-by-example (PBE) and a very lightweight sketching approach. Given a formula sketch (e.g., AVG(

?_1

?_2

)) and a few input-output examples for each hole, our technique synthesizes a program to automate the desired data completion task. Towards this goal, we propose a domain-specific language (DSL) that combines spatial and relational reasoning over tabular data and a novel synthesis algorithm that can generate DSL programs that are consistent with the input-output examples. The key technical novelty of our approach is a new version space learning algorithm that is based on finite tree automata (FTA). The use of FTAs in the learning algorithm leads to a more compact representation that allows more sharing between programs that are consistent with the examples. We have implemented the proposed approach in a tool called DACE and evaluate it on 84 benchmarks taken from online help forums. We also illustrate the advantages of our approach by comparing our technique against two existing synthesizers, namely PROSE and SKETCH

arXiv.org e-Print Archive

Program Synthesis using Abstraction Refinement

Author: Dillig Isil
Singh Rishabh
Wang Xinyu
Publication venue
Publication date: 20/10/2017
Field of study

We present a new approach to example-guided program synthesis based on counterexample-guided abstraction refinement. Our method uses the abstract semantics of the underlying DSL to find a program

P

whose abstract behavior satisfies the examples. However, since program

P

may be spurious with respect to the concrete semantics, our approach iteratively refines the abstraction until we either find a program that satisfies the examples or prove that no such DSL program exists. Because many programs have the same input-output behavior in terms of their abstract semantics, this synthesis methodology significantly reduces the search space compared to existing techniques that use purely concrete semantics. While synthesis using abstraction refinement (SYNGAR) could be implemented in different settings, we propose a refinement-based synthesis algorithm that uses abstract finite tree automata (AFTA). Our technique uses a coarse initial program abstraction to construct an initial AFTA, which is iteratively refined by constructing a proof of incorrectness of any spurious program. In addition to ruling out the spurious program accepted by the previous AFTA, proofs of incorrectness are also useful for ruling out many other spurious programs. We implement these ideas in a framework called \tool. We have used the BLAZE framework to build synthesizers for string and matrix transformations, and we compare BLAZE with existing techniques. Our results for the string domain show that BLAZE compares favorably with FlashFill, a domain-specific synthesizer that is now deployed in Microsoft PowerShell. In the context of matrix manipulations, we compare BLAZE against Prose, a state-of-the-art general-purpose VSA-based synthesizer, and show that BLAZE results in a 90x speed-up over Prose

arXiv.org e-Print Archive

Automated Migration of Hierarchical Data to Relational Tables using Programming-by-Example

Author: Dillig Isil
Wang Xinyu
Yaghmazadeh Navid
Publication venue
Publication date: 10/11/2017
Field of study

While many applications export data in hierarchical formats like XML and JSON, it is often necessary to convert such hierarchical documents to a relational representation. This paper presents a novel programming-by-example approach, and its implementation in a tool called Mitra, for automatically migrating tree-structured documents to relational tables. We have evaluated the proposed technique using two sets of experiments. In the first experiment, we used Mitra to automate 98 data transformation tasks collected from StackOverflow. Our method can generate the desired program for 94% of these benchmarks with an average synthesis time of 3.8 seconds. In the second experiment, we used Mitra to generate programs that can convert real-world XML and JSON datasets to full-fledged relational databases. Our evaluation shows that Mitra can automate the desired transformation for all datasets

arXiv.org e-Print Archive

Relational Program Synthesis

Author: Dillig Isil
Wang Xinyu
Wang Yuepeng
Publication venue
Publication date: 10/09/2018
Field of study

This paper proposes relational program synthesis, a new problem that concerns synthesizing one or more programs that collectively satisfy a relational specification. As a dual of relational program verification, relational program synthesis is an important problem that has many practical applications, such as automated program inversion and automatic generation of comparators. However, this relational synthesis problem introduces new challenges over its non-relational counterpart due to the combinatorially larger search space. As a first step towards solving this problem, this paper presents a synthesis technique that combines the counterexample-guided inductive synthesis framework with a novel inductive synthesis algorithm that is based on relational version space learning. We have implemented the proposed technique in a framework called Relish, which can be instantiated to different application domains by providing a suitable domain-specific language and the relevant relational specification. We have used the Relish framework to build relational synthesizers to automatically generate string encoders/decoders as well as comparators, and we evaluate our tool on several benchmarks taken from prior work and online forums. Our experimental results show that the proposed technique can solve almost all of these benchmarks and that it significantly outperforms EUSolver, a generic synthesis framework that won the general track of the most recent SyGuS competition

arXiv.org e-Print Archive

LambdaNet: Probabilistic Type Inference using Graph Neural Networks

Author: Dillig Isil
Durrett Greg
Goyal Maruth
Wei Jiayi
Publication venue
Publication date: 29/04/2020
Field of study

As gradual typing becomes increasingly popular in languages like Python and TypeScript, there is a growing need to infer type annotations automatically. While type annotations help with tasks like code completion and static error catching, these annotations cannot be fully determined by compilers and are tedious to annotate by hand. This paper proposes a probabilistic type inference scheme for TypeScript based on a graph neural network. Our approach first uses lightweight source code analysis to generate a program abstraction called a type dependency graph, which links type variables with logical constraints as well as name and usage information. Given this program abstraction, we then use a graph neural network to propagate information between related type variables and eventually make type predictions. Our neural architecture can predict both standard types, like number or string, as well as user-defined types that have not been encountered during training. Our experimental results show that our approach outperforms prior work in this space by

14\%

(absolute) on library types, while having the ability to make type predictions that are out of scope for existing techniques.Comment: Accepted as a poster at ICLR 202

arXiv.org e-Print Archive

Synthesizing Database Programs for Schema Refactoring

Author: Dillig Isil
Dong James
Shah Rushi
Wang Yuepeng
Publication venue
Publication date: 10/04/2019
Field of study

Many programs that interact with a database need to undergo schema refactoring several times during their life cycle. Since this process typically requires making significant changes to the program's implementation, schema refactoring is often non-trivial and error-prone. Motivated by this problem, we propose a new technique for automatically synthesizing a new version of a database program given its original version and the source and target schemas. Our method does not require manual user guidance and ensures that the synthesized program is equivalent to the original one. Furthermore, our method is quite efficient and can synthesize new versions of database programs (containing up to 263 functions) that are extracted from real-world web applications with an average synthesis time of 69.4 seconds

arXiv.org e-Print Archive

Failure-Directed Program Trimming (Extended Version)

Author: Christakis Maria
Dillig Isil
Ferles Kostas
Wüstholz Valentin
Publication venue
Publication date: 14/06/2017
Field of study

This paper describes a new program simplification technique called program trimming that aims to improve the scalability and precision of safety checking tools. Given a program

{\mathcal P}

, program trimming generates a new program

{\mathcal P}'

such that

{\mathcal P}

and

{\mathcal P}'

are equi-safe (i.e.,

{\mathcal P}'

has a bug if and only if

{\mathcal P}

has a bug), but

{\mathcal P}'

has fewer execution paths than

{\mathcal P}

. Since many program analyzers are sensitive to the number of execution paths, program trimming has the potential to improve the effectiveness of safety checking tools. In addition to introducing the concept of program trimming, this paper also presents a lightweight static analysis that can be used as a pre-processing step to remove program paths while retaining equi-safety. We have implemented the proposed technique in a tool called Trimmer and evaluate it in the context of two program analysis techniques, namely abstract interpretation and dynamic symbolic execution. Our experiments show that program trimming significantly improves the effectiveness of both techniques

arXiv.org e-Print Archive

Symbolic Reasoning for Automatic Signal Placement (Extended Version)

Author: Dillig Isil
Ferles Kostas
Smaragdakis Yannis
Van Geffen Jacob
Publication venue
Publication date: 06/04/2018
Field of study

Explicit signaling between threads is a perennial cause of bugs in concurrent programs. While there are several run-time techniques to automatically notify threads upon the availability of some shared resource, such techniques are not widely-adopted due to their run-time overhead. This paper proposes a new solution based on static analysis for automatically generating a performant explicit-signal program from its corresponding implicit-signal implementation. The key idea is to generate verification conditions that allow us to minimize the number of required signals and unnecessary context switches, while guaranteeing semantic equivalence between the source and target programs. We have implemented our method in a tool called Expresso and evaluate it on challenging benchmarks from prior papers and open-source software. Expresso-generated code significantly outperforms past automatic signaling mechanisms (avg. 1.56x speedup) and closely matches the performance of hand-optimized explicit-signal code

arXiv.org e-Print Archive

Verifying Equivalence of Database-Driven Applications

Author: Cook William R.
Dillig Isil
Lahiri Shuvendu K.
Wang Yuepeng
Publication venue
Publication date: 20/10/2017
Field of study

This paper addresses the problem of verifying equivalence between a pair of programs that operate over databases with different schemas. This problem is particularly important in the context of web applications, which typically undergo database refactoring either for performance or maintainability reasons. While web applications should have the same externally observable behavior before and after schema migration, there are no existing tools for proving equivalence of such programs. This paper takes a first step towards solving this problem by formalizing the equivalence and refinement checking problems for database-driven applications. We also propose a proof methodology based on the notion of bisimulation invariants over relational algebra with updates and describe a technique for synthesizing such bisimulation invariants. We have implemented the proposed technique in a tool called Mediator for verifying equivalence between database-driven applications written in our intermediate language and evaluate our tool on 21 benchmarks extracted from textbooks and real-world web applications. Our results show that the proposed methodology can successfully verify 20 of these benchmarks

arXiv.org e-Print Archive

Verifying Semantic Conflict-Freedom in Three-Way Program Merges

Author: Dillig Isil
Lahiri Shuvendu
Sousa Marcelo
Publication venue
Publication date: 19/02/2018
Field of study

Even though many programmers rely on 3-way merge tools to integrate changes from different branches, such tools can introduce subtle bugs in the integration process. This paper aims to mitigate this problem by defining a semantic notion of confict-freedom, which ensures that the merged program does not introduce new unwanted behaviors. We also show how to verify this property using a novel, compositional algorithm that combines lightweight dependence analysis for shared program fragments and precise relational reasoning for the modifications. We evaluate our tool called SafeMerge on 52 real-world merge scenarios obtained from Github and compare the results against a textual merge tool. The experimental results demonstrate the benefits of our approach over syntactic confict-freedom and indicate that SafeMerge is both precise and practical

arXiv.org e-Print Archive