Program synthesis using statistical models and logical reasoning

Abstract

Complex APIs in new frameworks (Spark, R, TensorFlow, etc.) have imposed steep learning curves on everyone, especially on people with limited programming backgrounds. For instance, because data in different application domains is messy, data scientists spend close to 80% of their time on data wrangling tasks, which are considered the "janitor work" of data science. Similarly, software engineers spend hours or even days learning how to use APIs from official documentation or from examples on online forums. Program synthesis has the potential to automate complex tasks that involve API usage by providing powerful search algorithms that look for executable programs satisfying a given specification (input-output examples, partial programs, formal specifications, etc.). However, the biggest barrier to a practical synthesizer is the size of the search space, which grows very rapidly with the complexity of the programs and the size of the targeted APIs. To address this issue, this dissertation develops algorithms that push the frontiers of program synthesis. First, we propose a type-directed graph reachability algorithm in SyPet, a synthesizer for assembling programs from complex APIs. Second, we show how to combine enumerative search with lightweight constraint-based deduction in Morpheus, a synthesizer that automates real-world data wrangling tasks from input-output examples. Finally, we generalize these approaches into a novel conflict-driven synthesis algorithm that can learn from past mistakes.
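
The idea of combining enumerative search with lightweight deduction can be illustrated with a toy sketch. It is not the implementation of SyPet or Morpheus: the DSL of list transformers (`sort`, `reverse`, `tail`, `dedup`, `double`), their length-interval "specs", and the helpers `deduce_feasible`, `run`, and `synthesize` are all hypothetical. Before executing a candidate program, the sketch checks whether the components' specs could possibly map each example input's length to the expected output's length, and discards candidates that cannot.

```python
from itertools import product
from typing import Callable, Optional

# Hypothetical toy DSL: each component is a unary list transformer paired with a
# coarse spec that maps an input-length interval to an output-length interval.
Interval = tuple[int, int]
Component = tuple[Callable[[list[int]], list[int]], Callable[[Interval], Interval]]

COMPONENTS: dict[str, Component] = {
    "sort": (sorted, lambda iv: iv),                       # length preserved
    "reverse": (lambda xs: xs[::-1], lambda iv: iv),       # length preserved
    "tail": (lambda xs: xs[1:],                            # drops one element (if any)
             lambda iv: (max(iv[0] - 1, 0), max(iv[1] - 1, 0))),
    "dedup": (lambda xs: list(dict.fromkeys(xs)),          # keeps 1..n elements (0 if empty)
              lambda iv: (min(iv[0], 1), iv[1])),
    "double": (lambda xs: xs + xs,                         # length doubled
               lambda iv: (2 * iv[0], 2 * iv[1])),
}


def deduce_feasible(program: tuple[str, ...], in_len: int, out_len: int) -> bool:
    """Deduction step: propagate length intervals through the specs without running code."""
    lo, hi = in_len, in_len
    for name in program:
        lo, hi = COMPONENTS[name][1]((lo, hi))
    return lo <= out_len <= hi


def run(program, xs: list[int]) -> list[int]:
    """Concrete evaluation of a candidate program on one input."""
    for name in program:
        xs = COMPONENTS[name][0](xs)
    return xs


def synthesize(examples, max_depth: int = 3) -> tuple[Optional[list[str]], int]:
    """Enumerate programs by increasing size; prune with deduction before executing."""
    pruned = 0
    for depth in range(1, max_depth + 1):
        for program in product(COMPONENTS, repeat=depth):
            # Refuted by the specs alone: skip without evaluating the program.
            if not all(deduce_feasible(program, len(i), len(o)) for i, o in examples):
                pruned += 1
                continue
            if all(run(program, list(i)) == o for i, o in examples):
                return list(program), pruned
    return None, pruned
```

With the examples `[3, 1, 2] -> [3, 2, 1]` and `[2, 9] -> [9, 2]`, the sketch returns `['sort', 'reverse']`, having refuted `tail` and `double` by deduction alone. In a realistic synthesizer the component specifications are much richer, so a far larger share of the search space can be discarded before any program is executed.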
