
    Theory and Techniques for Synthesizing a Family of Graph Algorithms

    Although Breadth-First Search (BFS) has several advantages over Depth-First Search (DFS), its prohibitive space requirements have meant that algorithm designers often pass it over in favor of DFS. To address this shortcoming, we introduce a theory of Efficient BFS (EBFS), along with a simple recursive program schema for carrying out the search. The theory is based on dominance relations, a long-standing technique from the field of search algorithms. We show how the theory can be used to systematically derive solutions to two graph algorithms, namely the Single Source Shortest Path problem and the Minimum Spanning Tree problem. The solutions are found by making small systematic changes to the derivation, revealing the connections between the two problems, which are often obscured in textbook presentations.
    Comment: In Proceedings SYNT 2012, arXiv:1207.055
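    The derivation in the paper is carried out formally in a synthesis framework, but the shared structure it exposes can be pictured with a small sketch. The Python below is not the paper's EBFS schema; it is a minimal priority-queue analogue in which the dominance test is the "already settled" skip, and the only change between the two derived algorithms is the hypothetical extend_cost parameter.

```python
import heapq

def grow_tree(graph, source, extend_cost):
    """Shared schema: grow a tree from `source`, always settling the frontier
    entry with the smallest priority. An entry whose node is already settled
    is dominated and skipped."""
    settled = {}                     # node -> final cost
    parent = {source: None}          # node -> predecessor in the grown tree
    frontier = [(0, source, None)]   # (priority, node, via)
    while frontier:
        cost, node, via = heapq.heappop(frontier)
        if node in settled:          # dominance check: a better entry won earlier
            continue
        settled[node] = cost
        parent[node] = via
        for nbr, weight in graph.get(node, []):
            if nbr not in settled:
                heapq.heappush(frontier, (extend_cost(cost, weight), nbr, node))
    return settled, parent

# Single Source Shortest Path (Dijkstra-style): priorities accumulate path length.
sssp = lambda g, s: grow_tree(g, s, lambda cost, w: cost + w)
# Minimum Spanning Tree (Prim-style): only the connecting edge's weight matters.
mst = lambda g, s: grow_tree(g, s, lambda cost, w: w)

g = {"a": [("b", 2), ("c", 5)], "b": [("c", 1)], "c": []}
print(sssp(g, "a")[0])   # {'a': 0, 'b': 2, 'c': 3}
print(mst(g, "a")[0])    # {'a': 0, 'b': 2, 'c': 1}
```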

    Sketched Answer Set Programming

    Answer Set Programming (ASP) is a powerful modeling formalism for combinatorial problems. However, writing ASP models is not trivial. We propose a novel method, called Sketched Answer Set Programming (SkASP), aimed at supporting the user in resolving this issue. The user writes an ASP program while marking uncertain parts open with question marks. In addition, the user provides a number of positive and negative examples of the desired program behaviour. The sketched model is rewritten into another ASP program, which is solved by traditional methods. As a result, the user obtains a functional and reusable ASP program modelling her problem. We evaluate our approach on 21 well-known puzzles and combinatorial problems inspired by Karp's 21 NP-complete problems and demonstrate a use case for a database application based on ASP.
    Comment: 15 pages, 11 figures; to appear in ICTAI 201
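    As a rough illustration of the sketch-and-test idea (not SkASP itself, which operates on ASP programs and solves the rewritten program with an ASP solver), the Python sketch below fills a single hole, choosing a field and a comparison operator, by testing candidate completions against positive and negative examples. All names and candidate sets are invented.

```python
from itertools import product

# Toy stand-in for a sketched rule: which field and which comparison the
# constraint should use is left open ("?") and resolved against examples.
CANDIDATE_FIELDS = ["size", "weight"]
CANDIDATE_RELS = {"<": lambda a, b: a < b, ">": lambda a, b: a > b}

def fill_sketch(positives, negatives):
    """Enumerate hole assignments; return the first one consistent with all examples."""
    for field, (rel_name, rel) in product(CANDIDATE_FIELDS, CANDIDATE_RELS.items()):
        rule = lambda item, limit, f=field, r=rel: r(item[f], limit)
        if all(rule(item, limit) for item, limit in positives) and \
           not any(rule(item, limit) for item, limit in negatives):
            return field, rel_name
    return None

# Positive examples must satisfy the completed rule, negative ones must not.
print(fill_sketch(
    positives=[({"size": 2, "weight": 9}, 5)],
    negatives=[({"size": 7, "weight": 1}, 5)],
))  # -> ('size', '<')
```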

    AutoBayes: A System for Generating Data Analysis Programs from Statistical Models

    Data analysis is an important scientific task that is required whenever information needs to be extracted from raw data. Statistical approaches to data analysis, which use methods from probability theory and numerical analysis, are well-founded but difficult to implement: the development of a statistical data analysis program for any given application is time-consuming and requires substantial knowledge and experience in several areas. In this paper, we describe AutoBayes, a program synthesis system for the generation of data analysis programs from statistical models. A statistical model specifies the properties for each problem variable (i.e., observation or parameter) and its dependencies in the form of a probability distribution. It is a fully declarative problem description, similar in spirit to a set of differential equations. From such a model, AutoBayes generates optimized and fully commented C/C++ code which can be linked dynamically into the Matlab and Octave environments. Code is produced by a schema-guided deductive synthesis process. A schema consists of a code template and applicability constraints which are checked against the model during synthesis using theorem-proving technology. AutoBayes augments schema-guided synthesis with symbolic-algebraic computation and can thus derive closed-form solutions for many problems. It is well suited for tasks like estimating best-fitting model parameters for the given data. Here, we describe AutoBayes's system architecture, in particular the schema-guided synthesis kernel. Its capabilities are illustrated by a number of advanced textbook examples and benchmarks.
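    The schema-guided step can be pictured with a small, hypothetical analogue: a schema pairs an applicability check on the declarative model with a code template that is emitted only when the check succeeds. The sketch below is Python rather than AutoBayes's actual schemas and generated C/C++, and the model encoding and the name gaussian_mle_schema are invented for illustration.

```python
import math

# A minimal analogue of a synthesis schema: an applicability check on the
# declarative model plus a code template returned on success.
def gaussian_mle_schema(model):
    """If the model declares i.i.d. Gaussian observations with unknown mean
    and variance, return the closed-form estimator instead of an iterative one."""
    if model.get("likelihood") != "gaussian" or not model.get("iid", False):
        return None                                   # schema not applicable
    def estimator(data):
        n = len(data)
        mean = sum(data) / n
        var = sum((x - mean) ** 2 for x in data) / n  # MLE uses 1/n, not 1/(n-1)
        return {"mu": mean, "sigma": math.sqrt(var)}
    return estimator

fit = gaussian_mle_schema({"likelihood": "gaussian", "iid": True})
print(fit([1.0, 2.0, 3.0, 4.0]))  # {'mu': 2.5, 'sigma': 1.118...}
```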

    Democratizing Self-Service Data Preparation through Example Guided Program Synthesis

    The majority of real-world data we can access today have one thing in common: they are not immediately usable in their original state. Trapped in a swamp of data usability issues like non-standard data formats and heterogeneous data sources, most data analysts and machine learning practitioners have to burden themselves with "data janitor" work, writing ad hoc Python, Perl, or SQL scripts, which is tedious and inefficient. It is estimated that data scientists or analysts typically spend 80% of their time preparing data, a significant amount of human effort that could be redirected to better goals. In this dissertation, we address this problem by harnessing knowledge such as examples and other useful hints from the end user. We develop program synthesis techniques guided by heuristics and machine learning, which make data preparation less painful and more efficient for data users, particularly those with little to no programming experience.
    Data transformation, also called data wrangling or data munging, is an important task in data preparation, seeking to convert data from one format to a different (often more structured) format. Our system Foofah shows that allowing end users to describe their desired transformation by providing small input-output transformation examples can significantly reduce the overall user effort. The underlying program synthesizer can often find meaningful data transformation programs within a reasonably short amount of time.
    Our second system, CLX, demonstrates that sometimes the user does not even need to provide complete input-output examples, but only needs to label desirable examples where they exist in the original dataset. The system is still capable of suggesting reasonable and explainable transformation operations to fix non-standard data formats in a dataset full of heterogeneous data with varied formats. PRISM, our third system, targets the data preparation task of data integration, i.e., combining multiple relations to form a desired schema. PRISM allows the user to describe the target schema using not only high-resolution (precise) constraints, i.e., complete example data records in the target schema, but also (imprecise) constraints of varied resolutions, such as incomplete data record examples with missing values, value ranges, or multiple possible values per element (cell), so that less familiarity with the database contents is required of the end user.
    PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/163059/1/markjin_1.pd
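    To make the by-example setting concrete, the sketch below enumerates short pipelines over a toy, invented operator set until one is consistent with the given input-output example. It is only a caricature of the search that Foofah and related systems perform; their operator languages and pruning heuristics are far richer.

```python
# Hypothetical operator set for a toy by-example transformation search.
OPS = {
    "strip": str.strip,
    "lower": str.lower,
    "title": str.title,
    "swap":  lambda s: ", ".join(reversed(s.split(" "))) if " " in s else s,
}

def synthesize(examples, max_len=2):
    """Breadth-first search over short operator pipelines consistent with all examples."""
    def run(pipeline, s):
        for name in pipeline:
            s = OPS[name](s)
        return s
    programs = [()]
    for _ in range(max_len + 1):
        for p in programs:
            if all(run(p, i) == o for i, o in examples):
                return p
        programs = [p + (name,) for p in programs for name in OPS]
    return None

print(synthesize([("  Ada Lovelace ", "Lovelace, Ada")]))  # -> ('strip', 'swap')
```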

    Type-driven Synthesis of Evolving Data Models

    Modern commercial software is often framed under the umbrella of data-centric applications. Data-centric applications define data as the main and permanent asset: they use a single data model, built before the applications themselves, for application functionality, data management, and analytical activities. Moreover, since applications are temporary, in contrast to data, there is a need to continuously evolve the data schema to accommodate new functionality. In this sense, the continuously evolving (rich) feature set expected of state-of-the-art applications is intrinsically bound not only by the amount of available data but also by its structure, its internal dependencies, and by the ability to transparently and uniformly grow and evolve data representations and their properties on the fly. The GOLEM project aims to produce new methods of program automation integrated in the development of data-centric applications in low-code frameworks. In this context, one of the key targets for automation is the data layer itself, encompassing the data layout and its integrity constraints, as well as validation and access control rules. The aim of this dissertation, which is part of the GOLEM project, is to develop a synthesis framework that, based on high-level specifications, correctly defines and evolves a rich data layer component by means of high-level operations. The framework is built around a specification language for expressing richly-typed specifications, a target language that is the goal of synthesis, and a type-directed synthesis procedure based on proof-search concepts. The range of real database operations the framework is able to synthesize is demonstrated through a case study. In a component-based synthesis style, with an extensible library of base operations on database tables (specified using the target language) in context, the case study shows that the synthesis framework is capable of expressing and solving a wide variety of data schema creation and evolution problems.
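    A component-based, type-directed search of the kind described can be sketched very compactly: treat each library operation as a typed edge from one schema to another and search for a chain from the source schema to the target. The Python below uses plain column-name sets as stand-in "types" and an invented three-operation library; the dissertation's framework works with richly-typed specifications instead.

```python
from collections import deque

# Hypothetical component library: each operation maps one schema (a frozenset
# of column names) to another.
LIBRARY = [
    ("add_column_email",   frozenset({"id", "name"}),          frozenset({"id", "name", "email"})),
    ("drop_column_name",   frozenset({"id", "name", "email"}), frozenset({"id", "email"})),
    ("add_not_null_check", frozenset({"id", "email"}),         frozenset({"id", "email"})),
]

def synthesize_migration(source, target):
    """Type-directed search: breadth-first chaining of components whose input
    schema matches the current schema, until the target schema is reached."""
    queue = deque([(source, [])])
    seen = {source}
    while queue:
        schema, plan = queue.popleft()
        if schema == target:
            return plan
        for name, pre, post in LIBRARY:
            if pre == schema and post not in seen:
                seen.add(post)
                queue.append((post, plan + [name]))
    return None

print(synthesize_migration(frozenset({"id", "name"}), frozenset({"id", "email"})))
# -> ['add_column_email', 'drop_column_name']
```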

    Learning programs by learning from failures

    We describe an inductive logic programming (ILP) approach called learning from failures. In this approach, an ILP system (the learner) decomposes the learning problem into three separate stages: generate, test, and constrain. In the generate stage, the learner generates a hypothesis (a logic program) that satisfies a set of hypothesis constraints (constraints on the syntactic form of hypotheses). In the test stage, the learner tests the hypothesis against training examples. A hypothesis fails when it does not entail all the positive examples or entails a negative example. If a hypothesis fails, then, in the constrain stage, the learner learns constraints from the failed hypothesis to prune the hypothesis space, i.e., to constrain subsequent hypothesis generation. For instance, if a hypothesis is too general (entails a negative example), the constraints prune generalisations of the hypothesis. If a hypothesis is too specific (does not entail all the positive examples), the constraints prune specialisations of the hypothesis. This loop repeats until either (i) the learner finds a hypothesis that entails all the positive examples and none of the negative examples, or (ii) there are no more hypotheses to test. We introduce Popper, an ILP system that implements this approach by combining answer set programming and Prolog. Popper supports infinite problem domains, reasoning about lists and numbers, learning textually minimal programs, and learning recursive programs. Our experimental results on three domains (toy game problems, robot strategies, and list transformations) show that (i) constraints drastically improve learning performance, and (ii) Popper can outperform existing ILP systems, both in terms of predictive accuracies and learning times.
    Comment: Accepted for the machine learning journal
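    The generate-test-constrain loop can be illustrated on a propositional toy problem (Popper itself learns logic programs by combining answer set programming and Prolog). In the sketch below, a hypothesis is a set of attribute tests; pruning subsets of a too-general hypothesis and supersets of a too-specific one plays the role of the learned constraints. All names and data are invented for illustration.

```python
from itertools import combinations

# Toy hypothesis space: a hypothesis is a set of attribute tests, and an
# example is covered when it passes every test in the hypothesis.
TESTS = {
    "has_wings": lambda x: x["wings"] > 0,
    "lays_eggs": lambda x: x["eggs"],
    "has_fur":   lambda x: x["fur"],
}

def covers(hypothesis, example):
    return all(TESTS[t](example) for t in hypothesis)

def learn(positives, negatives):
    """Generate, test, constrain: prune generalisations of too-general hypotheses
    and specialisations of too-specific ones, then keep generating."""
    space = [frozenset(c) for n in range(len(TESTS) + 1)
             for c in combinations(TESTS, n)]
    pruned = set()
    for h in sorted(space, key=len):              # generate: most general first
        if h in pruned:
            continue
        too_general = any(covers(h, e) for e in negatives)
        too_specific = not all(covers(h, e) for e in positives)
        if not too_general and not too_specific:  # test: consistent hypothesis found
            return h
        for other in space:                       # constrain: prune the failed region
            if too_general and other <= h:        # generalisations also cover the negative
                pruned.add(other)
            if too_specific and other >= h:       # specialisations also miss the positive
                pruned.add(other)
    return None

birds = [{"wings": 2, "eggs": True, "fur": False}]
bats  = [{"wings": 2, "eggs": False, "fur": True}]
print(sorted(learn(positives=birds, negatives=bats)))  # -> ['lays_eggs']
```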