Theory and Techniques for Synthesizing a Family of Graph Algorithms
Although Breadth-First Search (BFS) has several advantages over Depth-First
Search (DFS), its prohibitive space requirements have meant that algorithm
designers often pass it over in favor of DFS. To address this shortcoming, we
introduce a theory of Efficient BFS (EBFS) along with a simple recursive
program schema for carrying out the search. The theory is based on dominance
relations, a long-standing technique from the field of search algorithms. We
show how the theory can be used to systematically derive solutions to two graph
algorithms, namely the Single Source Shortest Path problem and the Minimum
Spanning Tree problem. The solutions are found by making small systematic
changes to the derivation, revealing the connections between the two problems
which are often obscured in textbook presentations of them.
Comment: In Proceedings SYNT 2012, arXiv:1207.055
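The dominance idea can be illustrated with a small hypothetical Python sketch (this is not the paper's program schema, and all names are illustrative): for the shortest-path instance, a partial path to a vertex is dominated by any cheaper partial path reaching the same vertex, so dominated frontier entries may be discarded during the search.

```python
from collections import deque

def bfs_with_dominance(graph, source):
    """Breadth-first search over partial paths, pruning dominated ones.

    For the shortest-path instance, a partial path to vertex v is
    dominated by any other partial path reaching v with lower cost,
    so at most one undominated path per vertex survives.
    graph: dict mapping vertex -> list of (neighbour, edge_weight).
    Returns the cheapest discovered cost for each reachable vertex.
    """
    best = {source: 0}                # cheapest known cost per vertex
    frontier = deque([(source, 0)])
    while frontier:
        v, cost = frontier.popleft()
        if cost > best.get(v, float("inf")):
            continue                  # this partial path became dominated
        for w, weight in graph[v]:
            new_cost = cost + weight
            if new_cost < best.get(w, float("inf")):
                best[w] = new_cost    # undominated extension survives
                frontier.append((w, new_cost))
    return best
```

With dominance pruning the frontier holds at most one entry per vertex at a time, which is the space saving the theory above formalises.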
Sketched Answer Set Programming
Answer Set Programming (ASP) is a powerful modeling formalism for
combinatorial problems. However, writing ASP models is not trivial. We propose
a novel method, called Sketched Answer Set Programming (SkASP), aiming at
supporting the user in resolving this issue. The user writes an ASP program
while marking uncertain parts with question marks. In addition, the user
provides a number of positive and negative examples of the desired program
behaviour. The sketched model is rewritten into another ASP program, which is
solved by traditional methods. As a result, the user obtains a functional and
reusable ASP program modelling her problem. We evaluate our approach on 21
well-known puzzles and combinatorial problems inspired by Karp's 21 NP-complete
problems and demonstrate a use-case for a database application based on ASP.
Comment: 15 pages, 11 figures; to appear in ICTAI 201
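Outside ASP, the sketch-resolution idea can be caricatured as a generate-and-test loop: enumerate completions of the question-mark holes and keep the first candidate consistent with the positive and negative examples. This Python toy is a simplification of SkASP's actual rewriting into ASP, and every name in it is illustrative.

```python
from itertools import product

def resolve_sketch(sketch, hole_choices, positive, negative, run_program):
    """Naive generate-and-test resolution of a program sketch.

    sketch: program text with '?' placeholders for uncertain parts.
    hole_choices: list of candidate fragments, one list per hole.
    positive / negative: examples the program must accept / reject.
    run_program: oracle executing a candidate program on an example.
    """
    holes = sketch.count("?")
    for fill in product(*hole_choices[:holes]):
        candidate = sketch
        for fragment in fill:
            # fill holes left to right, one '?' at a time
            candidate = candidate.replace("?", fragment, 1)
        accepts_all = all(run_program(candidate, ex) for ex in positive)
        rejects_all = not any(run_program(candidate, ex) for ex in negative)
        if accepts_all and rejects_all:
            return candidate          # first consistent completion
    return None
```

SkASP avoids this brute-force enumeration by encoding the search itself as an ASP program solved by traditional methods, but the specification being met is the same.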
AutoBayes: A System for Generating Data Analysis Programs from Statistical Models
Data analysis is an important scientific task that is required whenever information needs to be extracted from raw data. Statistical approaches to data analysis, which use methods from probability theory and numerical analysis, are well-founded but difficult to implement: the development of a statistical data analysis program for any given application is time-consuming and requires substantial knowledge and experience in several areas. In this paper, we describe AutoBayes, a program synthesis system for the generation of data analysis programs from statistical models. A statistical model specifies the properties for each problem variable (i.e., observation or parameter) and its dependencies in the form of a probability distribution. It is a fully declarative problem description, similar in spirit to a set of differential equations. From such a model, AutoBayes generates optimized and fully commented C/C++ code which can be linked dynamically into the Matlab and Octave environments. Code is produced by a schema-guided deductive synthesis process. A schema consists of a code template and applicability constraints which are checked against the model during synthesis using theorem proving technology. AutoBayes augments schema-guided synthesis by symbolic-algebraic computation and can thus derive closed-form solutions for many problems. It is well-suited for tasks like estimating best-fitting model parameters for the given data. Here, we describe AutoBayes's system architecture, in particular the schema-guided synthesis kernel. Its capabilities are illustrated by a number of advanced textbook examples and benchmarks.
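The schema-guided step can be sketched in a few lines of Python. This is not AutoBayes's interface: the applicability predicate below stands in for the theorem-proving check, and the string template stands in for the code template; all names are illustrative.

```python
def schema_guided_synthesis(model, schemas):
    """Minimal sketch of one schema-guided deductive synthesis step.

    schemas: list of (applies, template) pairs, where `applies` is a
    predicate on the model (a stand-in for checking applicability
    constraints with a theorem prover) and `template` is a code
    template instantiated against the model's entries.
    """
    for applies, template in schemas:
        if applies(model):                    # constraint check passes
            return template.format(**model)   # instantiate the template
    raise ValueError("no applicable schema for this model")

# Hypothetical schema: a closed-form estimator applies when the
# model declares a Gaussian distribution over the observations.
GAUSSIAN_MEAN_SCHEMA = (
    lambda m: m["dist"] == "gaussian",
    "mu = sum({data}) / len({data})",
)
```

In the real system a schema may recursively invoke the synthesis kernel on subproblems, and symbolic-algebraic computation is used to derive closed-form solutions before falling back to numeric optimization.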
Democratizing Self-Service Data Preparation through Example Guided Program Synthesis
The majority of real-world data we can access today have one thing in common: they are not immediately usable in their original state. Trapped in a swamp of data usability issues like non-standard data formats and heterogeneous data sources, most data analysts and machine learning practitioners have to burden themselves with "data janitor" work, writing ad-hoc Python, Perl or SQL scripts, which is tedious and inefficient. It is estimated that data scientists and analysts typically spend 80% of their time preparing data, a significant amount of human effort that could be redirected to better goals. In this dissertation, we address this problem by harnessing knowledge, such as examples and other useful hints, from the end user. We develop program synthesis techniques guided by heuristics and machine learning, which effectively make data preparation less painful and more efficient to perform by data users, particularly those with little to no programming experience.
Data transformation, also called data wrangling or data munging, is an important task in data preparation, seeking to convert data from one format to a different (often more structured) format. Our system Foofah shows that allowing end users to describe their desired transformation, through providing small input-output transformation examples, can significantly reduce the overall user effort. The underlying program synthesizer can often succeed in finding meaningful data transformation programs within a reasonably short amount of time. Our second system, CLX, demonstrates that sometimes the user does not even need to provide complete input-output examples, but only label ones that are desirable if they exist in the original dataset. The system is still capable of suggesting reasonable and explainable transformation operations to fix the non-standard data format issue in a dataset full of heterogeneous data with varied formats.
PRISM, our third system, targets a data preparation task of data integration, i.e., combining multiple relations to formulate a desired schema. PRISM allows the user to describe the target schema using not only high-resolution (precise) constraints of complete example data records in the target schema, but also (imprecise) constraints of varied resolutions, such as incomplete data record examples with missing values, value ranges, or multiple possible values in each element (cell), so as to require less familiarity with the database contents from the end user.
PhD, Computer Science & Engineering
University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/163059/1/markjin_1.pd
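The example-guided search these systems perform can be illustrated with a toy breadth-first enumeration of operator pipelines consistent with one input-output example. This is a deliberately minimal sketch, not Foofah's or CLX's actual algorithm; the operator names below are illustrative.

```python
from collections import deque

def synthesize(example_in, example_out, operators, max_depth=4):
    """Breadth-first search for the shortest operator pipeline that
    maps example_in to example_out (a toy programming-by-example
    synthesizer in the spirit of the systems described above).

    operators: dict mapping name -> unary function on the data value.
    Returns a list of operator names, or None if none found in depth.
    """
    frontier = deque([(example_in, [])])
    seen = {repr(example_in)}
    while frontier:
        value, program = frontier.popleft()
        if value == example_out:
            return program            # BFS => shortest pipeline first
        if len(program) == max_depth:
            continue
        for name, op in operators.items():
            try:
                result = op(value)
            except Exception:
                continue              # operator not applicable here
            if repr(result) not in seen:
                seen.add(repr(result))
                frontier.append((result, program + [name]))
    return None
```

Real systems tame the combinatorial explosion with heuristics and learned guidance; the point of the sketch is only that a small input-output example already pins down a meaningful program.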
Type-driven Synthesis of Evolving Data Mode
Modern commercial software is often framed under the umbrella of data-centric applications.
Data-centric applications define data as their main and permanent asset. These
applications use a single data model for application functionality, data management, and
analytical activities, a model that is built before the applications themselves.
Moreover, since applications are temporary, in contrast to data, there is the need to
continuously evolve and change the data schema to accommodate new functionality. In
this sense, the continuously evolving (rich) feature set that is expected of state-of-the-art
applications is intrinsically bound not only by the amount of available data but also
by its structure, its internal dependencies, and the ability to transparently and
uniformly grow and evolve data representations and their properties on the fly.
The GOLEM project aims to produce new methods of program automation integrated
in the development of data-centric applications in low-code frameworks. In this context,
one of the key targets for automation is the data layer itself, encompassing the data layout
and its integrity constraints, as well as validation and access control rules.
The aim of this dissertation, which is integrated in GOLEM, is to develop a synthesis
framework that, based on high-level specifications, correctly defines and evolves a
rich data layer component by means of high-level operations. The construction of the
framework was approached by defining a specification language to express richly-typed
specifications, a target language, which is the goal of synthesis, and a type-directed
synthesis procedure based on proof-search concepts.
The range of real database operations the framework is able to synthesize is demonstrated
through a case study. In a component-based synthesis style, with an extensible
library of base operations on database tables (specified using the target language) in context,
the case study shows that the synthesis framework is capable of expressing and
solving a wide variety of data schema creation and evolution problems.Os sistemas modernos de software comercial são frequentemente caracterizados como
aplicações centradas em dados. Estas aplicações definem os dados como o seu principal
e persistente ativo, e utilizam um único modelo de dados para as suas funcionalidades,
gestão de dados, e atividades analíticas.
Além disso, uma vez que as aplicações são efémeras, contrariamente aos dados, existe
a necessidade de continuamente evoluir o esquema de dados para introduzir novas funcionalidades.
Neste sentido, o conjunto rico de características e em constante evolução
que é esperado das aplicações modernas encontra-se restricto, não só pela quantidade de
dados disponíveis, mas também pela sua estrutura, dependências internas, e a capacidade
de crescer e evoluir a representação dos dados de uma forma uniforme e rápida.
O projeto GOLEM tem como objetivo a produção de novos métodos de automação de
programas integrado no desenvolvimento de aplicações centradas nos dados em sistemas
low-code. Neste contexto, um dos objetivos principais de automação é a camada de dados,
compreendendo a estrutura dos dados e as respectivas condições de integridade, como
também as regras de validação e controlo de acessos.
O objetivo desta dissertação, integrada no projeto GOLEM, é o desenvolvimento de
um sistema de síntese que, baseado em especificações de alto nível, define e evolui corretamente
uma camada de dados rica com recurso a operações de alto nível. A construção
deste sistema baseia-se na definição de uma linguagem de especificação que permite definir
especificações com tipos ricos, uma linguagem de expressões que é considerada o
objetivo da síntese e um procedimento de síntese orientada pelos tipos.
O espectro de operações reais de bases de dados que o sistema consegue sintetizar é
demonstrado através de um caso de estudo. Com uma biblioteca extensível de operações
sobre tabelas no contexto, o caso de estudo demonstra que o sistema de síntese é capaz
de expressar e resolver uma grande variedade de problemas de criação e evolução de
esquemas de dados
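The component-based style of the case study can be sketched as a search for a chain of library operations whose types compose from the source schema to the target schema. This Python toy is not the dissertation's type-directed procedure; the schema "types" are plain strings and the operation names are invented for illustration.

```python
def component_search(source_type, target_type, library, max_len=3):
    """Toy component-based synthesis: find a sequence of library
    operations whose input/output types chain from source to target.

    library: list of (name, input_type, output_type) triples, a
    stand-in for an extensible library of base operations on
    database tables; types are plain strings here.
    """
    def extend(current, path):
        if current == target_type:
            return path               # types chained all the way
        if len(path) == max_len:
            return None               # depth bound reached
        for name, t_in, t_out in library:
            if t_in == current:       # component applicable here
                found = extend(t_out, path + [name])
                if found is not None:
                    return found
        return None
    return extend(source_type, [])
```

A real type-directed procedure searches over rich (refinement-style) types via proof search rather than string equality, which is what lets it capture integrity constraints, not just column layout.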
Learning programs by learning from failures
We describe an inductive logic programming (ILP) approach called learning
from failures. In this approach, an ILP system (the learner) decomposes the
learning problem into three separate stages: generate, test, and constrain. In
the generate stage, the learner generates a hypothesis (a logic program) that
satisfies a set of hypothesis constraints (constraints on the syntactic form of
hypotheses). In the test stage, the learner tests the hypothesis against
training examples. A hypothesis fails when it does not entail all the positive
examples or entails a negative example. If a hypothesis fails, then, in the
constrain stage, the learner learns constraints from the failed hypothesis to
prune the hypothesis space, i.e. to constrain subsequent hypothesis generation.
For instance, if a hypothesis is too general (entails a negative example), the
constraints prune generalisations of the hypothesis. If a hypothesis is too
specific (does not entail all the positive examples), the constraints prune
specialisations of the hypothesis. This loop repeats until either (i) the
learner finds a hypothesis that entails all the positive and none of the
negative examples, or (ii) there are no more hypotheses to test. We introduce
Popper, an ILP system that implements this approach by combining answer set
programming and Prolog. Popper supports infinite problem domains, reasoning
about lists and numbers, learning textually minimal programs, and learning
recursive programs. Our experimental results on three domains (toy game
problems, robot strategies, and list transformations) show that (i) constraints
drastically improve learning performance, and (ii) Popper can outperform
existing ILP systems, both in terms of predictive accuracies and learning
times.
Comment: Accepted for the machine learning journa
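The generate, test, and constrain loop described above can be sketched in Python. This is a schematic of the control flow only, not Popper itself: `generate` stands in for the answer-set-programming stage, `entails` for the Prolog testing stage, and constraints are modelled abstractly as records that the generator must respect.

```python
def learn_from_failures(generate, entails, positives, negatives):
    """Sketch of the generate / test / constrain loop.

    generate(constraints): returns a hypothesis consistent with the
        accumulated constraints, or None when the space is exhausted.
    entails(h, e): does hypothesis h entail example e?
    """
    constraints = []
    while True:
        h = generate(constraints)
        if h is None:
            return None                   # (ii) no more hypotheses
        too_general = any(entails(h, e) for e in negatives)
        too_specific = not all(entails(h, e) for e in positives)
        if not too_general and not too_specific:
            return h                      # (i) solution found
        if too_general:
            # prune every generalisation of the failed hypothesis
            constraints.append(("generalisation", h))
        if too_specific:
            # prune every specialisation of the failed hypothesis
            constraints.append(("specialisation", h))
```

The learning happens entirely in the constraints: each failure removes a whole cone of the hypothesis space (all generalisations or all specialisations of the failed program), which is why constraints drastically improve performance in the experiments.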