49 research outputs found
Compiling a domain specific language for dynamic programming
Steffen P. Compiling a domain specific language for dynamic programming. Bielefeld (Germany): Bielefeld University; 2006
Proactive Synthesis of Recursive Tree-to-String Functions from Examples
Synthesis from examples enables non-expert users to generate programs by specifying examples of their behavior. A domain-specific form of such synthesis has been recently deployed in a widely used spreadsheet software product. In this paper we contribute to foundations of such techniques and present a complete algorithm for synthesis of a class of recursive functions defined by structural recursion over a given algebraic data type definition. The functions we consider map an algebraic data type to a string; they are useful for, e.g., pretty printing and serialization of programs and data. We formalize our problem as learning deterministic sequential
top-down tree-to-string transducers with a single state (1STS).
The first problem we consider is learning a tree-to-string
transducer from any set of input/output examples provided by the user. We show that, given a set of input/output examples, checking whether there exists a 1STS consistent with these examples is NP-complete in general. In contrast, the problem can be solved in polynomial time under a (practically useful) closure condition that each
subtree of a tree in the input/output example set is also
part of the input/output examples.
Because coming up with relevant input/output examples may be
difficult for the user while creating hard constraint problems
for the synthesizer, we also study a more automated
active learning scenario in which the algorithm chooses the
inputs for which the user provides the outputs. Our
algorithm asks a worst-case linear number of queries as a
function of the size of the algebraic data type definition
to determine a unique transducer.
To construct our algorithms we present two new results on
formal languages.
First, we define a class of word equations, called
sequential word equations, for which we prove that
satisfiability can be solved in deterministic polynomial
time. This is in contrast to the general word equations for
which the best known complexity upper bound is in linear space.
Second, we close a long-standing open problem about the
asymptotic size of test sets for context-free languages. A
test set of a language of words L is a subset T of L
such that any two word homomorphisms equivalent on T are
also equivalent on L. We prove that it is possible to
build test sets of cubic size for context-free languages,
matching for the first time the lower bound found 20 years
ago
Finding The Lazy Programmer's Bugs
Traditionally developers and testers created huge numbers of explicit tests, enumerating interesting cases, perhaps
biased by what they believe to be the current boundary conditions of the function being tested. Or at
least, they were supposed to.
A major step forward was the development of property testing. Property testing requires the user to write a few
functional properties that are used to generate tests, and requires an external library or tool to create test data
for the tests. As such many thousands of tests can be created for a single property. For the purely functional
programming language Haskell there are several such libraries; for example QuickCheck [CH00], SmallCheck
and Lazy SmallCheck [RNL08].
Unfortunately, property testing still requires the user to write explicit tests. Fortunately, we note there are
already many implicit tests present in programs. Developers may throw assertion errors, or the compiler may
silently insert runtime exceptions for incomplete pattern matches.
We attempt to automate the testing process using these implicit tests. Our contributions are in four main
areas: (1) We have developed algorithms to automatically infer appropriate constructors and functions needed
to generate test data without requiring additional programmer work or annotations. (2) To combine the
constructors and functions into test expressions we take advantage of Haskell's lazy evaluation semantics by
applying the techniques of needed narrowing and lazy instantiation to guide generation. (3) We keep the type
of test data at its most general, in order to prevent committing too early to monomorphic types that cause
needless wasted tests. (4) We have developed novel ways of creating Haskell case expressions to inspect elements
inside returned data structures, in order to discover exceptions that may be hidden by laziness, and to make
our test data generation algorithm more expressive.
In order to validate our claims, we have implemented these techniques in Irulan, a fully automatic tool for
generating systematic black-box unit tests for Haskell library code. We have designed Irulan to generate high
coverage test suites and detect common programming errors in the process
Bidirectional data transformation by calculation
MAPi Doctoral Programme in Computer ScienceThe advent of bidirectional programming, in recent years, has led to the development of
a vast number of approaches from various computer science disciplines. These are often
based on domain-specific languages in which a program can be read both as a forward
and a backward transformation that satisfy some desirable consistency properties.
Despite the high demand and recognized potential of intrinsically bidirectional
languages, they have still not matured to the point of mainstream adoption. This
dissertation contemplates some usually disregarded features of bidirectional transformation
languages that are vital for deployment at a larger scale. The first concerns
efficiency. Most of these languages provide a rich set of primitive combinators that
can be composed to build more sophisticated transformations. Although convenient,
such compositional languages are plagued by inefficiency and their optimization is
mandatory for a serious application. The second relates to configurability. As update
translation is inherently ambiguous, users shall be allowed to control the choice of a
suitable strategy. The third regards genericity. Writing a bidirectional transformation
typically implies describing the concrete steps that convert values in a source schema to values a target schema, making it impractical to express very complex transformations,
and practical tools shall support concise and generic coding patterns.
We first define a point-free language of bidirectional transformations (called lenses),
characterized by a powerful set of algebraic laws. Then, we tailor it to consider
additional parameters that describe updates, and use them to refine the behavior of
intricate lenses between arbitrary data structures. On top, we propose the Multifocal
framework for the evolution of XML schemas. A Multifocal program describes a
generic schema-level transformation, and has a value-level semantics defined using the
point-free lens language. Its optimization employs the novel algebraic lens calculus.O advento da programação bidirecional, nos últimos anos, fez surgir inúmeras abordagens
em diversas disciplinas de ciências da computação, geralmente baseadas em
linguagens de domínio específico em que um programa representa uma transformação
para a frente ou para trás, satisfazendo certas propriedades de consistência desejáveis.
Apesar do elevado potencial de linguagens intrinsicamente bidirecionais, estas ainda
não amadureceram o suficiente para serem correntemente utilizadas. Esta dissertação
contempla algumas características de linguagens bidirecionais usualmente negligenciadas,
mas vitais para um desenvolvimento em mais larga escala. A primeira refere-se
à eficiência. A maioria destas linguagens fornece um conjunto rico de combinadores
primitivos que podem ser utilizados para construir transformações mais sofisticadas
que, embora convenientes, são cronicamente ineficientes, exigindo ser otimizadas para
uma aplicação séria. A segunda diz respeito à configurabilidade. Sendo a tradução de
modificações inerentemente ambígua, os utilizadores devem poder controlar a escolha
de uma estratégia adequada. A terceira prende-se com a genericidade. Escrever uma
transformação bidirecional implica tipicamente descrever os passos que convertem um
modelo noutro diferente, enquanto que ferramentas práticas devem suportar padrões
concisos e genéricos de forma a poderem expressar transformações muito complexas. Primeiro, definimos uma linguagem de transformações bidirecionais (intituladas de
lentes), livre de variáveis, caracterizada por um poderoso conjunto de leis algébricas. De
seguida, adaptamo-la para receber parâmetros que descrevem modificações, e usamo-los
para refinar lentes intrincadas entre estruturas de dados arbitrárias. Por cima, propomos
a plataforma Multifocal para a evolução de modelos XML. Um programa Multifocal
descreve uma transformação genérica de modelos, cuja semântica ao nível dos valores
e consequente otimização é definida em função da linguagem de lentes
DynaProg for Scala
Dynamic programming is an algorithmic technique to solve problems that follow the Bellman’s principle: optimal solutions depends on optimal sub-problem solutions. The core idea behind dynamic programming is to memoize intermediate results into matrices to avoid multiple computations. Solving a dynamic programming problem consists of two phases: filling one or more matrices with intermediate solutions for sub-problems and recomposing how the final result was constructed (backtracking). In textbooks, problems are usually described in terms of recurrence relations between matrices elements. Expressing dynamic programming problems in terms of recursive formulae involving matrix indices might be difficult, if often error prone, and the notation does not capture the essence of the underlying problem (for example aligning two sequences). Moreover, writing correct and efficient parallel implementation requires different competencies and often a significant amount of time. In this project, we present DynaProg, a language embedded in Scala (DSL) to address dynamic programming problems on heterogeneous platforms. DynaProg allows the programmer to write concise programs based on ADP [1], using a pair of parsing grammar and algebra; these program can then be executed either on CPU or on GPU. We evaluate the performance of our implementation against existing work and our own hand-optimized baseline implementations for both the CPU and GPU versions. Experimental results show that plain Scala has a large overhead and is recommended to be used with small sequences (≤1024) whereas the generated GPU version is comparable with existing implementations: matrix chain multiplication has the same performance as our hand-optimized version (142% of the execution time of [2]) for a sequence of 4096 matrices, Smith-Waterman is twice slower than [3] on a pair of sequences of 6144 elements, and RNA folding is on par with [4] (95% running time) for sequences of 4096 elements. [1] Robert Giegerich and Carsten Meyer. Algebraic Dynamic Programming. [2] Chao-Chin Wu, Jenn-Yang Ke, Heshan Lin and Wu Chun Feng. Optimizing dynamic programming on graphics processing units via adaptive thread-level parallelism. [3] Edans Flavius de O. Sandes, Alba Cristina M. A. de Melo. Smith-Waterman alignment of huge sequences with GPU in linear space. [4] Guillaume Rizk and Dominique Lavenier. GPU accelerated RNA folding algorithm
Proceedings of the ACM SIGPLAN Workshop on Approaches and Applications of Inductive Programming (AAIP 2009)
Inductive programming is concerned with the automated construction of declarative, often functional, recursive programs from incomplete specifications such as input/output examples. The inferred program must be correct with respect to the provided examples in a generalising sense: it should be neither equivalent to them, nor inconsistent. Inductive programming algorithms are guided explicitly or implicitly by a language bias (the class of programs that can be induced) and a search bias (determining which generalised program is constructed first). Induction strategies are either generate-and-test or example-driven. In generate-and-test approaches, hypotheses about candidate programs are generated independently from the given specifications. Program candidates are tested against the given specification and one or more of the best evaluated candidates are developed further. In analytical approaches, candidate programs are constructed in an example-driven way. While generate-and-test approaches can -- in principle -- construct any kind of program, analytical approaches have a more limited scope. On the other hand, efficiency of induction is much higher in analytical approaches. Inductive programming is still mainly a topic of basic research, exploring how the intellectual ability of humans to infer generalised recursive procedures from incomplete evidence can be captured in the form of synthesis methods. Intended applications are mainly in the domain of programming assistance -- either to relieve professional programmers from routine tasks or to enable non-programmers to some limited form of end-user programming. Furthermore, in the future, inductive programming techniques might be applied to further areas such as supporting the inference of lemmata in theorem proving or learning grammar rules. Inductive automated program construction has been originally addressed by researchers in artificial intelligence and machine learning. During the last years, some work on exploiting induction techniques has been started also in the functional programming community. Therefore, the third workshop on |Approaches and Applications of Inductive Programming| took place for the first time in conjunction with the ACM SIGPLAN International Conference on Functional Programming (ICFP 2009). The first and second workshop were associated with the International Conference on Machine Learning (ICML 2005) and the European Conference on Machine Learning (ECML 2007). AAIP´09 aimed to bring together researchers from the functional programming and the artificial intelligence communities, working in the field of inductive functional programming, and advance fruitful interactions between these communities with respect to programming techniques for inductive programming algorithms, the identification of challenge problems and potential applications. For everybody interested in inductive programming we recommend to visit the website: www.inductive-programming.org
Protecting Systems From Exploits Using Language-Theoretic Security
Any computer program processing input from the user or network must validate the input. Input-handling vulnerabilities occur in programs when the software component responsible for filtering malicious input---the parser---does not perform validation adequately. Consequently, parsers are among the most targeted components since they defend the rest of the program from malicious input. This thesis adopts the Language-Theoretic Security (LangSec) principle to understand what tools and research are needed to prevent exploits that target parsers. LangSec proposes specifying the syntactic structure of the input format as a formal grammar. We then build a recognizer for this formal grammar to validate any input before the rest of the program acts on it. To ensure that these recognizers represent the data format, programmers often rely on parser generators or parser combinators tools to build the parsers. This thesis propels several sub-fields in LangSec by proposing new techniques to find bugs in implementations, novel categorizations of vulnerabilities, and new parsing algorithms and tools to handle practical data formats. To this end, this thesis comprises five parts that tackle various tenets of LangSec. First, I categorize various input-handling vulnerabilities and exploits using two frameworks. First, I use the mismorphisms framework to reason about vulnerabilities. This framework helps us reason about the root causes leading to various vulnerabilities. Next, we built a categorization framework using various LangSec anti-patterns, such as parser differentials and insufficient input validation. Finally, we built a catalog of more than 30 popular vulnerabilities to demonstrate the categorization frameworks. Second, I built parsers for various Internet of Things and power grid network protocols and the iccMAX file format using parser combinator libraries. The parsers I built for power grid protocols were deployed and tested on power grid substation networks as an intrusion detection tool. The parser I built for the iccMAX file format led to several corrections and modifications to the iccMAX specifications and reference implementations. Third, I present SPARTA, a novel tool I built that generates Rust code that type checks Portable Data Format (PDF) files. The type checker I helped build strictly enforces the constraints in the PDF specification to find deviations. Our checker has contributed to at least four significant clarifications and corrections to the PDF 2.0 specification and various open-source PDF tools. In addition to our checker, we also built a practical tool, PDFFixer, to dynamically patch type errors in PDF files. Fourth, I present ParseSmith, a tool to build verified parsers for real-world data formats. Most parsing tools available for data formats are insufficient to handle practical formats or have not been verified for their correctness. I built a verified parsing tool in Dafny that builds on ideas from attribute grammars, data-dependent grammars, and parsing expression grammars to tackle various constructs commonly seen in network formats. I prove that our parsers run in linear time and always terminate for well-formed grammars. Finally, I provide the earliest systematic comparison of various data description languages (DDLs) and their parser generation tools. DDLs are used to describe and parse commonly used data formats, such as image formats. Next, I conducted an expert elicitation qualitative study to derive various metrics that I use to compare the DDLs. I also systematically compare these DDLs based on sample data descriptions available with the DDLs---checking for correctness and resilience
Optimal program variant generation for hybrid manycore systems
Field Programmable Gate Arrays promise to deliver superior energy efficiency in heterogeneous high performance computing, as compared to multicore CPUs and GPUs. The rate of adoption is however hampered by the relative difficulty of programming FPGAs. High-level synthesis tools such as Xilinx Vivado, Altera OpenCL or Intel's HLS address a large part of the programmability issue by synthesizing a Hardware Description Languages representation from a high-level specification of the application, given in programming languages such as OpenCL C, typically used to program CPUs and GPUs. Although HLS solutions make programming easier, they fail to also lighten the burden of optimization. Application developers must rely on expert knowledge to manually optimize their applications for each target device, meaning that traditional HLS solutions do not offer a solution to the issue of performance portability. This state of fact prompted the development of compiler frameworks such as TyTra that operate at an even higher level of abstraction that is amenable to the use of Design Space Exploration (DSE). With DSE the initial program specification can be seen as the starting location in a search-space of correct-by-construction program transformations. In TyTra the search-space is generated from the transitive-closure of term-level transformations derived from type-level transformations. Compiler frameworks such as TyTra theoretically solve the issue of performance portability by providing a way to automatically generate alternative correct program variants. They however suffer from the very practical issue that the generated space is often too large to fully explore. As a consequence, the globally optimal solution may be overlooked.
In this work we provide a novel solution to issue performance portability by deriving an efficient yet effective DSE strategy for the TyTra compiler framework. We make use of categorical data types to derive categorical semantics for the formal languages that describe the terms, types, cost-performance estimates and their transformations. From these we define a category of interpretations for TyTra applications, from which we derive a DSE strategy that finds the globally optimal transformation sequence in polynomial time. This is achieved by reducing the size of the generated search space. We formally state and prove a theorem for this claim and then show that the polynomial run-time for our DSE strategy has practically negligible coefficients leading to sub-second exploration times for realistic applications