
    Compiling a domain specific language for dynamic programming

    Steffen P. Compiling a domain specific language for dynamic programming. Bielefeld (Germany): Bielefeld University; 2006.

    Proactive Synthesis of Recursive Tree-to-String Functions from Examples

    Synthesis from examples enables non-expert users to generate programs by specifying examples of their behavior. A domain-specific form of such synthesis has recently been deployed in a widely used spreadsheet software product. In this paper we contribute to the foundations of such techniques and present a complete algorithm for the synthesis of a class of recursive functions defined by structural recursion over a given algebraic data type definition. The functions we consider map an algebraic data type to a string; they are useful for, e.g., pretty printing and serialization of programs and data. We formalize our problem as learning deterministic sequential top-down tree-to-string transducers with a single state (1STS). The first problem we consider is learning a tree-to-string transducer from a set of input/output examples provided by the user. We show that, given a set of input/output examples, checking whether there exists a 1STS consistent with these examples is NP-complete in general. In contrast, the problem can be solved in polynomial time under a (practically useful) closure condition: that each subtree of a tree in the input/output example set is also part of the input/output examples. Because coming up with relevant input/output examples may be difficult for the user, while also creating hard constraint problems for the synthesizer, we also study a more automated active learning scenario in which the algorithm chooses the inputs for which the user provides the outputs. Our algorithm asks a worst-case linear number of queries, as a function of the size of the algebraic data type definition, to determine a unique transducer. To construct our algorithms we present two new results on formal languages. First, we define a class of word equations, called sequential word equations, for which we prove that satisfiability can be solved in deterministic polynomial time. This is in contrast to general word equations, for which the best known complexity upper bound is linear space. Second, we close a long-standing open problem about the asymptotic size of test sets for context-free languages. A test set of a language of words L is a subset T of L such that any two word homomorphisms equivalent on T are also equivalent on L. We prove that it is possible to build test sets of cubic size for context-free languages, matching for the first time the lower bound found 20 years ago.
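    To picture the class of functions being learned, here is a minimal Haskell sketch (the Expr type and the word patterns are invented for illustration, not taken from the paper): a single-state top-down tree-to-string transducer maps each constructor to fixed words spliced around the recursively translated children.

```haskell
-- A hypothetical algebraic data type of expressions.
data Expr = Lit | Add Expr Expr | Neg Expr

-- A single-state top-down tree-to-string transducer (1STS): each
-- constructor contributes fixed words around the recursively
-- translated children, visited left to right.
render :: Expr -> String
render Lit       = "x"
render (Add l r) = "(" ++ render l ++ " + " ++ render r ++ ")"
render (Neg e)   = "-" ++ render e

-- A synthesizer in this setting recovers the constant words
-- "(", " + ", ")" and "-" from input/output examples such as
--   render (Add Lit (Neg Lit)) == "(x + -x)"
```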

    Finding The Lazy Programmer's Bugs

    Traditionally, developers and testers created huge numbers of explicit tests, enumerating interesting cases, perhaps biased by what they believed to be the current boundary conditions of the function being tested. Or at least, they were supposed to. A major step forward was the development of property testing. Property testing requires the user to write a few functional properties that are used to generate tests, and requires an external library or tool to create test data for the tests. As such, many thousands of tests can be created for a single property. For the purely functional programming language Haskell there are several such libraries; for example QuickCheck [CH00], SmallCheck, and Lazy SmallCheck [RNL08]. Unfortunately, property testing still requires the user to write explicit properties. Fortunately, we note that there are already many implicit tests present in programs. Developers may throw assertion errors, or the compiler may silently insert runtime exceptions for incomplete pattern matches. We attempt to automate the testing process using these implicit tests. Our contributions are in four main areas: (1) We have developed algorithms to automatically infer appropriate constructors and functions needed to generate test data, without requiring additional programmer work or annotations. (2) To combine the constructors and functions into test expressions, we take advantage of Haskell's lazy evaluation semantics by applying the techniques of needed narrowing and lazy instantiation to guide generation. (3) We keep the type of test data at its most general, in order to avoid committing too early to monomorphic types that would cause needless wasted tests. (4) We have developed novel ways of creating Haskell case expressions to inspect elements inside returned data structures, in order to discover exceptions that may be hidden by laziness, and to make our test data generation algorithm more expressive. To validate our claims, we have implemented these techniques in Irulan, a fully automatic tool for generating systematic black-box unit tests for Haskell library code. We have designed Irulan to generate high-coverage test suites and detect common programming errors in the process.
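    The contrast between explicit and implicit tests can be sketched in a few lines of Haskell. The property below uses the real QuickCheck API; firstOrBust is a made-up partial function of the kind a tool like Irulan hunts crashing inputs for.

```haskell
import Test.QuickCheck

-- An explicit property: the user states a law, and the library
-- generates test data for it.
prop_reverseTwice :: [Int] -> Bool
prop_reverseTwice xs = reverse (reverse xs) == xs

-- An implicit test: the pattern match is incomplete, so the
-- compiler silently inserts a runtime exception for the [] case.
-- An automatic tool can search for the input (here, the empty
-- list) that triggers it, with no properties written by the user.
firstOrBust :: [a] -> a
firstOrBust (x:_) = x

main :: IO ()
main = quickCheck prop_reverseTwice
```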

    Bidirectional data transformation by calculation

    MAPi Doctoral Programme in Computer Science. The advent of bidirectional programming in recent years has led to the development of a vast number of approaches from various computer science disciplines. These are often based on domain-specific languages in which a program can be read both as a forward and as a backward transformation satisfying some desirable consistency properties. Despite the high demand and recognized potential of intrinsically bidirectional languages, they have still not matured to the point of mainstream adoption. This dissertation contemplates some usually disregarded features of bidirectional transformation languages that are vital for deployment at a larger scale. The first concerns efficiency. Most of these languages provide a rich set of primitive combinators that can be composed to build more sophisticated transformations. Although convenient, such compositional languages are plagued by inefficiency, and their optimization is mandatory for serious application. The second relates to configurability. As update translation is inherently ambiguous, users should be allowed to control the choice of a suitable strategy. The third regards genericity. Writing a bidirectional transformation typically implies describing the concrete steps that convert values in a source schema to values in a target schema, making it impractical to express very complex transformations; practical tools should support concise and generic coding patterns. We first define a point-free language of bidirectional transformations (called lenses), characterized by a powerful set of algebraic laws. Then, we tailor it to consider additional parameters that describe updates, and use them to refine the behavior of intricate lenses between arbitrary data structures. On top of these, we propose the Multifocal framework for the evolution of XML schemas. A Multifocal program describes a generic schema-level transformation and has a value-level semantics defined using the point-free lens language. Its optimization employs the novel algebraic lens calculus.
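    A minimal Haskell sketch of the lens idea (the names and combinators here are illustrative and are not the dissertation's point-free language): a lens pairs a forward get with a backward put, subject to round-tripping laws, and composes algebraically.

```haskell
-- A lens packages a forward transformation (get) with a backward
-- update translation (put).
data Lens s v = Lens
  { get :: s -> v
  , put :: v -> s -> s  -- new view and old source to new source
  }

-- Well-behaved lenses satisfy the usual consistency laws:
--   get (put v s) == v   (PutGet)
--   put (get s) s == s   (GetPut)

-- A lens focusing on the first component of a pair.
fstL :: Lens (a, b) a
fstL = Lens fst (\v (_, b) -> (v, b))

-- Sequential composition, the kind of algebraic combinator such
-- point-free languages are built from and optimized over.
compose :: Lens s v -> Lens v u -> Lens s u
compose l1 l2 = Lens
  { get = get l2 . get l1
  , put = \u s -> put l1 (put l2 u (get l1 s)) s
  }
```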

    DynaProg for Scala

    Dynamic programming is an algorithmic technique for solving problems that follow Bellman's principle: optimal solutions depend on optimal solutions to sub-problems. The core idea behind dynamic programming is to memoize intermediate results in matrices to avoid repeated computation. Solving a dynamic programming problem consists of two phases: filling one or more matrices with intermediate solutions to sub-problems, and reconstructing how the final result was obtained (backtracking). In textbooks, problems are usually described in terms of recurrence relations between matrix elements. Expressing dynamic programming problems as recursive formulae involving matrix indices can be difficult and is often error-prone, and the notation does not capture the essence of the underlying problem (for example, aligning two sequences). Moreover, writing a correct and efficient parallel implementation requires different competencies and often a significant amount of time. In this project, we present DynaProg, a domain-specific language (DSL) embedded in Scala to address dynamic programming problems on heterogeneous platforms. DynaProg allows the programmer to write concise programs based on ADP [1], using a pair of a parsing grammar and an algebra; these programs can then be executed either on CPU or on GPU. We evaluate the performance of our implementation against existing work and against our own hand-optimized baseline implementations for both the CPU and GPU versions. Experimental results show that plain Scala has a large overhead and is only recommended for small sequences (≤1024), whereas the generated GPU version is comparable with existing implementations: matrix chain multiplication has the same performance as our hand-optimized version (142% of the execution time of [2]) for a sequence of 4096 matrices, Smith-Waterman is twice as slow as [3] on a pair of sequences of 6144 elements, and RNA folding is on par with [4] (95% running time) for sequences of 4096 elements. [1] Robert Giegerich and Carsten Meyer. Algebraic Dynamic Programming. [2] Chao-Chin Wu, Jenn-Yang Ke, Heshan Lin and Wu-chun Feng. Optimizing dynamic programming on graphics processing units via adaptive thread-level parallelism. [3] Edans Flavius de O. Sandes and Alba Cristina M. A. de Melo. Smith-Waterman alignment of huge sequences with GPU in linear space. [4] Guillaume Rizk and Dominique Lavenier. GPU accelerated RNA folding algorithm.
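    To make the recurrence-relation view concrete, here is a small Haskell sketch of the tabulation phase for matrix chain multiplication, one of the benchmarks above (DynaProg itself is a Scala DSL and also handles backtracking and GPU code generation; this only illustrates the classic textbook recurrence, with Haskell's lazy arrays supplying the memoization).

```haskell
import Data.Array

-- Matrix chain multiplication: given dimensions d0, d1, ..., dn
-- (matrix i has size d(i-1) x d(i)), the recurrence fills table m
-- with the minimal number of scalar multiplications:
--   m[i][i] = 0
--   m[i][j] = min over i <= k < j of
--               m[i][k] + m[k+1][j] + d(i-1)*d(k)*d(j)
chainCost :: [Int] -> Int   -- expects at least two dimensions
chainCost dims = m ! (1, n)
  where
    n = length dims - 1
    d = listArray (0, n) dims
    -- Lazy array: each cell is computed on demand, at most once.
    m = array ((1, 1), (n, n))
          [ ((i, j), cost i j) | i <- [1 .. n], j <- [i .. n] ]
    cost i j
      | i == j    = 0
      | otherwise = minimum
          [ m ! (i, k) + m ! (k + 1, j) + d ! (i - 1) * d ! k * d ! j
          | k <- [i .. j - 1] ]
```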

    Proceedings of the ACM SIGPLAN Workshop on Approaches and Applications of Inductive Programming (AAIP 2009)

    Inductive programming is concerned with the automated construction of declarative, often functional, recursive programs from incomplete specifications such as input/output examples. The inferred program must be correct with respect to the provided examples in a generalising sense: it should be neither merely equivalent to them nor inconsistent with them. Inductive programming algorithms are guided explicitly or implicitly by a language bias (the class of programs that can be induced) and a search bias (determining which generalised program is constructed first). Induction strategies are either generate-and-test or example-driven. In generate-and-test approaches, hypotheses about candidate programs are generated independently of the given specification. Program candidates are tested against the given specification, and one or more of the best-evaluated candidates are developed further. In analytical approaches, candidate programs are constructed in an example-driven way. While generate-and-test approaches can -- in principle -- construct any kind of program, analytical approaches have a more limited scope. On the other hand, the efficiency of induction is much higher in analytical approaches. Inductive programming is still mainly a topic of basic research, exploring how the intellectual ability of humans to infer generalised recursive procedures from incomplete evidence can be captured in the form of synthesis methods. Intended applications are mainly in the domain of programming assistance -- either to relieve professional programmers from routine tasks or to enable non-programmers to perform some limited form of end-user programming. Furthermore, in the future, inductive programming techniques might be applied to further areas such as supporting the inference of lemmata in theorem proving or learning grammar rules. Inductive automated program construction was originally addressed by researchers in artificial intelligence and machine learning. In recent years, work on exploiting induction techniques has also started in the functional programming community. Therefore, the third workshop on "Approaches and Applications of Inductive Programming" took place for the first time in conjunction with the ACM SIGPLAN International Conference on Functional Programming (ICFP 2009). The first and second workshops were associated with the International Conference on Machine Learning (ICML 2005) and the European Conference on Machine Learning (ECML 2007). AAIP'09 aimed to bring together researchers from the functional programming and artificial intelligence communities working in the field of inductive functional programming, and to advance fruitful interactions between these communities with respect to programming techniques for inductive programming algorithms, the identification of challenge problems, and potential applications. Everybody interested in inductive programming is encouraged to visit the website: www.inductive-programming.org
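    As a toy illustration of the generate-and-test strategy described above (purely illustrative, not one of the workshop's systems), the following Haskell sketch enumerates programs over a tiny expression language by increasing size and returns the first one consistent with the given input/output examples.

```haskell
-- Hypotheses: a tiny expression language over one variable.
data Prog = X | Const Int | Add Prog Prog | Mul Prog Prog
  deriving Show

eval :: Prog -> Int -> Int
eval X         x = x
eval (Const c) _ = c
eval (Add p q) x = eval p x + eval q x
eval (Mul p q) x = eval p x * eval q x

-- The language bias: all programs with exactly n nodes.
progs :: Int -> [Prog]
progs 1 = X : map Const [0 .. 2]
progs n = [ op p q | k <- [1 .. n - 2]
                   , p <- progs k
                   , q <- progs (n - 1 - k)
                   , op <- [Add, Mul] ]

-- Generate-and-test: the smallest program consistent with the
-- examples (diverges if no such program exists in the language).
synthesize :: [(Int, Int)] -> Prog
synthesize exs =
  head [ p | n <- [1 ..], p <- progs n
           , all (\(i, o) -> eval p i == o) exs ]

-- synthesize [(0,0), (1,1), (2,4)]  ==>  Mul X X  (i.e. x * x)
```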

    Protecting Systems From Exploits Using Language-Theoretic Security

    Any computer program processing input from the user or the network must validate that input. Input-handling vulnerabilities occur in programs when the software component responsible for filtering malicious input (the parser) does not perform validation adequately. Consequently, parsers are among the most targeted components, since they defend the rest of the program from malicious input. This thesis adopts the Language-Theoretic Security (LangSec) principle to understand what tools and research are needed to prevent exploits that target parsers. LangSec proposes specifying the syntactic structure of the input format as a formal grammar. We then build a recognizer for this formal grammar to validate any input before the rest of the program acts on it. To ensure that these recognizers represent the data format, programmers often rely on parser generator or parser combinator tools to build them. This thesis advances several sub-fields of LangSec by proposing new techniques to find bugs in implementations, novel categorizations of vulnerabilities, and new parsing algorithms and tools to handle practical data formats. To this end, the thesis comprises five parts that tackle various tenets of LangSec. First, I categorize various input-handling vulnerabilities and exploits using two frameworks. The first is the mismorphisms framework, which helps us reason about the root causes leading to various vulnerabilities. The second is a categorization framework we built using various LangSec anti-patterns, such as parser differentials and insufficient input validation. We then built a catalog of more than 30 popular vulnerabilities to demonstrate these categorization frameworks. Second, I built parsers for various Internet of Things and power grid network protocols, and for the iccMAX file format, using parser combinator libraries. The parsers I built for power grid protocols were deployed and tested on power grid substation networks as an intrusion detection tool. The parser I built for the iccMAX file format led to several corrections and modifications to the iccMAX specifications and reference implementations. Third, I present SPARTA, a novel tool I built that generates Rust code to type-check Portable Document Format (PDF) files. The type checker I helped build strictly enforces the constraints in the PDF specification to find deviations. Our checker has contributed to at least four significant clarifications and corrections to the PDF 2.0 specification and to various open-source PDF tools. In addition to the checker, we also built a practical tool, PDFFixer, to dynamically patch type errors in PDF files. Fourth, I present ParseSmith, a tool for building verified parsers for real-world data formats. Most parsing tools available for data formats are insufficient to handle practical formats or have not been verified for correctness. I built a verified parsing tool in Dafny that builds on ideas from attribute grammars, data-dependent grammars, and parsing expression grammars to tackle constructs commonly seen in network formats. I prove that these parsers run in linear time and always terminate for well-formed grammars. Finally, I provide the first systematic comparison of various data description languages (DDLs) and their parser generation tools. DDLs are used to describe and parse commonly used data formats, such as image formats. I conducted an expert elicitation qualitative study to derive the metrics I use to compare the DDLs, and I also systematically compare these DDLs based on the sample data descriptions available with them, checking for correctness and resilience.
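    To ground the "recognize before you process" discipline, here is a minimal Haskell sketch of a combinator-built recognizer for a toy balanced-parenthesis format (illustrative only; the thesis targets real formats such as PDF and power grid protocols, with far richer grammars and tools).

```haskell
-- A parser consumes a prefix of the input, or fails.
type Parser a = String -> Maybe (a, String)

char :: Char -> Parser Char
char c (x:xs) | x == c = Just (c, xs)
char _ _               = Nothing

-- Sequencing combinator: run one parser, then the next on the
-- leftover input.
andThen :: Parser a -> Parser b -> Parser (a, b)
andThen p q s = do
  (a, s')  <- p s
  (b, s'') <- q s'
  Just ((a, b), s'')

-- A recognizer for the grammar  B ::= '(' B ')' B | epsilon
balanced :: Parser ()
balanced s =
  case (char '(' `andThen` balanced `andThen` char ')' `andThen` balanced) s of
    Just (_, rest) -> Just ((), rest)
    Nothing        -> Just ((), s)   -- the epsilon production

-- LangSec discipline: accept the input only if it parses in full,
-- before any other component acts on it.
recognize :: String -> Bool
recognize s = case balanced s of
  Just ((), "") -> True
  _             -> False
```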

    Optimal program variant generation for hybrid manycore systems

    Field Programmable Gate Arrays (FPGAs) promise to deliver superior energy efficiency in heterogeneous high performance computing compared to multicore CPUs and GPUs. Their rate of adoption is, however, hampered by the relative difficulty of programming FPGAs. High-level synthesis (HLS) tools such as Xilinx Vivado, Altera OpenCL, or Intel HLS address a large part of the programmability issue by synthesizing a Hardware Description Language representation from a high-level specification of the application, given in a programming language such as OpenCL C that is typically used to program CPUs and GPUs. Although HLS solutions make programming easier, they fail to also lighten the burden of optimization. Application developers must rely on expert knowledge to manually optimize their applications for each target device, meaning that traditional HLS solutions do not offer a solution to the issue of performance portability. This state of affairs prompted the development of compiler frameworks such as TyTra that operate at an even higher level of abstraction, one that is amenable to the use of Design Space Exploration (DSE). With DSE, the initial program specification can be seen as the starting location in a search space of correct-by-construction program transformations. In TyTra, the search space is generated from the transitive closure of term-level transformations derived from type-level transformations. Compiler frameworks such as TyTra theoretically solve the issue of performance portability by providing a way to automatically generate alternative correct program variants. They suffer, however, from the very practical issue that the generated space is often too large to explore fully; as a consequence, the globally optimal solution may be overlooked. In this work we provide a novel solution to the issue of performance portability by deriving an efficient yet effective DSE strategy for the TyTra compiler framework. We make use of categorical data types to derive categorical semantics for the formal languages that describe the terms, types, cost-performance estimates, and their transformations. From these we define a category of interpretations for TyTra applications, from which we derive a DSE strategy that finds the globally optimal transformation sequence in polynomial time. This is achieved by reducing the size of the generated search space. We formally state and prove a theorem for this claim, and then show that the polynomial run-time of our DSE strategy has practically negligible coefficients, leading to sub-second exploration times for realistic applications.
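    As a schematic of the underlying search problem (the variant representation and cost model below are invented for illustration; TyTra's actual variants, cost estimates, and pruning are far richer), this Haskell sketch enumerates the transitive closure of a rewrite relation over program variants and returns the cheapest variant found. Exhaustive exploration of this kind is exactly what blows up on realistic spaces, which is what the thesis's polynomial-time strategy avoids.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)
import qualified Data.Set as Set

-- A program variant, abstracted as a pipeline of named stages.
type Variant = [String]

-- A stand-in cost-performance estimate: fewer stages is cheaper.
cost :: Variant -> Int
cost = length

-- One rewrite: fuse two adjacent stages (correct by construction
-- in spirit; here purely symbolic).
rewrites :: Variant -> [Variant]
rewrites v =
  [ take i v ++ [(v !! i) ++ "+" ++ (v !! (i + 1))] ++ drop (i + 2) v
  | i <- [0 .. length v - 2] ]

-- Exhaustive DSE over the transitive closure of the rewrites,
-- tracking visited variants to avoid revisiting them.
explore :: Variant -> Variant
explore v0 =
  minimumBy (comparing cost) (Set.toList (go (Set.singleton v0) [v0]))
  where
    go seen []       = seen
    go seen (v : vs) =
      let new = filter (`Set.notMember` seen) (rewrites v)
      in  go (foldr Set.insert seen new) (vs ++ new)

-- explore ["map f", "map g", "fold h"]
--   ==> ["map f+map g+fold h"]  (fully fused, under this cost model)
```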