
    Using rewriting techniques to produce code generators and proving them correct

    A major problem in deriving a compiler from a formal definition is the production of correct and efficient object code. In this context, we propose a solution to the problem of code-generator generation. Our approach is based on a target machine description in which the basic concepts used (storage classes, access modes, access classes and instructions) are hierarchically described by tree patterns. These tree patterns are terms of an abstract data type; the program intermediate representation (the input to the code generator) is a term of the same abstract data type. The code generation process is based on access modes and instruction-template-driven rewritings: each program instruction is reduced to a sequence of elementary machine instructions, each of them representing an instance of an instruction template. The axioms of the abstract data type are used to prove that the rewritings preserve the semantics of the intermediate representation.
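
    The template-driven rewriting described above can be sketched in miniature. This is not the authors' system: the tuple encoding of IR terms, the two-address machine, and the instruction templates below are all invented for illustration.

```python
# A minimal sketch of template-driven code generation: IR terms are nested
# tuples, and each instruction template pairs a tree pattern with an emitter
# that produces a sequence of elementary machine instructions.

def match(pattern, term, env):
    """Match a tree pattern against a term; variables start with '?'."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        env[pattern] = term
        return True
    if isinstance(pattern, tuple) and isinstance(term, tuple):
        return (len(pattern) == len(term)
                and all(match(p, t, env) for p, t in zip(pattern, term)))
    return pattern == term

# Hypothetical instruction templates, tried most-specific first.
TEMPLATES = [
    (("assign", "?dst", ("add", "?a", "?b")),
     lambda e: [f"LOAD R1,{e['?a']}",
                f"ADD R1,{e['?b']}",
                f"STORE R1,{e['?dst']}"]),
    (("assign", "?dst", "?src"),
     lambda e: [f"LOAD R1,{e['?src']}",
                f"STORE R1,{e['?dst']}"]),
]

def generate(term):
    """Reduce one IR instruction to elementary machine instructions via
    the first matching template."""
    for pattern, emit in TEMPLATES:
        env = {}
        if match(pattern, term, env):
            return emit(env)
    raise ValueError(f"no template matches {term!r}")

print(generate(("assign", "x", ("add", "y", "z"))))
# → ['LOAD R1,y', 'ADD R1,z', 'STORE R1,x']
```

    In the paper's setting, the correctness obligation is that each such rewrite is justified by the axioms of the abstract data type, so that the emitted instruction sequence denotes the same term as the IR instruction it replaces.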

    Determining the Limits of Automated Program Recognition

    This working paper was submitted as a Ph.D. thesis proposal. Program recognition is a program understanding technique in which stereotypic computational structures are identified in a program. From this identification and the known relationships between the structures, a hierarchical description of the program's design is recovered. The feasibility of this technique for small programs has been shown by several researchers. However, it seems unlikely that the existing program recognition systems will scale up to realistic, full-sized programs without some guidance (e.g., from a person using the recognition system as an assistant). One reason is that there are limits to what can be recovered by a purely code-driven approach: some of the information about the program that is useful to know for common software engineering tasks, particularly maintenance, is missing from the code. Another reason guidance must be provided is to reduce the cost of recognition. To determine what guidance is appropriate, therefore, we must know what information is recoverable from the code and where the complexity of program recognition lies. I propose to study the limits of program recognition, both empirically and analytically. First, I will build an experimental system that performs recognition on realistic programs on the order of thousands of lines. This will allow me to characterize the information that can be recovered by this code-driven technique. Second, I will formally analyze the complexity of the recognition process. This will help determine how guidance can be applied most profitably to improve the efficiency of program recognition. MIT Artificial Intelligence Laboratory.
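
    The code-driven recognition of stereotypic structures described above can be illustrated with a deliberately tiny sketch. This is not the proposed system: the single "accumulate in a loop" cliché and the AST traversal below are invented to show the flavour of matching a known computational structure in source code.

```python
# A toy recogniser for one cliché: a variable accumulated with '+=' inside
# a for-loop. Real recognition systems match many clichés and compose them
# into a hierarchical description of the program's design.
import ast

def find_accumulations(src):
    """Return names of variables updated with 'acc += ...' inside a for-loop."""
    found = []
    for node in ast.walk(ast.parse(src)):
        if isinstance(node, ast.For):
            for stmt in node.body:
                if (isinstance(stmt, ast.AugAssign)
                        and isinstance(stmt.op, ast.Add)
                        and isinstance(stmt.target, ast.Name)):
                    found.append(stmt.target.id)
    return found

SRC = "total = 0\nfor x in xs:\n    total += x\n"
print(find_accumulations(SRC))  # recognises 'total' as an accumulator
```

    The proposal's point survives even in this sketch: the code alone reveals *that* `total` is an accumulator, but not *why* the sum is computed, which is the kind of design information a purely code-driven approach cannot recover.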

    Toward an isomorphic diagram of the Backus-Naur form.

    Computer scientists studying formal languages have made use of a variety of representations both to reason and to communicate their ideas to others. Symbolic representations have proved useful for rigorously defining the theoretical objects of these subjects; however, research shows that diagrammatic representations are just as fundamental to them. Previous research in this domain has typically been interested in studying the semantics that a particular representation is intended to capture. By contrast, this treatise considers the importance of the format of the representations themselves, and how format influences a person's ability to uncover characteristics relevant to the problem domain. More specifically, this thesis investigates the established formalisms that have been devised to describe formal languages, and introduces a novel concept, an augmented syntax graph. This graph, an isomorphism of the Backus-Naur form, is shown to have application in visualizing properties that are pertinent to some parsing algorithms.
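
    The idea of viewing a BNF grammar as a graph, and of reading parsing-relevant properties off that graph, can be sketched as follows. This is only a loose illustration, not the thesis's augmented syntax graph: the toy grammar and the dictionary encoding are invented, and only one property (direct left recursion) is checked.

```python
# A BNF grammar as a directed graph: an edge from each nonterminal to every
# symbol its productions reference. Even this crude graph exposes properties
# relevant to parsing, such as direct left recursion.

GRAMMAR = {  # hypothetical toy grammar
    "Expr": [["Expr", "+", "Term"], ["Term"]],
    "Term": [["Term", "*", "num"], ["num"]],
}

def to_graph(grammar):
    """Map each nonterminal to the sorted set of symbols it references."""
    return {nt: sorted({sym for rule in rules for sym in rule})
            for nt, rules in grammar.items()}

def directly_left_recursive(grammar):
    """Nonterminals with a production that starts with themselves --
    such rules defeat naive recursive-descent parsers."""
    return sorted(nt for nt, rules in grammar.items()
                  if any(rule and rule[0] == nt for rule in rules))

print(to_graph(GRAMMAR))
print(directly_left_recursive(GRAMMAR))  # → ['Expr', 'Term']
```

    The thesis's claim is stronger than this sketch: because the augmented syntax graph is an isomorphism of the BNF, no information is lost in the translation, so any property definable on the grammar is also visible in the diagram.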

    The Converge programming language.

    This paper details the Converge programming language, a new dynamically typed imperative programming language capable of compile-time meta-programming and with an extendable syntax. Although Converge has been designed with the aim of implementing different model transformation approaches as embedded DSLs in mind, it is also a General Purpose Language (GPL), albeit one with unusually powerful features. The motivation for a new approach to implementing model transformation approaches is simple: existing languages, and their associated tool-chains, lead to long and costly implementation cycles for model transformation approaches. The justification for creating a new language, rather than altering an existing one, is far less obvious: it is reasonable to suggest that, given the vast number of programming languages already in existence, one of them should present itself as a likely candidate for modification. There are two reasons why a new language is nevertheless necessary. Firstly, Converge contains a blend of features unique amongst programming languages; some fundamental design choices have been necessary to make these features coalesce, and imposing such choices retrospectively on an existing language would almost certainly lead to untidy results and backwards compatibility issues. Secondly, my personal experience strongly suggests that the complexity of modern language implementations (when such implementations are available) can make adding new features a significant challenge. In short, I assert that it is easier, in the context of model transformations, to start with a fresh canvas than to alter an existing language.
    This paper comes in three main parts. The first part documents the basics of the Converge language itself. The second part details Converge's compile-time meta-programming and syntax extension facilities, including a section detailing suggestions for how some of Converge's novel features could be added to similar languages. The third part of this paper explains Converge's syntax extension facility and documents a user extension which allows simple UML-esque modelling languages to be embedded within Converge. As well as being a practical demonstration of Converge's features, this facility is used extensively throughout the remainder of the paper.
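
    No Converge syntax is shown here; as a loose Python analogue only, the essence of compile-time meta-programming is that a function runs during compilation and the code it returns is spliced into the program, so the specialised definition exists before any run-time loop executes. The helper name and the example are invented for illustration.

```python
# A Python analogue of a compile-time splice: generate source for a
# specialised power function, then compile and splice it into the module.
# In a language like Converge this happens inside the compiler itself.
import ast

def expand_power(n):
    """'Compile-time' helper: return source computing x**n by repeated
    multiplication, with the loop unrolled away."""
    body = " * ".join(["x"] * n) if n > 0 else "1"
    return f"def pow{n}(x):\n    return {body}\n"

# Splice the generated definition into the running module.
generated = expand_power(3)
exec(compile(ast.parse(generated), "<splice>", "exec"))

print(pow3(2))  # → 8; the specialised function was built before execution
```

    Languages with first-class compile-time meta-programming make this splice a checked, hygienic language construct rather than a string-and-`exec` trick, which is part of what motivates building such support into the language design from the start.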

    Programmiersprachen und Rechenkonzepte

    The GI special interest group 2.1.4 "Programmiersprachen und Rechenkonzepte" held its annual workshop at the Physikzentrum Bad Honnef from 3 to 5 May 2004. This report collects the contributions. As every year, the meeting served to introduce participants to one another, to deepen existing contacts, to present new work and results, and above all to foster intensive discussion. A broad spectrum of contributions, ranging from theoretical foundations through program development, language design, software engineering and object orientation to the surprisingly long history of computing machines since antiquity, made for an interesting and varied programme. Topics included, among others, imperative, functional and functional-logic languages, software/hardware co-design, semantics, web programming and software engineering, generative programming, aspects, and formal testing support. Interesting contributions on these and other topics prompted exchanges of experience and technical discussions, including with the participants of the "Reengineering" workshop taking place at the Physikzentrum Bad Honnef at the same time. I would like to thank all participants for contributing to the success of the workshop with their talks and constructive discussion, and the authors deserve thanks for the variety and quality of the contributions. Thanks are equally due to the staff and management of the Physikzentrum Bad Honnef for the customarily pleasant and stimulating atmosphere and comprehensive support.

    Deep learning for compilers

    Constructing compilers is hard. Optimising compilers are multi-million-dollar projects spanning years of development, yet they remain unable to fully exploit the available performance and are prone to bugs. The rapid transition to heterogeneous parallelism and diverse architectures has raised demand for aggressively-optimising compilers to an all-time high, leaving compiler developers struggling to keep up. What is needed are better tools to simplify compiler construction. This thesis presents new techniques that dramatically lower the cost of compiler construction, while improving robustness and performance. The enabling insight for this research is the leveraging of deep learning to model the correlations between source code and program behaviour, enabling tasks which previously required significant engineering effort to be automated. This is demonstrated in three domains. First, a generative model for compiler benchmarks is developed. The model requires no prior knowledge of programming languages, yet produces output of such quality that professional software developers cannot distinguish generated from hand-written programs. The efficacy of the generator is demonstrated by supplementing the training data of predictive models for compiler optimisations. The generator yields an automatic improvement in heuristic performance, and exposes weaknesses in state-of-the-art approaches which, when corrected, yield further performance improvements. Second, a compiler fuzzer is developed which is far simpler than prior techniques. By learning a generative model rather than engineering a generator from scratch, it is implemented in 100 fewer lines of code than the state-of-the-art, yet is capable of exposing bugs which prior techniques cannot. An extensive testing campaign reveals 67 new bugs in OpenCL compilers, many of which have now been fixed. Finally, this thesis addresses the challenge of feature design. A methodology for learning compiler heuristics is presented that, in contrast to prior approaches, learns directly over the raw textual representation of programs. The approach outperforms state-of-the-art models with hand-engineered features in two challenging optimisation domains, without requiring any expert guidance. Additionally, the methodology enables models trained on one task to be adapted to perform another, permitting the novel transfer of information between optimisation problem domains. The techniques developed in these three contrasting domains demonstrate the exciting potential of deep learning to simplify and improve compiler construction. The outcomes of this thesis enable new lines of research to equip compiler developers to keep up with the rapidly evolving landscape of heterogeneous architectures.
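
    The idea of learning a heuristic directly from raw program text, rather than from hand-engineered features, can be caricatured in a few lines. This sketch bears no relation to the thesis's actual neural models or datasets: the bag-of-tokens perceptron, the "run on GPU" label, and the training programs are all invented to show the shape of the approach.

```python
# A toy learned heuristic over raw source text: a perceptron with weights
# over tokens, trained on invented examples, deciding a hypothetical binary
# optimisation choice (1 = offload to GPU) with no hand-designed features.

def tokens(src):
    return src.replace("(", " ( ").replace(")", " ) ").split()

def predict(w, src):
    return 1 if sum(w.get(t, 0.0) for t in tokens(src)) > 0 else 0

def train(samples, epochs=20):
    """Perceptron update: nudge token weights on every misclassification."""
    w = {}
    for _ in range(epochs):
        for src, label in samples:
            if predict(w, src) != label:
                for t in tokens(src):
                    w[t] = w.get(t, 0.0) + (1 if label else -1)
    return w

# Invented training set: loop-heavy kernels go to the GPU, I/O does not.
DATA = [
    ("for i in range(n): a[i] = b[i] + c[i]", 1),
    ("for i in range(n): a[i] = a[i] * 2.0", 1),
    ("print(header)", 0),
    ("log.write(msg)", 0),
]
w = train(DATA)
print(predict(w, "for i in range(n): a[i] = b[i] - c[i]"))  # → 1
```

    The thesis replaces this toy with deep networks that read programs as sequences, but the division of labour is the same: the model, not an expert, decides which properties of the text matter for the optimisation decision.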