72 research outputs found

    Formal foundations for semi-parsing

    Deriving tolerant grammars from a base-line grammar

    A grammar-based approach to tool development in re- and reverse engineering promises precise structure awareness, but it is problematic in two respects. Firstly, obtaining a grammar for a relevant language, or a cocktail of languages, is a considerable up-front investment. Existing work on grammar recovery addresses this concern to some extent. Secondly, it is often not feasible to insist on a precise grammar, e.g., when different dialects need to be covered. This calls for tolerant grammars. In this paper, we provide a well-engineered approach to the derivation of tolerant grammars, which is based on previous work on error recovery, fuzzy parsing, and island grammars. The technology of this paper has been used in a complex Cobol restructuring project on several million lines of code in different Cobol dialects. Our approach is founded on an approximation relation between a tolerant grammar and a base-line grammar, which serves as a point of reference. Thereby, we avoid false positives and false negatives when parsing constructs of interest in a tolerant mode. Our approach accomplishes the effective derivation of a tolerant grammar from the syntactic structure that is relevant for a certain re- or reverse engineering tool. To this end, the productions for the constructs of interest are reused from the base-line grammar, together with further productions that are needed for completion.
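
    As a rough, hypothetical illustration of the island-grammar idea that underlies tolerant parsing (not the paper's derivation machinery; the construct of interest is invented for the example), the sketch below matches Cobol COPY statements precisely while consuming everything else as "water". The paper's point is to derive such island productions from the base-line grammar, avoiding the false matches that a hand-written pattern like this one can produce.

        import re

        # Construct of interest ("island"): Cobol COPY statements.  Everything
        # else in the source is treated as "water" and skipped over.
        ISLAND = re.compile(r"\bCOPY\s+(\w+)\b")

        def parse_tolerantly(source: str):
            """Collect the islands found in `source`; the rest is water."""
            islands, pos = [], 0
            while True:
                match = ISLAND.search(source, pos)
                if match is None:
                    break                      # the remaining tail is all water
                islands.append(match.group(1))
                pos = match.end()              # resume scanning after the island
            return islands

        print(parse_tolerantly("MOVE A TO B. COPY PAYROLL. DISPLAY X. COPY DATES."))
        # ['PAYROLL', 'DATES']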

    OrdinalFix: Fixing Compilation Errors via Shortest-Path CFL Reachability

    The development of correct and efficient software can be hindered by compilation errors, which must be fixed to ensure that the code is syntactically correct and satisfies programming language constraints. Neural network-based approaches have been used to tackle this problem, but they lack guarantees of output correctness and can require an unlimited number of modifications. Fixing compilation errors within a given number of modifications is a challenging task. We demonstrate that finding the minimum number of modifications to fix a compilation error is NP-hard. To address the compilation error fixing problem, we propose OrdinalFix, a complete algorithm based on shortest-path CFL (context-free language) reachability with attribute checking that is guaranteed to output a program with the minimum number of modifications required. Specifically, OrdinalFix searches possible fixes from the smallest to the largest number of modifications. By incorporating merged attribute checking to enhance efficiency, the time complexity of OrdinalFix remains acceptable for practical application. We evaluate OrdinalFix on two datasets and demonstrate its ability to fix compilation errors within a reasonable time limit. Compared with existing approaches, OrdinalFix achieves a success rate of 83.5%, surpassing all existing approaches (71.7%). Comment: Accepted by ASE 202
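
    As a hedged sketch of the "fewest modifications first" search order (not OrdinalFix's shortest-path CFL-reachability algorithm; a toy grammar of balanced parentheses and all names are assumptions), the code below explores single-token insertions, deletions, and replacements in order of increasing edit count and returns the first candidate the grammar accepts, which is therefore a minimum-modification fix.

        from heapq import heappush, heappop

        def is_valid(tokens):
            """Toy grammar check: the token sequence is balanced parentheses."""
            depth = 0
            for tok in tokens:
                depth += 1 if tok == "(" else -1
                if depth < 0:
                    return False
            return depth == 0

        def min_fix(tokens, alphabet=("(", ")")):
            """Uniform-cost search over edits; returns (edit count, fixed string)."""
            start = tuple(tokens)
            frontier, seen = [(0, start)], {start}
            while frontier:
                cost, cand = heappop(frontier)
                if is_valid(cand):
                    return cost, "".join(cand)
                for i in range(len(cand) + 1):
                    edits = [cand[:i] + (t,) + cand[i:] for t in alphabet]           # insert
                    if i < len(cand):
                        edits.append(cand[:i] + cand[i + 1:])                        # delete
                        edits += [cand[:i] + (t,) + cand[i + 1:] for t in alphabet]  # replace
                    for nxt in edits:
                        if nxt not in seen:
                            seen.add(nxt)
                            heappush(frontier, (cost + 1, nxt))

        print(min_fix("(()("))   # (1, '(())') -- one replacement repairs the input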

    Recovering Grammar Relationships for the Java Language Specification

    Grammar convergence is a method that helps in discovering relationships between different grammars of the same language or different language versions. The key element of the method is the operational, transformation-based representation of those relationships. Given input grammars for convergence, they are transformed until they are structurally equal. The transformations are composed from primitive operators; properties of these operators and the composed chains provide quantitative and qualitative insight into the relationships between the grammars at hand. We describe a refined method for grammar convergence, and we use it in a major study, where we recover the relationships between all the grammars that occur in the different versions of the Java Language Specification (JLS). The relationships are represented as grammar transformation chains that capture all accidental or intended differences between the JLS grammars. This method is mechanized and driven by nominal and structural differences between pairs of grammars that are subject to asymmetric, binary convergence steps. We present the underlying operator suite for grammar transformation in detail, and we illustrate the suite with many examples of transformations on the JLS grammars. We also describe the extraction effort that was needed to make the JLS grammars amenable to automated processing. We include substantial metadata about the convergence process for the JLS so that the effort becomes reproducible and transparent.
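
    As a minimal, illustrative sketch of transformation-based convergence (the grammar fragments and the single operator below are invented, not the paper's operator suite or the actual JLS grammars), a grammar can be modelled as a mapping from nonterminals to sets of right-hand sides, and a chain of primitive operators is applied to one grammar until it is structurally equal to the other.

        # A grammar maps nonterminals to sets of right-hand sides (tuples of symbols).

        def rename(grammar, old, new):
            """Primitive operator: rename a nonterminal throughout a grammar."""
            def swap(sym):
                return new if sym == old else sym
            return {swap(lhs): {tuple(swap(s) for s in rhs) for rhs in rhss}
                    for lhs, rhss in grammar.items()}

        # Two hypothetical versions of the same fragment, differing only in naming.
        jls_v1 = {"Expr": {("Expr", "+", "Term"), ("Term",)}}
        jls_v2 = {"Expression": {("Expression", "+", "Term"), ("Term",)}}

        # The transformation chain records how v1 relates to v2.
        chain = [lambda g: rename(g, "Expr", "Expression")]
        converged = jls_v1
        for step in chain:
            converged = step(converged)

        print(converged == jls_v2)   # True: structural equality reached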

    Automatic error recovery for LR parsers in theory and practice

    This thesis argues the need for good syntax error handling schemes in language translation systems such as compilers, and for the automatic incorporation of such schemes into parser generators. Syntax errors are studied in a theoretical framework and practical methods for handling syntax errors are presented. The theoretical framework consists of a model for syntax errors based on the concept of a minimum prefix-defined error correction, a sentence obtainable from an erroneous string by performing edit operations at prefix-defined (parser-defined) errors. It is shown that for an arbitrary context-free language, it is undecidable whether a better than arbitrary choice of edit operations can be made at a prefix-defined error. For common programming languages, it is shown that minimum-distance errors and prefix-defined errors do not necessarily coincide, and that there exist infinitely many programs that differ in a single symbol only; sets of equivalent insertions are exhibited. Two methods for syntax error recovery are presented. The methods are language independent and suitable for automatic generation. The first method consists of two stages: local repair followed, if necessary, by phrase-level repair. The second method consists of a single stage in which a locally minimum-distance repair is computed. Both methods are developed for use in the practical LR parser generator yacc, requiring no additional specifications from the user. A scheme for the automatic generation of diagnostic messages in terms of the source input is presented. Performance of the methods in practice is evaluated using a formal method based on minimum-distance and prefix-defined error correction. The methods compare favourably with existing methods for error recovery.
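
    As a toy sketch of the first method's local-repair stage (not the thesis's yacc-integrated implementation; the two-terminal grammar and all names are assumptions), single-token insertions, deletions, and replacements are tried at the prefix-defined error point, and the candidate that lets the parser consume the most of the remaining input is kept.

        TERMINALS = ("id", "+")

        def progress(tokens):
            """How many tokens of an `id (+ id)*` phrase parse before an error."""
            expect_id = True
            for i, tok in enumerate(tokens):
                if (tok == "id") != expect_id:
                    return i                    # prefix-defined error at position i
                expect_id = not expect_id
            return len(tokens)

        def local_repair(tokens):
            """Single-token repair at the error point, scored by parser progress."""
            err = progress(tokens)
            if err == len(tokens):
                return tokens                   # input already parses
            candidates = [tokens[:err] + [t] + tokens[err:] for t in TERMINALS]       # insert
            candidates += [tokens[:err] + tokens[err + 1:]]                           # delete
            candidates += [tokens[:err] + [t] + tokens[err + 1:] for t in TERMINALS]  # replace
            return max(candidates, key=progress)

        print(local_repair(["id", "+", "+", "id"]))
        # ['id', '+', 'id', '+', 'id'] -- an 'id' is inserted at the error point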

    Contributions to the Construction of Extensible Semantic Editors

    This dissertation addresses the need for easier construction and extension of language tools. Specifically, the construction and extension of so-called semantic editors are considered, that is, editors providing semantic services for code comprehension and manipulation. Editors like these are typically found in state-of-the-art development environments, where they have been developed by hand. The list of programming languages available today is extensive and, with the lively creation of new programming languages and the evolution of old languages, it keeps growing. Many of these languages would benefit from proper tool support. Unfortunately, the development of a semantic editor can be a time-consuming and error-prone endeavor, and too large an effort for most language communities. Given the complex nature of programming, and the huge benefits of good tool support, this lack of tools is problematic. In this dissertation, an attempt is made at narrowing the gap between generative solutions and how state-of-the-art editors are constructed today. A generative alternative for the construction of textual semantic editors is explored, with a focus on how to specify extensible semantic editor services. Specifically, this dissertation shows how semantic services can be specified using a semantic formalism called reference attribute grammars (RAGs), and how these services can be made responsive enough for editing and be provided even when the text in an editor is erroneous. Results presented in this dissertation have been found useful, both in industry and in academia, suggesting that the explored approach may help to reduce the effort of editor construction.
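
    As a minimal illustration of the central RAG concept, a reference attribute (the tiny AST classes below are assumptions, not the dissertation's architecture), the `decl` attribute of a variable use evaluates to a reference to its declaration node, is cached after first evaluation, and degrades gracefully to None for erroneous code. RAG tools such as JastAdd generate this kind of memoized lookup from declarative attribute equations.

        from functools import cached_property

        class Decl:
            def __init__(self, name):
                self.name = name

        class Use:
            def __init__(self, name):
                self.name = name

            @cached_property
            def decl(self):
                """Reference attribute: the declaration this use is bound to."""
                return next((d for d in self.block.decls if d.name == self.name), None)

        class Block:
            def __init__(self, decls, uses):
                self.decls, self.uses = decls, uses
                for use in uses:
                    use.block = self            # parent link consulted by the attribute

        block = Block([Decl("x")], [Use("x"), Use("y")])
        print(block.uses[0].decl.name)   # 'x'  -- resolved to its declaration
        print(block.uses[1].decl)        # None -- unresolved name in erroneous code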
