Deriving tolerant grammars from a base-line grammar
A grammar-based approach to tool development in re- and reverse engineering promises precise structure awareness, but it is problematic in two respects. Firstly, it is a considerable up-front investment to obtain a grammar for a relevant language or cocktail of languages. Existing work on grammar recovery addresses this concern to some extent. Secondly, it is often not feasible to insist on a precise grammar, e.g., when different dialects need to be covered. This calls for tolerant grammars. In this paper, we provide a well-engineered approach to the derivation of tolerant grammars, which is based on previous work on error recovery, fuzzy parsing, and island grammars. The technology of this paper has been used in a complex Cobol restructuring project on several millions of lines of code in different Cobol dialects. Our approach is founded on an approximation relation between a tolerant grammar and a base-line grammar which serves as a point of reference. Thereby, we avoid false positives and false negatives when parsing constructs of interest in a tolerant mode. Our approach accomplishes the effective derivation of a tolerant grammar from the syntactic structure that is relevant for a certain re- or reverse engineering tool. To this end, the productions for the constructs of interest are reused from the base-line grammar, together with further productions that are needed for completion.
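The island-grammar idea underlying tolerant parsing can be illustrated in miniature: parse only the "islands" of interest and skip the surrounding "water". The sketch below is our own toy illustration, not the paper's derivation method; the pattern and the Cobol-like construct are invented for the example.

```python
import re

# Toy "island grammar" scanner: recognize only the constructs of
# interest (here: Cobol-like PERFORM statements) and treat everything
# else as unparsed "water". Pattern and names are illustrative only.
ISLAND = re.compile(r"PERFORM\s+([A-Z0-9-]+)")

def extract_islands(source: str):
    """Return the constructs of interest; all other text is skipped."""
    return ISLAND.findall(source)

code = """
MAIN-LOGIC.
    PERFORM INIT-PARA
    MOVE X TO Y          *> dialect-specific noise the scanner tolerates
    PERFORM REPORT-PARA
"""
print(extract_islands(code))  # ['INIT-PARA', 'REPORT-PARA']
```

A real tolerant grammar reuses productions from the base-line grammar rather than regular expressions, which is what gives the paper's approach its guarantee against false positives and false negatives.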
OrdinalFix: Fixing Compilation Errors via Shortest-Path CFL Reachability
The development of correct and efficient software can be hindered by compilation errors, which must be fixed to ensure the code's syntactic correctness and conformance to programming language constraints. Neural network-based approaches have been used to tackle this problem, but they lack guarantees of output correctness and can require an unlimited number of modifications. Fixing compilation errors within a given number of modifications is a challenging task. We demonstrate that finding the minimum number of modifications to fix a compilation error is NP-hard. To address the compilation error fixing problem, we propose OrdinalFix, a complete algorithm based on shortest-path CFL (context-free language) reachability with attribute checking that is guaranteed to output a program with the minimum number of modifications required. Specifically, OrdinalFix searches possible fixes from the smallest to the largest number of modifications. By incorporating merged attribute checking to enhance efficiency, the time complexity of OrdinalFix is acceptable for application. We evaluate OrdinalFix on two datasets and demonstrate its ability to fix compilation errors within a reasonable time limit. Compared with existing approaches, OrdinalFix achieves a success rate of 83.5%, surpassing all existing approaches (71.7%). Comment: Accepted by ASE 202
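OrdinalFix itself is built on shortest-path CFL reachability with attribute checking; as a much-simplified sketch of the underlying idea only — examining candidate fixes in order of increasing modification count, so the first accepted candidate is minimal — the following uses breadth-first search over single-token edits against a toy balanced-parentheses "language" standing in for a real grammar. All names and the language are our assumptions, not the paper's algorithm.

```python
from collections import deque

def balanced(tokens):
    """Toy stand-in for a compiler front end: accept balanced parentheses."""
    depth = 0
    for t in tokens:
        depth += 1 if t == "(" else -1
        if depth < 0:
            return False
    return depth == 0

def single_edits(cur, alphabet):
    """All sequences reachable by one insertion, replacement, or deletion."""
    out = []
    for i in range(len(cur) + 1):
        for a in alphabet:
            out.append(cur[:i] + (a,) + cur[i:])        # insertion
        if i < len(cur):
            for a in alphabet:
                out.append(cur[:i] + (a,) + cur[i+1:])  # replacement
            out.append(cur[:i] + cur[i+1:])             # deletion
    return out

def min_fix(tokens, alphabet=("(", ")")):
    """Breadth-first search over candidates, ordered by number of
    modifications, so the first accepted candidate is a minimal fix."""
    start = tuple(tokens)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        cur, edits = queue.popleft()
        if balanced(cur):
            return list(cur), edits
        for cand in single_edits(cur, alphabet):
            if cand not in seen:
                seen.add(cand)
                queue.append((cand, edits + 1))
    return None  # unreachable for this toy language

fixed, edits = min_fix(list("(()"))
print(fixed, edits)  # a balanced sequence reachable with 1 modification
```

Blind enumeration like this is exponential in the number of edits; OrdinalFix's contribution is precisely to avoid it by encoding the search as a shortest-path problem over CFL reachability.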
Recovering Grammar Relationships for the Java Language Specification
Grammar convergence is a method that helps in discovering relationships between different grammars of the same language or different language versions. The key element of the method is the operational, transformation-based representation of those relationships. Given input grammars for convergence, they are transformed until they are structurally equal. The transformations are composed from primitive operators; properties of these operators and the composed chains provide quantitative and qualitative insight into the relationships between the grammars at hand. We describe a refined method for grammar convergence, and we use it in a major study, where we recover the relationships between all the grammars that occur in the different versions of the Java Language Specification (JLS). The relationships are represented as grammar transformation chains that capture all accidental or intended differences between the JLS grammars. This method is mechanized and driven by nominal and structural differences between pairs of grammars that are subject to asymmetric, binary convergence steps. We present the underlying operator suite for grammar transformation in detail, and we illustrate the suite with many examples of transformations on the JLS grammars. We also describe the extraction effort, which was needed to make the JLS grammars amenable to automated processing. We include substantial metadata about the convergence process for the JLS so that the effort becomes reproducible and transparent.
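The transformation-based representation of grammar relationships can be pictured in miniature. The encoding and the single operator below are our own simplification for illustration, not the paper's operator suite: grammars are maps from nonterminal to alternative right-hand sides, and a chain of primitive operators is applied until the two grammars are structurally equal.

```python
# Grammars encoded as {nonterminal: [list of right-hand sides]}.
def rename(grammar, old, new):
    """Primitive transformation operator: rename a nonterminal everywhere."""
    sub = lambda sym: new if sym == old else sym
    return {sub(lhs): [[sub(s) for s in rhs] for rhs in alts]
            for lhs, alts in grammar.items()}

# Two hypothetical grammar fragments that differ only nominally.
g1 = {"Expr": [["Expr", "+", "Term"], ["Term"]]}
g2 = {"Expression": [["Expression", "+", "Term"], ["Term"]]}

# A convergence chain: apply operators until structural equality holds.
chain = [("rename", "Expression", "Expr")]
converged = g2
for _, old, new in chain:
    converged = rename(converged, old, new)
print(converged == g1)  # True
```

The recorded chain itself is the artifact of interest: it documents exactly how (and how much) the two grammars differ, which is what the JLS study quantifies.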
Automatic error recovery for LR parsers in theory and practice
This thesis argues the need for good syntax error handling schemes in language translation systems such as compilers, and for the automatic incorporation of such schemes into parser generators. Syntax errors are studied in a theoretical framework, and practical methods for handling syntax errors are presented.
The theoretical framework consists of a model for syntax errors based on the concept of a minimum prefix-defined error correction, a sentence obtainable from an erroneous string by performing edit operations at prefix-defined (parser-defined) errors. It is shown that for an arbitrary context-free language, it is undecidable whether a better than arbitrary choice of edit operations can be made at a prefix-defined error. For common programming languages, it is shown that minimum-distance errors and prefix-defined errors do not necessarily coincide, and that there exist infinitely many programs that differ in a single symbol only; sets of equivalent insertions are exhibited.
Two methods for syntax error recovery are presented. The methods are language independent and suitable for automatic generation. The first method consists of two stages: local repair followed, if necessary, by phrase-level repair. The second method consists of a single stage in which a locally minimum-distance repair is computed. Both methods are developed for use in the practical LR parser generator yacc, requiring no additional specifications from the user. A scheme for the automatic generation of diagnostic messages in terms of the source input is presented. Performance of the methods in practice is evaluated using a formal method based on minimum-distance and prefix-defined error correction. The methods compare favourably with existing methods for error recovery.
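The flavour of a locally minimum-distance repair at the error point can be sketched as follows. This is only a toy illustration of the idea, not the thesis's LR-based method: the "parser" is a trivial hand-written recognizer for statements of the form `id = id ;`, and at the first error position we try each single-token insertion, replacement, and deletion, keeping the candidate that parses furthest. All names are ours.

```python
def parse_prefix(tokens):
    """Toy recognizer for statements of the form  id = id ;  repeated.
    Returns the number of tokens consumed before the first error."""
    expected = ["id", "=", "id", ";"]
    for i, tok in enumerate(tokens):
        if tok != expected[i % 4]:
            return i
    return len(tokens)

def local_repair(tokens):
    """Single-token local repair at the error point: try each deletion,
    insertion, and replacement, and keep the candidate that parses
    furthest (a locally best repair, in miniature)."""
    err = parse_prefix(tokens)
    if err == len(tokens):
        return tokens  # no error
    candidates = [tokens[:err] + tokens[err+1:]]               # delete
    for t in ["id", "=", ";"]:
        candidates.append(tokens[:err] + [t] + tokens[err:])   # insert
        candidates.append(tokens[:err] + [t] + tokens[err+1:]) # replace
    return max(candidates, key=parse_prefix)

broken = ["id", "=", "id", "id", "=", "id", ";"]  # ';' missing after token 3
print(local_repair(broken))  # ['id', '=', 'id', ';', 'id', '=', 'id', ';']
```

The thesis's methods do this against the LR automaton's state, not an ad hoc recognizer, which is what makes them language independent and suitable for generation inside yacc.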
Contributions to the Construction of Extensible Semantic Editors
This dissertation addresses the need for easier construction and extension of language tools. Specifically, the construction and extension of so-called semantic editors is considered, that is, editors providing semantic services for code comprehension and manipulation. Editors like these are typically found in state-of-the-art development environments, where they have been developed by hand. The list of programming languages available today is extensive and, with the lively creation of new programming languages and the evolution of old languages, it keeps growing. Many of these languages would benefit from proper tool support. Unfortunately, the development of a semantic editor can be a time-consuming and error-prone endeavor, and too large an effort for most language communities. Given the complex nature of programming, and the huge benefits of good tool support, this lack of tools is problematic. In this dissertation, an attempt is made at narrowing the gap between generative solutions and how state-of-the-art editors are constructed today. A generative alternative for construction of textual semantic editors is explored with focus on how to specify extensible semantic editor services. Specifically, this dissertation shows how semantic services can be specified using a semantic formalism called reference attribute grammars (RAGs), and how these services can be made responsive enough for editing, and be provided also when the text in an editor is erroneous. Results presented in this dissertation have been found useful, both in industry and in academia, suggesting that the explored approach may help to reduce the effort of editor construction.
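A minimal flavour of the reference-attribute-grammar idea, sketched in plain Python rather than the dissertation's framework (all class and attribute names are our assumptions): an attribute is a lazily evaluated, cached property, and a reference attribute yields a pointer to another AST node, here a use site resolving to its declaration.

```python
from functools import cached_property

class Decl:
    """AST node for a declaration."""
    def __init__(self, name):
        self.name = name

class Use:
    """AST node for a use site; 'decl' is its reference attribute."""
    def __init__(self, name, scope):
        self.name, self.scope = name, scope

    @cached_property
    def decl(self):
        """Reference attribute: resolve this use to its declaration,
        computed on demand and cached thereafter."""
        return next((d for d in self.scope if d.name == self.name), None)

scope = [Decl("x"), Decl("y")]
use = Use("y", scope)
print(use.decl.name)  # y
```

On-demand evaluation with caching is what makes such attributes responsive enough for interactive editing: only the attributes a service actually queries are computed, and unchanged results are reused.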