1,261 research outputs found

Deriving tolerant grammars from a base-line grammar

Author: Klusener A.S. (Steven)
Lämmel R. (Ralf)
Publication venue: CWI
Publication date: 01/01/2003

A grammar-based approach to tool development in re- and reverse engineering promises precise structure awareness, but it is problematic in two respects. Firstly, it is a considerable up-front investment to obtain a grammar for a relevant language or cocktail of languages. Existing work on grammar recovery addresses this concern to some extent. Secondly, it is often not feasible to insist on a precise grammar, e.g., when different dialects need to be covered. This calls for tolerant grammars. In this paper, we provide a well-engineered approach to the derivation of tolerant grammars, which is based on previous work on error recovery, fuzzy parsing, and island grammars. The technology of this paper has been used in a complex Cobol restructuring project on several millions of lines of code in different Cobol dialects. Our approach is founded on an approximation relation between a tolerant grammar and a base-line grammar which serves as a point of reference. Thereby, we avoid false positives and false negatives when parsing constructs of interest in a tolerant mode. Our approach accomplishes the effective derivation of a tolerant grammar from the syntactical structure that is relevant for a certain re- or reverse engineering tool. To this end, the productions for the constructs of interest are reused from the base-line grammar together with further productions that are needed for completion.

CWI's Institutional Repository

Recovering Grammar Relationships for the Java Language Specification

Author: A. Dubey
C. A. R. Hoare
D. A. Thomas
D. Barnard
E. Bouwers
H. H. Do
M. Di Penta
Ralf Lämmel
T. Dean
Vadim Zaytsev
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 24/08/2010

Grammar convergence is a method that helps in discovering relationships between different grammars of the same language or different language versions. The key element of the method is the operational, transformation-based representation of those relationships. Given input grammars for convergence, they are transformed until they are structurally equal. The transformations are composed from primitive operators; properties of these operators and the composed chains provide quantitative and qualitative insight into the relationships between the grammars at hand. We describe a refined method for grammar convergence, and we use it in a major study, where we recover the relationships between all the grammars that occur in the different versions of the Java Language Specification (JLS). The relationships are represented as grammar transformation chains that capture all accidental or intended differences between the JLS grammars. This method is mechanized and driven by nominal and structural differences between pairs of grammars that are subject to asymmetric, binary convergence steps. We present the underlying operator suite for grammar transformation in detail, and we illustrate the suite with many examples of transformations on the JLS grammars. We also describe the extraction effort, which was needed to make the JLS grammars amenable to automated processing. We include substantial metadata about the convergence process for the JLS so that the effort becomes reproducible and transparent.
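
The idea of transforming one grammar through a chain of primitive operators until it is structurally equal to another can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the grammar representation, the two operators `rename` and `add_alternative`, and the toy grammars are hypothetical and are not taken from the operator suite or the JLS grammars described in the paper.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical representation: nonterminal -> set of alternatives,
# where each alternative is a tuple of symbols.
Grammar = Dict[str, FrozenSet[Tuple[str, ...]]]
Operator = Callable[[Grammar], Grammar]

def rename(old: str, new: str) -> Operator:
    """Illustrative operator: rename a nonterminal everywhere in the grammar."""
    def apply(g: Grammar) -> Grammar:
        assert new not in g, "target name must be fresh in this sketch"
        swap = lambda s: new if s == old else s
        return {
            swap(nt): frozenset(tuple(swap(s) for s in alt) for alt in alts)
            for nt, alts in g.items()
        }
    return apply

def add_alternative(nt: str, alt: Tuple[str, ...]) -> Operator:
    """Illustrative operator: add one alternative to a nonterminal."""
    def apply(g: Grammar) -> Grammar:
        out = dict(g)
        out[nt] = out.get(nt, frozenset()) | {tuple(alt)}
        return out
    return apply

def run_chain(g: Grammar, chain: List[Operator]) -> Grammar:
    """Apply a transformation chain step by step."""
    for step in chain:
        g = step(g)
    return g

# Two toy grammars that differ nominally (Stmt vs. Statement)
# and structurally (one alternative is missing).
target: Grammar = {
    "Statement": frozenset({("Block",), ("if", "Expr", "Statement")}),
}
source: Grammar = {
    "Stmt": frozenset({("Block",)}),
}

chain = [
    rename("Stmt", "Statement"),
    add_alternative("Statement", ("if", "Expr", "Statement")),
]
converged = run_chain(source, chain)
print(converged == target)
```

In this sketch the chain itself is the record of how the two grammars relate: one nominal difference and one structural difference, each captured by a single step.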

arXiv.org e-Print Archive

CiteSeerX

Crossref

VU Research Portal

CWI's Institutional Repository

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

MediaWiki Grammar Recovery

Author: Zaytsev Vadim
Publication date: 01/01/2011

The paper describes in detail the recovery effort of one of the official MediaWiki grammars. Over two hundred grammar transformation steps are reported and annotated, leading to delivery of a level 2 grammar, semi-automatically extracted from a community-created semi-formal text using at least five different syntactic notations, several non-enforced naming conventions, multiple misspellings, obsolete parsing technology idiosyncrasies and other problems commonly encountered in grammars that were not engineered properly. Having a quality grammar will make it possible to test and validate it further, without alienating the community with a separately developed grammar.

Comment: 47 pages

arXiv.org e-Print Archive

CWI's Institutional Repository

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Providing rapid feedback in generated modular language environments

Author: Eelco Visser
Emma Nilsson-Nyman
Gamma E.
Kiczales G.
Krahn H.
Lavie A.
Lennart C.L. Kats
Maartje de Jonge
Tomita M.
Valkering R.
van Deursen A.
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Toward an engineering discipline for grammarware

Author: Klint P. (Paul)
Lämmel R. (Ralf)
Verhoef C. (Chris)
Publication venue: A.C.M.
Publication date: 01/01/2005

CWI's Institutional Repository

Bounded seas

Author: Chomsky
Dean
Frost
Grune
Koppler
Kurš
Landin
Nilsson-Nyman
Scott
Tomita
van den Brand
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015

Imprecise manipulation of source code (semi-parsing) is useful for tasks such as robust parsing, error recovery, lexical analysis, and rapid development of parsers for data extraction. An island grammar precisely defines only a subset of a language syntax (islands), while the rest of the syntax (water) is defined imprecisely. Usually water is defined as the negation of islands. Albeit simple, such a definition of water is naive and impedes composition of islands. When developing an island grammar, sooner or later a language engineer has to create water tailored to each individual island. Such an approach is fragile, because water can change with any change of a grammar. It is time-consuming, because water is defined manually by an engineer and not automatically. Finally, an island surrounded by water cannot be reused because water has to be defined for every grammar individually. In this paper we propose a new technique of island parsing: bounded seas. Bounded seas are composable, robust, reusable and easy to use because island-specific water is created automatically. Our work focuses on applications of island parsing to data extraction from source code. We have integrated bounded seas into a parser combinator framework as a demonstration of their composability and reusability.
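
The distinction between islands and water can be made concrete with a minimal parser-combinator sketch. It is a simplified illustration under stated assumptions, not the framework or the algorithm from the paper: the names `token`, `sea`, and `find_all` are hypothetical, and the `boundary` argument only approximates the idea that water should stop at whatever is allowed to follow the island.

```python
import re
from typing import Callable, List, Optional, Tuple

# A parser takes (text, position) and returns (value, new_position) or None.
Parser = Callable[[str, int], Optional[Tuple[str, int]]]

def token(pattern: str) -> Parser:
    """Island parser: match a regular expression exactly at the given position."""
    regex = re.compile(pattern)
    def parse(text: str, pos: int) -> Optional[Tuple[str, int]]:
        m = regex.match(text, pos)
        return (m.group(0), m.end()) if m else None
    return parse

def sea(island: Parser, boundary: Optional[Parser] = None) -> Parser:
    """Skip water until the island matches; give up if the boundary comes first."""
    def parse(text: str, pos: int) -> Optional[Tuple[str, int]]:
        for start in range(pos, len(text) + 1):
            result = island(text, start)
            if result is not None:
                return result
            if boundary is not None and boundary(text, start) is not None:
                return None  # reached the boundary without finding the island
        return None
    return parse

def find_all(parser: Parser, text: str) -> List[str]:
    """Apply a parser repeatedly and collect every island it extracts."""
    found, pos = [], 0
    while (result := parser(text, pos)) is not None:
        value, pos = result
        found.append(value)
    return found

source = "int x = 1; class Foo { junk } float y; class Bar { }"
class_header = token(r"class\s+\w+")

print(find_all(sea(class_header), source))

# With a boundary, the search for "foo(" stops at the closing brace.
bounded = sea(token(r"foo\("), boundary=token(r"\}"))
print(bounded("{ a; b; } foo(", 0))
```

The water is never written by hand here: `sea` derives it from the island and the optional boundary, which is what lets the same island parser be reused in different contexts.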

Proceedings - University of Groningen

Crossref

University of Groningen

ARTS repository - University of Groningen

Bern Open Repository and Information System (BORIS)

Dissertations of the University of Groningen

Formal foundations for semi-parsing

Author: Zaytsev V.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014

Crossref

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Contributions to the Construction of Extensible Semantic Editors

Author: Söderberg Emma
Publication date: 01/01/2012

This dissertation addresses the need for easier construction and extension of language tools. Specifically, the construction and extension of so-called semantic editors is considered, that is, editors providing semantic services for code comprehension and manipulation. Editors like these are typically found in state-of-the-art development environments, where they have been developed by hand. The list of programming languages available today is extensive and, with the lively creation of new programming languages and the evolution of old languages, it keeps growing. Many of these languages would benefit from proper tool support. Unfortunately, the development of a semantic editor can be a time-consuming and error-prone endeavor, and too large an effort for most language communities. Given the complex nature of programming, and the huge benefits of good tool support, this lack of tools is problematic. In this dissertation, an attempt is made at narrowing the gap between generative solutions and how state-of-the-art editors are constructed today. A generative alternative for construction of textual semantic editors is explored with focus on how to specify extensible semantic editor services. Specifically, this dissertation shows how semantic services can be specified using a semantic formalism called reference attribute grammars (RAGs), and how these services can be made responsive enough for editing, and be provided also when the text in an editor is erroneous. Results presented in this dissertation have been found useful, both in industry and in academia, suggesting that the explored approach may help to reduce the effort of editor construction.

CiteSeerX

Lund University Publications

Providing rapid feedback in generated modular language environments

Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2009

Crossref

Recovering grammar relationships for the Java language specification

Author: Lämmel R. (Ralf)
Zaytsev V. (Vadim)
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/03/2011

Grammar convergence is a method that helps in discovering relationships between different grammars of the same language or different language versions. The key element of the method is the operational, transformation-based representation of those relationships. Given input grammars for convergence, they are transformed until they are structurally equal. The transformations are composed from primitive operators; properties of these operators and the composed chains provide quantitative and qualitative insight into the relationships between the grammars at hand. We describe a refined method for grammar convergence, and we use it in a major study, where we recover the relationships between all the grammars that occur in the different versions of the Java Language Specification (JLS). The relationships are represented as grammar transformation chains that capture all accidental or intended differences between the JLS grammars. This method is mechanized and driven by nominal and structural differences between pairs of grammars that are subject to asymmetric, binary convergence steps. We present the underlying operator suite for grammar transformation in detail, and we illustrate the suite with many examples of transformations on the JLS grammars. We also describe the extraction effort, which was needed to make the JLS grammars amenable to automated processing. We include substantial metadata about the convergence process for the JLS so that the effort becomes reproducible and transparent.

CWI's Institutional Repository