30 research outputs found

Recovering Grammar Relationships for the Java Language Specification

Author: A. Dubey
C. A. R. Hoare
D. A. Thomas
D. Barnard
E. Bouwers
H. H. Do
M. Di Penta
R. Lämmel
Ralf Lämmel
T. Dean
Vadim Zaytsev
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 24/08/2010

Grammar convergence is a method that helps in discovering relationships between different grammars of the same language or different language versions. The key element of the method is the operational, transformation-based representation of those relationships. Given input grammars for convergence, they are transformed until they are structurally equal. The transformations are composed from primitive operators; properties of these operators and the composed chains provide quantitative and qualitative insight into the relationships between the grammars at hand. We describe a refined method for grammar convergence, and we use it in a major study, where we recover the relationships between all the grammars that occur in the different versions of the Java Language Specification (JLS). The relationships are represented as grammar transformation chains that capture all accidental or intended differences between the JLS grammars. This method is mechanized and driven by nominal and structural differences between pairs of grammars that are subject to asymmetric, binary convergence steps. We present the underlying operator suite for grammar transformation in detail, and we illustrate the suite with many examples of transformations on the JLS grammars. We also describe the extraction effort, which was needed to make the JLS grammars amenable to automated processing. We include substantial metadata about the convergence process for the JLS so that the effort becomes reproducible and transparent.
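
The idea of transforming one grammar with a chain of primitive operators until it is structurally equal to another can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the grammar encoding, the operator names `rename` and `add_alternative`, and the toy grammars are hypothetical and are not taken from the paper's operator suite or from the JLS.

```python
from typing import Dict, FrozenSet, Tuple

# Assumed encoding: a grammar maps each nonterminal to a set of
# alternatives, and each alternative is a tuple of symbols.
Grammar = Dict[str, FrozenSet[Tuple[str, ...]]]


def rename(grammar: Grammar, old: str, new: str) -> Grammar:
    """Hypothetical primitive operator: rename a nonterminal everywhere."""
    if new in grammar:
        raise ValueError(f"nonterminal {new!r} already exists")

    def r(symbol: str) -> str:
        return new if symbol == old else symbol

    return {
        r(nt): frozenset(tuple(r(s) for s in alt) for alt in alts)
        for nt, alts in grammar.items()
    }


def add_alternative(grammar: Grammar, nt: str, alt: Tuple[str, ...]) -> Grammar:
    """Hypothetical primitive operator: add one alternative to a nonterminal."""
    result = dict(grammar)
    result[nt] = grammar.get(nt, frozenset()) | {alt}
    return result


def apply_chain(grammar: Grammar, chain) -> Grammar:
    """Apply a transformation chain: a list of (operator, *arguments) steps."""
    for op, *args in chain:
        grammar = op(grammar, *args)
    return grammar


def nominal_diff(a: Grammar, b: Grammar) -> set:
    """Nonterminals defined in exactly one of the two grammars."""
    return set(a) ^ set(b)


# Two toy grammars standing in for two versions of a language.
g_old: Grammar = {
    "Statement": frozenset({("if", "Expression", "Statement")}),
    "Expression": frozenset({("ID",)}),
}
g_new: Grammar = {
    "Stmt": frozenset({("if", "Expression", "Stmt"),
                       ("while", "Expression", "Stmt")}),
    "Expression": frozenset({("ID",)}),
}

# The chain records how the two grammars relate to each other.
chain = [
    (rename, "Statement", "Stmt"),
    (add_alternative, "Stmt", ("while", "Expression", "Stmt")),
]
converged = apply_chain(g_old, chain)
```

In this sketch the nominal difference (`Statement` versus `Stmt`) suggests the renaming step, and the remaining structural difference suggests the added alternative; after both steps `converged` equals `g_new`, and the chain itself documents the relationship between the two grammars.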

arXiv.org e-Print Archive

CiteSeerX

Crossref

VU Research Portal

CWI's Institutional Repository

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Concrete Syntax with Black Box Parsers

Author: Aarssen Rodin
van der Storm Tijs
Vinju Jurgen
Publication venue: 'Aspect-Oriented Software Association (AOSA)'
Publication date: 01/02/2019

Context: Meta programming consists for a large part of matching, analyzing, and transforming syntax trees. Many meta programming systems process abstract syntax trees, but this requires intimate knowledge of the structure of the data type describing the abstract syntax. As a result, meta programming is error-prone, and meta programs are not resilient to evolution of the structure of such ASTs, requiring invasive, fault-prone change to these programs. Inquiry: Concrete syntax patterns alleviate this problem by allowing the meta programmer to match and create syntax trees using the actual syntax of the object language. Systems supporting concrete syntax patterns, however, require a concrete grammar of the object language in their own formalism. Creating such grammars is a costly and error-prone process, especially for realistic languages such as Java and C++. Approach: In this paper we present Concretely, a technique to extend meta programming systems with pluggable concrete syntax patterns, based on external, black box parsers. We illustrate Concretely in the context of Rascal, an open-source meta programming system and language workbench, and show how to reuse existing parsers for Java, JavaScript, and C++. Furthermore, we propose Tympanic, a DSL to declaratively map external AST structures to Rascal's internal data structures. Tympanic allows implementors of Concretely to solve the impedance mismatch between object-oriented class hierarchies in Java and Rascal's algebraic data types. Both the algebraic data type and the AST marshalling code are automatically generated. Knowledge: The conceptual architecture of Concretely and Tympanic supports the reuse of pre-existing, external parsers, and their AST representation in meta programming systems that feature concrete syntax patterns for matching and constructing syntax trees. As such this opens up concrete syntax pattern matching for a host of realistic languages for which writing a grammar from scratch is time-consuming and error-prone, but for which industry-strength parsers exist in the wild. Grounding: We evaluate Concretely in terms of source lines of code (SLOC), relative to the size of the AST data type and marshalling code. We show that for real programming languages such as C++ and Java, adding support for concrete syntax patterns takes an effort only in the order of dozens of SLOC. Similarly, we evaluate Tympanic in terms of SLOC, showing an order of magnitude of reduction in SLOC compared to manual implementation of the AST data types and marshalling code. Importance: Meta programming has applications in reverse engineering, reengineering, source code analysis, static analysis, software renovation, domain-specific language engineering, and many others. Processing of syntax trees is central to all of these tasks. Concrete syntax patterns improve the practice of constructing meta programs. The combination of Concretely and Tympanic has the potential to make concrete syntax patterns available with very little effort, thereby improving and promoting the application of meta programming in the general software engineering context.

arXiv.org e-Print Archive

Proceedings - University of Groningen

University of Groningen

CWI's Institutional Repository

ARTS repository - University of Groningen

Dissertations of the University of Groningen

Automatic repair and type binding of undeclared variables using neural networks

Author: Theru Mohan Venkatesh
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2019

Over the past few years, there have been significant achievements in deploying deep learning for program analysis, owing to the effectiveness of encoding programs as vector representations. Deep learning has been used in program analysis to detect security vulnerabilities with generative adversarial networks and to predict hidden software defects using software defect datasets. It has also been used to detect and fix syntax errors made by novice programmers: a neural machine translation model trained on bug-free source code suggests possible fixes by locating the tokens that are out of place. However, all these approaches require either defect datasets or executable bug-free code samples to train the deep learning model. Our neural network model is trained with neither defect datasets nor bug-free code samples; instead, it is trained on structural semantic details of ASTs, where each node represents a construct appearing in the source code. The model is implemented to fix one of the most common syntax errors, undeclared variable errors, and to infer the type information of the affected variables before program compilation. With this approach, the model correctly located and identified the errors in 81% of the programs in the prutor dataset of 1059 programs with undeclared variable errors, and inferred the data types correctly in 80% of the programs.

Digital Repository @ Iowa State University (ISU)

Recovering grammar relationships for the Java language specification

Author: Lämmel R. (Ralf)
Zaytsev V. (Vadim)
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/03/2011

Grammar convergence is a method that helps in discovering relationships between different grammars of the same language or different language versions. The key element of the method is the operational, transformation-based representation of those relationships. Given input grammars for convergence, they are transformed until they are structurally equal. The transformations are composed from primitive operators; properties of these operators and the composed chains provide quantitative and qualitative insight into the relationships between the grammars at hand. We describe a refined method for grammar convergence, and we use it in a major study, where we recover the relationships between all the grammars that occur in the different versions of the Java Language Specification (JLS). The relationships are represented as grammar transformation chains that capture all accidental or intended differences between the JLS grammars. This method is mechanized and driven by nominal and structural differences between pairs of grammars that are subject to asymmetric, binary convergence steps. We present the underlying operator suite for grammar transformation in detail, and we illustrate the suite with many examples of transformations on the JLS grammars. We also describe the extraction effort, which was needed to make the JLS grammars amenable to automated processing. We include substantial metadata about the convergence process for the JLS so that the effort becomes reproducible and transparent

CWI's Institutional Repository

Generating rewritable abstract syntax trees - A foundation for the rapid development of source code transformation tools

Author: Jeffrey L Overbey
Ralph E Johnson
Publication date: 01/01/2008

Building a production-quality refactoring engine or similar source code transformation tool traditionally requires a large amount of hand-written, language-specific support code. We describe a system which reduces this overhead by allowing both a parser and a fully rewritable AST to be generated automatically from an annotated grammar, requiring little or no additional hand-written code. The rewritable AST is ideal for implementing program transformations that preserve the formatting of the original sources, including spacing and comments, and the system can be augmented to allow transformation of C-preprocessed sources even when the target language is not C or C++. Moreover, the AST design is fully customizable, allowing it to resemble a hand-coded tree. The amount of required annotation is typically quite small, and the annotated grammar is often an order of magnitude smaller than the generated code.

CiteSeerX

RAVEN: a Node.js Static Metadata Extracting Solution for JavaScript Applications

Author: Carlos Maria Antunes Matias
Publication date: 14/07/2016

Metadata provide useful information about any type of digital resource. Examples of metadata are the author and the creation date of a file. By extracting additional metadata from source code files through static analysis, one can collect information beyond what already exists, gain a better understanding of the resources, and compare them with similar ones. Static analysis consists of examining code files without executing them. This type of analysis yields a representation of the code that can be used to obtain more metadata in the form of software metrics. Software metrics are the result of measurements made on software; lines of code and code complexity are examples. The aim of this dissertation is to develop a metadata extraction solution for JavaScript applications by leveraging the Node.js environment. This solution will statically analyze JavaScript code, for which several distinct approaches are possible. The analysis results in a set of software metrics that, in conjunction with other data such as the frameworks in use, will produce a valuable tool for the proponent of this dissertation and allow files to be compared with regard to their complexity/quality.

Repositório Aberto da Universidade do Porto