
74 research outputs found

A Three-Phase Approach to Efficiently Transform C# into KDM

Author: Frey Sören
Hasselbring Wilhelm
Wulf Christian
Publication date: 01/01/2012

The Knowledge Discovery Metamodel (KDM) of the Object Management Group (OMG) is used in diverse research areas for describing software artifacts. It was recently adopted as standard ISO/IEC 19506, and its source, code, and action packages are highly suited for enabling language-independent source code analysis. However, a program needs to be transformed to KDM before the corresponding source-level metrics can be computed. To be of practical use, such a transformation (1) has to be resource-efficient and (2) should ideally be constructible on the basis of existing grammars to reduce the construction effort for a specific programming language. In this paper, we present such an efficient transformation for C# that is structured along three fundamental phases covering distinct sub-transformations for the types, the members and methods, and the statements. As our approach systematically analyzes and re-engineers existing grammars and integrates appropriate decompilers, it provides insights for building such program transformations in general. Our quantitative evaluation uses three C# open source systems and an industrial software system from the financial sector. It shows that our approach can be successfully applied to these systems and that the transformation converts the programs to KDM efficiently while keeping resource demand low.

MACAU: Open Access Repository of Kiel University

Combining Monitoring with Run-Time Assertion Checking

Author: Boer F.S. (Frank) de
Gouw C.P.T. (Stijn) de
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/06/2014

According to a study in 2002 commissioned by a US Department, software bugs annually cost the US economy an estimated $59 billion. A more recent study in 2013 by Cambridge University estimated that the global cost has risen to $312 billion. There exist various ways to prevent, isolate and fix software bugs, ranging from lightweight methods that are (semi-)automatic to heavyweight methods that require significant user interaction. Our own method, described in this tutorial, is based on automated run-time checking of a combination of protocol- and data-oriented properties of object-oriented programs.
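As a rough illustration of what checking protocol- and data-oriented properties at run time can look like, the following Python sketch monitors a small stack-like class. It is not the authors' tool or notation: the class `MonitoredStack`, the exception `ProtocolViolation`, and the chosen properties are all hypothetical and serve only to show the idea.

```python
class ProtocolViolation(AssertionError):
    """Raised when a method is called in a state where the protocol forbids it."""


class MonitoredStack:
    """Hypothetical example class, not taken from the tutorial.

    Protocol property: push() and pop() are only allowed between open() and close().
    Data property: pop() returns the most recently pushed value not yet popped.
    """

    # Allowed protocol transitions: (current state, method) -> next state.
    _TRANSITIONS = {
        ("closed", "open"): "opened",
        ("opened", "push"): "opened",
        ("opened", "pop"): "opened",
        ("opened", "close"): "closed",
    }

    def __init__(self):
        self._state = "closed"
        self._items = []
        self._shadow = []  # the monitor's own model of the expected contents

    def _step(self, method):
        # Protocol check: is this call allowed in the current state?
        key = (self._state, method)
        if key not in self._TRANSITIONS:
            raise ProtocolViolation(
                f"{method}() not allowed in state {self._state!r}"
            )
        self._state = self._TRANSITIONS[key]

    def open(self):
        self._step("open")

    def close(self):
        self._step("close")

    def push(self, value):
        self._step("push")
        self._items.append(value)
        self._shadow.append(value)

    def pop(self):
        self._step("pop")
        result = self._items.pop()
        expected = self._shadow.pop()
        # Data check: the returned value must match the monitor's model.
        assert result == expected, "data property violated"
        return result
```

In this sketch the protocol is a small finite-state machine and the data property is checked against a shadow copy kept by the monitor. Calling `push()` after `close()` raises `ProtocolViolation`.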

CWI's Institutional Repository

Acta Cybernetica : Volume 13. Number 3.

Publication date: 01/01/1998

University of Szeged

Proceedings of the Third Symposium on Programming Languages and Software Tools : Kääriku, Estonia, August 23-24 1993

Publication venue: Tartu Ülikool
Publication date: 01/01/1993

http://www.ester.ee/record=b1064507*es

DSpace at Tartu University Library

Run-Time Assertion Checking of Data- and Protocol-Oriented Properties of Java Programs: An Industrial Case Study

Author: Boer F.S. (Frank) de
Gouw C.P.T. (Stijn) de
Johnsen E.B. (Einar Broch)
Kohn A.
Wong P.Y.H.
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/01/2014

Run-time assertion checking is a useful technique for detecting faults and can be applied in any program execution context, including debugging, testing, and production. In general, however, it is limited to checking state-based properties. We introduce SAGA, a general framework that provides a smooth integration of the specification and the run-time checking of both data- and protocol-oriented properties of Java classes and interfaces. We evaluate SAGA, which combines several state-of-the-art tools, by conducting an industrial case study at the eCommerce software company Fredhopper.

CWI's Institutional Repository

Unifying parsing and reflective printing for fully disambiguated grammars

Author: Hu Zhenjiang
Ko Hsiang-Shang
Martins Pedro
Saraiva João
Zhang Yongzhe
Zhu Zirun
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2020

Language designers usually need to implement parsers and printers. Despite being two closely related programs, in practice they are often designed separately, and then need to be revised and kept consistent as the language evolves. It would be more convenient if the parser and printer could be unified and developed in a single program, with their consistency guaranteed automatically. Furthermore, in certain scenarios (like showing compiler optimisation results to the programmer), it is desirable to have a more powerful reflective printer that, when an abstract syntax tree corresponding to a piece of program text is modified, can propagate the modification to the program text while preserving layouts, comments, and syntactic sugar. To address these needs, we propose a domain-specific language BiYacc, whose programs denote both a parser and a reflective printer for a fully disambiguated context-free grammar. BiYacc is based on the theory of bidirectional transformations, which helps to guarantee by construction that the generated pairs of parsers and reflective printers are consistent. Handling grammatical ambiguity is particularly challenging: we propose an approach based on generalised parsing and disambiguation filters, which produce all the parse results and (try to) select the only correct one in the parsing direction; the filters are carefully bidirectionalised so that they also work in the printing direction and do not break the consistency between the parsers and reflective printers. We show that BiYacc is capable of facilitating many tasks such as Pombrio and Krishnamurthi's 'resugaring', simple refactoring, and language evolution.

We thank the reviewers and the editor for their selflessness and the effort spent on reviewing our paper, a quite long one. With their help, the readability of the paper is much improved, especially regarding how several case studies are structured, how theorems for the basic BiYacc and theorems for the extended version handling ambiguous grammars are related, and how look-alike notions are 'disambiguated'. This work is partially supported by the Japan Society for the Promotion of Science (JSPS) Grant-in-Aid for Scientific Research (S) No. 17H06099; in particular, most of the second author's contributions were made when he worked at the National Institute of Informatics and was funded by the Grant.

Universidade do Minho: RepositoriUM

Hybrid grammars for parsing of discontinuous phrase structures and non-projective dependency structures

Author: Gebhardt Kilian
Nederhof Mark Jan
Vogler Heiko
Publication venue: 'MIT Press - Journals'
Publication date: 01/06/2017

We explore the concept of hybrid grammars, which formalize and generalize a range of existing frameworks for dealing with discontinuous syntactic structures. Covered are both discontinuous phrase structures and non-projective dependency structures. Technically, hybrid grammars are related to synchronous grammars, where one grammar component generates linear structures and another generates hierarchical structures. By coupling lexical elements of both components together, discontinuous structures result. Several types of hybrid grammars are characterized. We also discuss grammar induction from treebanks. The main advantage over existing frameworks is the ability of hybrid grammars to separate discontinuity of the desired structures from time complexity of parsing. This permits exploration of a large variety of parsing algorithms for discontinuous structures, with different properties. This is confirmed by the reported experimental results, which show a wide variety of running time, accuracy and frequency of parse failures.

Directory of Open Access Journals

University of St. Andrews - Pure

St Andrews Research Repository

Algebraic Methods in Language Processing:Proceedings of the twenty-first Twente workshop on language technology

Publication venue: 'University Library/University of Twente'
Publication date: 15/08/2003

University of Twente Research Information

An illumination of the template enigma : software code generation with templates

Author: Arnoldus B.J.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2011

Creating software is a process of refining a concept to an implementation. This process consists of several stages represented by documents, models and plans at several levels of abstraction. Mostly, the refinement process requires creativity of the programmers, but sometimes the task is boring and repetitive. This repetitive work is an indication that the program is not written at the most suitable level of abstraction. The level of abstraction offered by the used programming language might be too low to remove the recurring code. Code generators can be used to raise the level of abstraction of program specifications and to automate the repetitive work. This thesis focuses on code generators based on templates. Templates are one of the techniques to implement a code generator. Templates allow extension of the syntax of a programming language, enabling generative programming without modifying the underlying compiler. Four artifacts are involved in a template based generator: templates, input data, a template evaluator and output code. The templates we consider are a concrete (incomplete) representation of the output document, i.e. object code, that contains holes, i.e. the meta code. These holes are filled by the template evaluator using information from the input data to obtain the output code. Templates are widely used to generate HTML code in web applications. They can be used for generating all kinds of text, like e-mails or (source) code. In this thesis we limit the scope to the generation of source code. The central research question is how the quality of template based code generators can be improved. Quality, in general, is a broad notion and our scope is limited to the technical quality of templates and generated code. We focused on improving the maintainability of template based code generators and the correctness of the generated code. This is facilitated by the three main contributions provided by this thesis. 
First, the maintainability of template based code generators is increased by specifying the following requirement for our metalanguage: it should not be rich enough to allow programming in templates, yet it must not be so restrictive that some code generators cannot be expressed. We used the theory of formal languages to specify our metalanguage. Second, we ensure correctness of the templates and generated code. Third, the presented theory and techniques are validated by case studies. These case studies show application of templates in real world applications, increased maintainability and syntactical correctness of generated code. The theory of formal languages is used to specify the requirements for our metalanguage. As we only consider the generation of programming languages, it is sufficient to support the generation of languages defined by context-free grammars. This assumption is used to derive a metalanguage that is rich enough to specify code generators that are able to instantiate all possible sentences of a context-free language. A specific case of a code generator, the unparser, is a program that can instantiate all sentences of a context-free language. We proved that an unparser can be implemented using a linear deterministic top-down tree-to-string transducer. We call this property unparser-completeness. Our metalanguage is based on a linear deterministic top-down tree-to-string transducer. Recall that the goal of specifying the requirements of the metalanguage is to increase the maintainability of template based code generators, without being too restrictive. To validate that our metalanguage is not too restrictive and leads to more maintainable templates, we compared it with four off-the-shelf text template systems by implementing an unparser. 
We have observed that the industrial template evaluators provide a Turing complete metalanguage, but they do not contain a block scoping mechanism for the meta-variables. This results in undesired additional boilerplate meta code in their templates. The second contribution is guaranteeing the correctness of the generated code. Correctness of the generated code can be divided into two concerns: syntactical correctness and semantical correctness. We start with syntactical correctness of the generated code. The use of text templates implies that syntax errors in the generated code can only be detected at compilation time. This means that errors detected during the compilation are reported on the level of the generated code. The developer is required to manually trace the errors back to their origin in the template or input data. We believe that programs manipulating source code should not treat the object code as text, so that errors can be detected as early as possible. We present an approach where the grammars of the object language and metalanguage can be combined in a modular way. Combining both grammars allows parsing both languages simultaneously. Syntax errors in both languages of the template will be found while parsing it. However, parsing a template alone is not sufficient to ensure that the generated code will be free of syntax errors. The template evaluator must be equipped with a mechanism to guarantee its output will be syntactically correct. We discuss our mechanism in short. A parse tree is constructed during the parsing of the template. This tree contains subtrees for the object code and subtrees for the meta code. While evaluating the template, subtrees of the meta code are substituted by object code subtrees. The template evaluator checks whether the root nonterminal of the object code subtree is equal to the root nonterminal of the meta code subtree. When both are equal, it is allowed to substitute the meta code. 
When the root nonterminals are distinct, an accurate error message is generated. The template evaluator terminates when all meta code subtrees are substituted. The result is a parse tree of the object language and thus syntactically correct. We call this process syntax safe code generation. In order to validate that the presented techniques increase maintainability and ensure syntactical correctness, we implemented our ideas in a syntax safe template evaluator called Repleo. Repleo has been applied in four case studies. The first case is a real world situation, where it is required to generate a three tier web application from a data model. This case showed that multiple layers of an application defined in different programming languages can be generated from a single model. The second and third cases are used to show that our metalanguage results in a more maintainable code generator. Our metalanguage forces the use of a two-layer code generator with separation of concerns between the two layers, where the original implementations are less modular. The last case study shows that ensuring syntactical correctness results in the prevention of cross-site scripting attacks in dynamic generation of web pages. Recall that one of our goals was ensuring the correctness of the generated code. We also showed that it is possible to check static semantic properties of templates. Static semantic checks are defined for the metalanguage, for the object language, and for the situations where the object language is dependent on the metalanguage. We implemented a prototype of a static semantic checker for PicoJava templates using attribute grammars. The use of attribute grammars leads to re-use of the original PicoJava checker. Summarizing, in this thesis we have formulated the requirements for a metalanguage and discussed how to implement a syntax safe template evaluator. This results in more maintainable template based code generators and more reliable generated code.
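The substitution check described in this abstract can be sketched in a few lines of Python. This is not Repleo's implementation: the types `Node` and `Hole`, the exception `SyntaxSafetyError`, and the functions `evaluate` and `unparse` are hypothetical names. The sketch also assumes that the template has already been parsed into a tree and that the bound subtrees contain no holes themselves.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Node:
    """Object-code parse tree node: a nonterminal with children (subtrees or tokens)."""
    nonterminal: str
    children: List[Union["Node", "Hole", str]] = field(default_factory=list)


@dataclass
class Hole:
    """Meta-code placeholder that must be filled by a subtree of the given nonterminal."""
    nonterminal: str
    name: str


class SyntaxSafetyError(Exception):
    """Raised when a substitution would produce a tree outside the object language."""


def evaluate(tree, bindings: Dict[str, Node]):
    """Replace every Hole by its bound subtree, checking root nonterminals first."""
    if isinstance(tree, str):  # a terminal token stays as it is
        return tree
    if isinstance(tree, Hole):
        if tree.name not in bindings:
            raise SyntaxSafetyError(f"no input bound to hole {tree.name!r}")
        subtree = bindings[tree.name]
        if subtree.nonterminal != tree.nonterminal:
            raise SyntaxSafetyError(
                f"hole {tree.name!r} expects {tree.nonterminal}, "
                f"got {subtree.nonterminal}"
            )
        return subtree
    return Node(tree.nonterminal, [evaluate(c, bindings) for c in tree.children])


def unparse(tree) -> str:
    """Flatten a hole-free tree back into text (tokens joined by single spaces)."""
    if isinstance(tree, str):
        return tree
    return " ".join(unparse(child) for child in tree.children)
```

For example, filling the `Expr` hole of a `return` statement template with an `Expr` subtree succeeds, while binding a `Stmt` subtree to the same hole raises `SyntaxSafetyError` instead of emitting malformed code.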

Repository TU/e

Pure OAI Repository

Efficient Semiring-Weighted Earley Parsing

Author: Cotterell Ryan
Eisner Jason
Opedal Andreas
Vieira Tim
Zmigrod Ran
Publication date: 06/07/2023

This paper provides a reference description, in the form of a deduction system, of Earley's (1970) context-free parsing algorithm with various speed-ups. Our presentation includes a known worst-case runtime improvement from Earley's $O(N^3|G||R|)$, which is unworkable for the large grammars that arise in natural language processing, to $O(N^3|G|)$, which matches the runtime of CKY on a binarized version of the grammar $G$. Here $N$ is the length of the sentence, $|R|$ is the number of productions in $G$, and $|G|$ is the total length of those productions. We also provide a version that achieves runtime of $O(N^3|M|)$ with $|M| \leq |G|$ when the grammar is represented compactly as a single finite-state automaton $M$ (this is partly novel). We carefully treat the generalization to semiring-weighted deduction, preprocessing the grammar like Stolcke (1995) to eliminate deduction cycles, and further generalize Stolcke's method to compute the weights of sentence prefixes. We also provide implementation details for efficient execution, ensuring that on a preprocessed grammar, the semiring-weighted versions of our methods have the same asymptotic runtime and space requirements as the unweighted methods, including sub-cubic runtime on some grammars.

Comment: Main conference long paper at ACL 202
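For orientation, a plain Earley recognizer can be sketched in Python as below. This is the basic textbook algorithm with its predict, scan and complete steps. It has none of the speed-ups, semiring weights or automaton-based grammar representation described in the abstract. The function name `earley_recognize`, the dictionary encoding of the grammar, and the restriction to grammars without empty right-hand sides are assumptions made for this sketch.

```python
from collections import namedtuple

# An Earley item: rule lhs -> rhs with a dot position, begun at input position `start`.
Item = namedtuple("Item", "lhs rhs dot start")


def earley_recognize(grammar, start_symbol, tokens):
    """Return True if `tokens` can be derived from `start_symbol`.

    grammar: dict mapping each nonterminal to a list of right-hand sides
    (tuples of symbols). Any symbol that is not a key of `grammar` is treated
    as a terminal. Assumes that no right-hand side is empty.
    """
    n = len(tokens)
    chart = [set() for _ in range(n + 1)]
    for rhs in grammar[start_symbol]:
        chart[0].add(Item(start_symbol, tuple(rhs), 0, 0))

    for k in range(n + 1):
        agenda = list(chart[k])
        while agenda:
            item = agenda.pop()
            if item.dot < len(item.rhs):
                symbol = item.rhs[item.dot]
                if symbol in grammar:
                    # Predict: start every rule for the expected nonterminal at k.
                    new_items = [Item(symbol, tuple(rhs), 0, k)
                                 for rhs in grammar[symbol]]
                    target = k
                elif k < n and tokens[k] == symbol:
                    # Scan: the expected terminal matches the next token.
                    new_items = [item._replace(dot=item.dot + 1)]
                    target = k + 1
                else:
                    continue
            else:
                # Complete: advance every item that was waiting for item.lhs.
                new_items = [p._replace(dot=p.dot + 1)
                             for p in chart[item.start]
                             if p.dot < len(p.rhs) and p.rhs[p.dot] == item.lhs]
                target = k
            for new in new_items:
                if new not in chart[target]:
                    chart[target].add(new)
                    if target == k:
                        agenda.append(new)

    return any(it.lhs == start_symbol and it.dot == len(it.rhs) and it.start == 0
               for it in chart[n])
```

A small usage example with a left-recursive expression grammar:

```python
grammar = {
    "S": [("S", "+", "T"), ("T",)],
    "T": [("T", "*", "F"), ("F",)],
    "F": [("(", "S", ")"), ("n",)],
}
accepted = earley_recognize(grammar, "S", "n + n * n".split())
rejected = earley_recognize(grammar, "S", "n + * n".split())
```

Here `accepted` is `True` and `rejected` is `False`. Left recursion needs no special handling because each chart position stores its items in a set, so repeated predictions are added only once.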

arXiv.org e-Print Archive