56 research outputs found
Extending the BiYacc framework with ambiguous grammars
Master's dissertation in Computer Science
Unlike most conventional programming languages, where certain symbols are used
to keep the grammar unambiguous, most recent programming languages allow ambiguity.
This creates the need for a generic parser that can deal with this ambiguity without
loss of performance.
Currently, there is a GLR parser generator written in Haskell, integrated in the BiYacc
system, developed by Departamento de Informática (DI), Universidade do Minho (UM), Portugal
in collaboration with the National Institute of Informatics, Japan. In this thesis, this necessity
for a generic parser is attacked by developing disambiguation filters for this system which
improve its performance, as well as by implementing various known optimizations to this
parser generator. Finally, performance tests are used to measure the results of the developed
work.
Unlike more conventional programming languages, in which certain symbols
were used to keep grammars unambiguous, more recent languages
allow ambiguity, which in turn creates the need for a generic parser that
can deal with this ambiguity without major losses of performance.
Currently, there is a GLR parser generator in Haskell integrated in the BiYacc system,
developed by DI, UM, Portugal, in collaboration with the National Institute of Informatics,
Japan. In this thesis, disambiguation filters are developed for this system that
improve its performance, optimizations are made at several levels, and a
parser generator using a GLL algorithm is implemented, which may bring several
performance advantages over the GLR algorithm currently
implemented. Finally, performance tests are carried out to evaluate the results of the
developed work.
InDubio: a combinator library to disambiguate ambiguous grammars
First Online: 29 September 2020
Inferring an abstract model from source code is one of the main tasks of most software quality analysis methods. Such an abstract model is called an Abstract Syntax Tree, and the inference task is called parsing. A parser is usually generated from a grammar specification of a (programming) language, and it converts source code of that language into this abstract tree representation. Several techniques then traverse this tree to assess the quality of the code (for example, by computing source code metrics) or to build new data structures (e.g., flow graphs) for further analysis (such as detecting code clones or dead code). Parsing is a well-established technique. In recent years, however, modern languages have become inherently ambiguous, and they can only be fully handled by ambiguous grammars. In this setting, disambiguation rules, which are usually included as part of the grammar specification of the ambiguous language, need to be defined. This approach has a severe limitation: disambiguation rules are not first-class citizens. Parser generators offer a small set of rules that cannot be extended or changed. Thus, grammar writers can neither manipulate the existing rules nor define a new rule that the language under consideration requires. In this paper we present a tool, named InDubio, that consists of an extensible combinator library of disambiguation filters together with a generalized parser generator for ambiguous grammars. InDubio defines a set of basic disambiguation rules as abstract syntax tree filters that can be combined into more powerful rules. Moreover, the filters are independent of the parser generator and parsing technology and, consequently, can be easily extended and manipulated. This paper presents InDubio in detail, along with our first experimental results.
Expressing disambiguation filters as combinators
Unlike most conventional programming languages, where certain symbols are used to keep the grammar unambiguous, most recent programming languages allow ambiguity. These ambiguities are resolved using disambiguation rules, which dictate how the software that parses these languages should behave when faced with ambiguities. Such rules are highly efficient but come with some limitations: they cannot be further modified, their behaviour is hidden, and changing them implies rebuilding the parser. We propose a different approach to disambiguation. A set of disambiguation filters (expressed as combinators) is provided, and disambiguation is achieved by composing combinators. New combinators can be created and, because the disambiguation step is separated from the parsing step, disambiguation rules can be changed without modifying the parser.
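The idea of composing filters after parsing can be sketched in a few lines. The following is a minimal illustration in Python, not the API of the library described above: the tuple encoding of trees, the `PRIORITY` table, and the names `reject`, `compose`, `priority_violation`, and `right_nested` are all assumptions made for this example.

```python
from typing import Callable, List, Tuple, Union

# A parse tree is either a token (str) or a node: (operator, left, right).
Tree = Union[str, Tuple[str, "Tree", "Tree"]]
# A filter takes the candidate trees of a forest and returns the ones it keeps.
Filter = Callable[[List[Tree]], List[Tree]]

PRIORITY = {"+": 1, "*": 2}  # assumed precedence table: higher binds tighter


def subtrees(t: Tree):
    """Yield t and every node below it."""
    yield t
    if isinstance(t, tuple):
        yield from subtrees(t[1])
        yield from subtrees(t[2])


def reject(bad: Callable[[Tree], bool]) -> Filter:
    """Build a filter that drops any tree containing a node matching `bad`."""
    return lambda forest: [t for t in forest
                           if not any(bad(s) for s in subtrees(t))]


def compose(*filters: Filter) -> Filter:
    """Sequential composition: each filter sees the survivors of the previous one."""
    def run(forest: List[Tree]) -> List[Tree]:
        for f in filters:
            forest = f(forest)
        return forest
    return run


def priority_violation(t: Tree) -> bool:
    """A node with a direct child whose operator has lower priority."""
    if not isinstance(t, tuple):
        return False
    return any(isinstance(c, tuple) and PRIORITY[c[0]] < PRIORITY[t[0]]
               for c in t[1:])


def right_nested(t: Tree) -> bool:
    """A node whose right child uses the same operator (not left-associative)."""
    return isinstance(t, tuple) and isinstance(t[2], tuple) and t[2][0] == t[0]


# The rule set is an ordinary value, built without touching any parser.
disambiguate = compose(reject(priority_violation), reject(right_nested))

# The two readings of "a + b * c" and of "a + b + c".
forest1 = [("+", "a", ("*", "b", "c")), ("*", ("+", "a", "b"), "c")]
forest2 = [("+", ("+", "a", "b"), "c"), ("+", "a", ("+", "b", "c"))]
```

Because `disambiguate` is built from plain functions, swapping `right_nested` for a right-associative variant changes the rule set while the code that produced `forest1` and `forest2` stays untouched.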
Towards Zero-Overhead Disambiguation of Deep Priority Conflicts
**Context** Context-free grammars are widely used for language prototyping
and implementation. They allow formalizing the syntax of domain-specific or
general-purpose programming languages concisely and declaratively. However, the
natural and concise way of writing a context-free grammar is often ambiguous.
Therefore, grammar formalisms support extensions in the form of *declarative
disambiguation rules* to specify operator precedence and associativity, solving
ambiguities that are caused by the subset of the grammar that corresponds to
expressions.
**Inquiry** Implementing support for declarative disambiguation within a
parser typically comes with one or more of the following limitations in
practice: a lack of parsing performance, or a lack of modularity (i.e.,
disallowing the composition of grammar fragments of potentially different
languages). The latter subject is generally addressed by scannerless
generalized parsers. We aim to equip scannerless generalized parsers with novel
disambiguation methods that are inherently performant, without compromising the
concerns of modularity and language composition.
**Approach** In this paper, we present a novel low-overhead implementation
technique for disambiguating deep associativity and priority conflicts in
scannerless generalized parsers with lightweight data-dependency.
**Knowledge** Ambiguities with respect to operator precedence and
associativity arise from combining the various operators of a language. While
*shallow conflicts* can be resolved efficiently by one-level tree patterns,
*deep conflicts* require more elaborate techniques, because they can occur
arbitrarily nested in a tree. Current state-of-the-art approaches to solving
deep priority conflicts come with a severe performance overhead.
**Grounding** We evaluated our new approach against state-of-the-art
declarative disambiguation mechanisms. By parsing a corpus of popular
open-source repositories written in Java and OCaml, we found that our approach
yields speedups of up to 1.73x over a grammar rewriting technique when parsing
programs with deep priority conflicts--with a modest overhead of 1-2 % when
parsing programs without deep conflicts.
**Importance** A recent empirical study shows that deep priority conflicts
are indeed widespread in real-world programs. The study shows that in a corpus
of popular OCaml projects on GitHub, up to 17 % of the source files contain
deep priority conflicts. However, there is no solution in the literature that
addresses efficient disambiguation of deep priority conflicts, with support for
modular and composable syntax definitions.
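The difference between shallow and deep conflicts can be made concrete with a small sketch. It is an illustration only and not the technique of the paper: the tree encoding, the low-priority prefix operator `lam`, and all function names are assumptions. The sentence considered is `a + lam b + c`, where `lam` should extend as far to the right as possible.

```python
# Trees: a token (str), ("+", left, right), or ("lam", body) for an assumed
# prefix operator that binds weaker than "+".

def shallow_conflict(t):
    """One-level pattern: a '+' node with a 'lam' as its direct left child."""
    return (isinstance(t, tuple) and t[0] == "+"
            and isinstance(t[1], tuple) and t[1][0] == "lam")


def rightmost_is_lam(t):
    """Walk down the rightmost spine of t, looking for a 'lam' at any depth."""
    while isinstance(t, tuple):
        if t[0] == "lam":
            return True
        t = t[-1]
    return False


def deep_conflict(t):
    """A '+' node whose left operand ends in a 'lam', however deeply nested."""
    return isinstance(t, tuple) and t[0] == "+" and rightmost_is_lam(t[1])


def any_node(pred, t):
    """True if pred holds for t or for any node below it."""
    return pred(t) or (isinstance(t, tuple)
                       and any(any_node(pred, c) for c in t[1:]))


# Two readings of "a + lam b + c".
good = ("+", "a", ("lam", ("+", "b", "c")))   # lam extends to the right
bad = ("+", ("+", "a", ("lam", "b")), "c")    # lam cut short by the outer +

# The one-level pattern accepts both trees; only the spine walk rejects `bad`.
shallow_hits = [any_node(shallow_conflict, t) for t in (good, bad)]
deep_hits = [any_node(deep_conflict, t) for t in (good, bad)]
```

In `bad`, the `lam` node is a grandchild of the outer `+`, so no fixed-depth pattern on the outer node can see it. This is why deep conflicts need a check that follows the spine to arbitrary depth.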
Parse Forest Diagnostics with Dr. Ambiguity
In this paper we propose and evaluate a method for locating causes of ambiguity in context-free grammars by automatic analysis of parse forests. A parse forest is the set of parse trees of an ambiguous sentence.
Deducing causes of ambiguity from observing parse forests is hard for grammar engineers because of (a) the size of the parse forests, (b) the complex shape of parse forests, and (c) the diversity of causes of ambiguity.
We first analyze the diversity of ambiguities in grammars for programming languages and the diversity of solutions to these ambiguities. Then we introduce Dr. Ambiguity: a parse forest diagnostics tool that explains the causes of ambiguity by analyzing differences between parse trees and proposes solutions. We demonstrate its effectiveness using a small experiment with a grammar for Java 5.
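To give a feel for what analyzing differences between parse trees can involve, here is a minimal sketch. It is not the algorithm used by the tool described above: the tuple encoding of trees, the function names, and the two-way classification are assumptions chosen for illustration. It narrows two alternative trees down to the smallest subtrees where they diverge, then guesses a cause from the operators at that point.

```python
def minimal_difference(t1, t2):
    """Return the smallest pair of subtrees where t1 and t2 diverge, or None."""
    if t1 == t2:
        return None
    both_nodes = isinstance(t1, tuple) and isinstance(t2, tuple)
    if both_nodes and t1[0] == t2[0] and len(t1) == len(t2):
        diffs = [(a, b) for a, b in zip(t1[1:], t2[1:]) if a != b]
        if len(diffs) == 1:
            # Exactly one child differs: the cause lies further down.
            return minimal_difference(*diffs[0])
    return (t1, t2)


def diagnose(t1, t2):
    """Rough guess at the cause, based on the operators at the divergence."""
    diff = minimal_difference(t1, t2)
    if diff is None:
        return "no ambiguity"
    a, b = diff
    if isinstance(a, tuple) and isinstance(b, tuple):
        return "associativity" if a[0] == b[0] else "priority"
    return "other"


# Two readings of "x = a + b * c" and two readings of "a + b + c".
p1 = ("assign", "x", ("+", "a", ("*", "b", "c")))
p2 = ("assign", "x", ("*", ("+", "a", "b"), "c"))
q1 = ("+", ("+", "a", "b"), "c")
q2 = ("+", "a", ("+", "b", "c"))
```

Narrowing to the minimal difference addresses the size problem named in the abstract: the shared `assign` context is skipped, and only the two small expression trees are reported.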
Ambiguity Detection: Scaling to Scannerless
Static ambiguity detection would be an important aspect of language
workbenches for textual software languages. However, the challenge is
that automatic ambiguity detection in context-free grammars is undecidable
in general. Sophisticated approximations and optimizations do exist,
but these do not scale to grammars for so-called "scannerless parsers", as of yet.
We extend previous work on ambiguity detection for context-free grammars to
cover disambiguation techniques that are typical for scannerless parsing,
such as longest match and reserved keywords.
This paper contributes a new algorithm for ambiguity detection in
character-level grammars, a prototype implementation of this algorithm and
validation on several real grammars. The total run-time of ambiguity
detection for character-level grammars for languages such as C and Java is
significantly reduced, without loss of precision.
The result is that efficient ambiguity detection in realistic grammars is
possible and may therefore become a tool in language workbenches.
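The two disambiguation techniques named above can be illustrated on a toy character-level problem. This is a simplified sketch under stated assumptions, not the detection algorithm of the paper: the input is a single run of letters, `KEYWORDS` is an assumed set of reserved words, and longest match is modelled as a follow restriction that forbids a token from being directly followed by another token.

```python
KEYWORDS = {"if"}  # assumed set of reserved words


def readings(text):
    """All ways to read a run of letters as identifier/keyword tokens."""
    if not text:
        return [[]]
    out = []
    for i in range(1, len(text) + 1):
        word = text[:i]
        kinds = ["id"] + (["kw"] if word in KEYWORDS else [])
        for kind in kinds:
            out += [[(kind, word)] + rest for rest in readings(text[i:])]
    return out


def longest_match(reading):
    """With no separator in the input, a token may not be followed by another."""
    return len(reading) <= 1


def reserved_keywords(reading):
    """A reserved word may not be read as an identifier."""
    return all(not (kind == "id" and word in KEYWORDS)
               for kind, word in reading)


def disambiguate(text):
    """Keep only the readings that pass both restrictions."""
    return [r for r in readings(text)
            if longest_match(r) and reserved_keywords(r)]
```

Without the two restrictions, `readings("if")` yields three alternatives (`i` then `f`, identifier `if`, keyword `if`). With them, `if` is read only as a keyword and `iffy` only as an identifier.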
- …