Analysing the SML97 Definition: Lexicalisation
The specification of the syntax and semantics of Standard ML has been designed to support the generation of a compiler front end, but actual implementations have required significant modification to the specification. Since the specification was written, there have been major advances in language analysis systems that can handle general syntax specifications. We are revisiting the SML specification to consider to what extent, using modern tooling, it can be implemented exactly as originally written. In this short paper we focus on the lexical specification.
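Implementing a lexical specification "as written" amounts to driving a tokeniser directly from a declarative rule table. A minimal sketch of the idea, using a hypothetical SML-like fragment (this is not the actual SML97 lexical definition, which is far richer):

```python
import re

# A fragment of an SML-like lexical specification, written declaratively.
# (Hypothetical rule set for illustration; the real SML97 Definition has
# many more token classes and reserved words.)
LEX_RULES = [
    ("WS",    r"[ \t\n]+"),                  # whitespace, discarded
    ("INT",   r"~?\d+"),                     # SML integer literals use ~ for minus
    ("ALNUM", r"[A-Za-z'][A-Za-z0-9_']*"),   # alphanumeric identifiers
    ("SYM",   r"[-!%&$#+/:<=>?@\\~`^|*]+"),  # symbolic identifiers
]

MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in LEX_RULES))

def lex(src):
    """Ordered first-match tokenisation; rule order resolves overlaps."""
    tokens, pos = [], 0
    while pos < len(src):
        m = MASTER.match(src, pos)
        if m is None:
            raise ValueError(f"lexical error at position {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

For example, `lex("val x = ~42")` classifies `~42` as a single integer literal because the `INT` rule is tried before the symbolic-identifier rule.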
Tunnel Parsing with counted repetitions
The article describes a new and efficient parsing algorithm, called Tunnel Parsing, that parses from left to right on the basis of a context-free grammar without left recursion and without rules that recognize the empty word. The algorithm is applicable mostly to domain-specific languages. Particular attention is paid to the parsing of grammar element repetitions. As a result of the parsing, a statically typed concrete syntax tree that accurately reflects the grammar is built from top to bottom. The parsing is done through iteration rather than recursion. The Tunnel Parsing algorithm uses grammars directly, without prior refactoring, and has linear time complexity for deterministic context-free grammars.
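The repetition handling described above can be pictured with a toy example: a repetition in the grammar is consumed by a loop rather than by recursive calls, and the loop directly populates a node that mirrors the grammar shape. This sketch only illustrates iteration-based repetition parsing in general, not the Tunnel Parsing algorithm itself:

```python
# Parse:  list ::= ITEM (',' ITEM)*
# The repetition (',' ITEM)* is handled by a while-loop, not recursion,
# and the result is a concrete-syntax node mirroring the rule's shape.
def parse_list(tokens):
    pos = 0
    if pos >= len(tokens) or tokens[pos][0] != "ITEM":
        raise SyntaxError("expected ITEM")
    items = [tokens[pos]]
    pos += 1
    while pos < len(tokens) and tokens[pos][0] == "COMMA":
        pos += 1
        if pos >= len(tokens) or tokens[pos][0] != "ITEM":
            raise SyntaxError("expected ITEM after ','")
        items.append(tokens[pos])
        pos += 1
    return {"node": "list", "items": items}, pos
```

Because each loop iteration consumes at least one token, the repetition contributes only linear work, in line with the complexity claim above.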
InDubio: a combinator library to disambiguate ambiguous grammars
First Online: 29 September 2020

Inferring an abstract model from source code is one of the main tasks of most software quality analysis methods. Such an abstract model is called an Abstract Syntax Tree, and the inference task is called parsing. A parser is usually generated from a grammar specification of a (programming) language, and it converts source code of that language into the abstract tree representation. Several techniques then traverse this tree to assess the quality of the code (for example, by computing source code metrics) or build new data structures (e.g., flow graphs) to perform further analysis (such as detecting code clones or dead code). Parsing is a well-established technique. Modern languages, however, are often inherently ambiguous, which can only be fully handled by ambiguous grammars. In this setting, disambiguation rules, which are usually included as part of the grammar specification of the ambiguous language, need to be defined. This approach has a severe limitation: disambiguation rules are not first-class citizens. Parser generators offer a small set of rules that cannot be extended or changed, so grammar writers are not able to manipulate them or define a new rule that the language under consideration requires. In this paper we present a tool, named InDubio, that consists of an extensible combinator library of disambiguation filters together with a generalized parser generator for ambiguous grammars. InDubio defines a set of basic disambiguation rules as abstract syntax tree filters that can be combined into more powerful rules. Moreover, the filters are independent of the parser generator and parsing technology, and consequently they can be easily extended and manipulated. This paper presents InDubio in detail and reports our first experimental results.
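A generalized parser accepts an ambiguous grammar as-is and returns every derivation of a sentence; this set of trees is what a disambiguation-filter library then prunes. A naive sketch that enumerates all parse trees of the ambiguous grammar `E ::= E '+' E | NUM` (invented for illustration; a generalized parser generator like InDubio's produces a shared forest far more efficiently):

```python
from functools import lru_cache

# Enumerate all parse trees of the ambiguous grammar  E ::= E '+' E | NUM
# over a token list. Trees are encoded as (left, "+", right) tuples, with
# number tokens as leaves.
def all_parses(tokens):
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def go(i, j):
        trees = []
        if j - i == 1 and toks[i] != "+":   # single NUM token
            trees.append(toks[i])
        for k in range(i + 1, j - 1):       # try every top-level '+'
            if toks[k] == "+":
                trees += [(l, "+", r)
                          for l in go(i, k)
                          for r in go(k + 1, j)]
        return tuple(trees)

    return list(go(0, len(toks)))
```

On `1 + 2 + 3` this yields two trees, the left- and right-associative readings; the number of trees grows with the Catalan numbers, which is why filtering rather than enumeration matters in practice.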
Expressing disambiguation filters as combinators
In contrast to most conventional programming languages, where certain symbols are used to create unambiguous grammars, many recent programming languages allow ambiguity. These ambiguities are resolved using disambiguation rules, which dictate how the software that parses these languages should behave when faced with ambiguities. Such rules are highly efficient but come with some limitations: they cannot be further modified, their behaviour is hidden, and changing them implies rebuilding the parser. We propose a different approach to disambiguation. A set of disambiguation filters (expressed as combinators) is provided, and disambiguation is achieved by composing combinators. New combinators can be created and, because the disambiguation step is separated from the parsing step, disambiguation rules can be changed without modifying the parser.
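The combinator view can be sketched in a few lines: each filter is an ordinary function from a set of candidate trees to a subset, and combining filters is just function composition, so a new rule slots in without touching the parser. (The names and tree encoding below are invented for illustration, not the paper's actual API.)

```python
# Candidate trees for the ambiguous input "1+2+3", as (left, op, right) tuples.
CANDIDATES = [
    ("1", "+", ("2", "+", "3")),   # right-associative reading
    (("1", "+", "2"), "+", "3"),   # left-associative reading
]

def compose(*filters):
    """Chain disambiguation filters; each maps a tree set to a subset."""
    def run(trees):
        for f in filters:
            trees = f(trees)
        return trees
    return run

def left_assoc(trees):
    """Keep trees whose right operand is a leaf (left-associative shape)."""
    return [t for t in trees if not isinstance(t[2], tuple)]

def root_op(op):
    """Filter factory: keep trees whose root operator is `op`."""
    return lambda trees: [t for t in trees if t[1] == op]

pick = compose(root_op("+"), left_assoc)
```

Swapping `left_assoc` for a right-associative filter changes the disambiguation policy without regenerating or rebuilding the parser, which is the separation the abstract argues for.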
Context classification for improved semantic understanding of mathematical formulae
The correct semantic interpretation of mathematical formulae in electronic mathematical documents is an important prerequisite for advanced tasks such as search, accessibility or computational processing. Especially in advanced maths, the meaning of characters and symbols is highly domain dependent, and only limited information can be gained from considering individual formulae and their structures. Although many approaches have been proposed for the semantic interpretation of mathematical formulae, most of them rely on the limited semantics of maths representation languages, whereas very few use maths context as a source of information. This thesis presents a novel approach for extracting semantic information about mathematical formulae from their context in documents. We utilised different supervised machine learning (SML) techniques (i.e. Linear-Chain Conditional Random Fields (CRF), Maximum Entropy (MaxEnt) and Maximum Entropy Markov Models (MEMM), combined with the Rprop- and Rprop+ optimisation algorithms) to detect definitions of simple and compound mathematical expressions, thereby deriving their meaning. The learning algorithms require an annotated corpus, the development of which is one of the contributions of this thesis. The corpus was developed using a novel approach to extract the desired maths expressions and sub-formulae, and was manually annotated by two independent annotators using a standard measure of inter-annotator agreement. The thesis further develops a new approach to feature representation based on definition templates extracted from maths documents, to overcome the limitations of conventional window-based features. All contributions were evaluated with various techniques, including the common metrics of recall, precision, and the harmonic F-measure.
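The template idea can be illustrated with a toy extractor: instead of features over a fixed window of neighbouring words, patterns derived from common definition phrasings ("let X be Y", "where X is Y") locate an expression and its definition directly. The templates below are invented examples; the thesis learns from such templates with CRF/MEMM/MaxEnt sequence models rather than matching plain regular expressions:

```python
import re

# Toy definition templates for mathematical expressions in running text.
# (Invented examples; the thesis extracts its templates from a corpus and
# uses them as features for sequence-labelling models.)
TEMPLATES = [
    re.compile(r"let\s+(?P<expr>\S+)\s+(?:be|denote)\s+(?P<defn>[^.,;]+)", re.I),
    re.compile(r"(?P<expr>\S+)\s+denotes\s+(?P<defn>[^.,;]+)", re.I),
    re.compile(r"where\s+(?P<expr>\S+)\s+is\s+(?P<defn>[^.,;]+)", re.I),
]

def extract_definitions(sentence):
    """Return (expression, definition) pairs matched by any template."""
    pairs = []
    for template in TEMPLATES:
        for m in template.finditer(sentence):
            pairs.append((m.group("expr"), m.group("defn").strip()))
    return pairs
```

Unlike a window-based feature set, a template still fires when the definition lies many tokens away from the expression, which is the limitation the thesis sets out to overcome.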
Extending the BiYacc framework with ambiguous grammars
Master's dissertation in Computer Science.

In contrast to most conventional programming languages, where certain symbols are used to create unambiguous grammars, many recent programming languages allow ambiguity. This results in the need for a generic parser that can deal with this ambiguity without loss of performance.

Currently, there is a GLR parser generator written in Haskell, integrated in the BiYacc system, developed by the Departamento de Informática (DI), Universidade do Minho (UM), Portugal, in collaboration with the National Institute of Informatics, Japan. In this thesis, the need for a generic parser is addressed by developing disambiguation filters for this system that improve its performance, by implementing various known optimizations to the parser generator, and by implementing a parser generator based on a GLL algorithm, which may bring performance advantages over the currently implemented GLR algorithm. Finally, performance tests are used to measure the results of the developed work.