    Context in Parsing: Techniques and Applications

    Analysing the SML97 Definition: Lexicalisation

    The specification of the syntax and semantics of Standard ML was designed to support the generation of a compiler front end, but actual implementations have required significant modifications to the specification. Since the specification was written, there have been major advances in language analysis systems that can handle general syntax specifications. We are revisiting the SML specification to consider to what extent, using modern tooling, it can be implemented exactly as originally written. In this short paper we focus on the lexical specification.

    Tunnel Parsing with counted repetitions

    The article describes a new and efficient parsing algorithm, called Tunnel Parsing, that parses from left to right on the basis of a context-free grammar without left recursion and without rules that derive the empty word. The algorithm is mainly applicable to domain-specific languages. Particular attention is paid to parsing repetitions of grammar elements. The result of parsing is a statically typed concrete syntax tree, built top-down, that accurately reflects the grammar. Parsing is done not through recursion but through iteration. The Tunnel Parsing algorithm uses grammars directly, without prior refactoring, and has linear time complexity for deterministic context-free grammars.
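    To make "counted repetition" and "iteration instead of recursion" concrete, here is a minimal sketch. It is not the Tunnel Parsing algorithm (the abstract does not describe its internals); it only shows a counted repetition such as DIGIT{4} matched in a loop and stored in a statically typed tree node. The rule `Date`, the node classes and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Repetition:
    """Hypothetical typed CST node: the tokens matched by one counted repetition."""
    items: List[str]

def match_counted(tokens: List[str], pos: int, is_item: Callable[[str], bool],
                  min_count: int, max_count: int) -> Optional[Tuple[Repetition, int]]:
    """Greedily match min_count..max_count items with a loop (no recursion).
    Returns (node, next position), or None if fewer than min_count items match."""
    items: List[str] = []
    while len(items) < max_count and pos < len(tokens) and is_item(tokens[pos]):
        items.append(tokens[pos])
        pos += 1
    if len(items) < min_count:
        return None
    return Repetition(items), pos

@dataclass
class Date:
    """Typed node for the hypothetical rule  Date := DIGIT{4} '-' DIGIT{2} '-' DIGIT{2}."""
    year: Repetition
    month: Repetition
    day: Repetition

def parse_date(text: str) -> Optional[Date]:
    tokens = list(text)
    parts: List[Repetition] = []
    pos = 0
    # Walk the rule's elements iteratively: three counted repetitions separated by '-'.
    for i, count in enumerate((4, 2, 2)):
        if i > 0:
            if pos >= len(tokens) or tokens[pos] != "-":
                return None
            pos += 1
        result = match_counted(tokens, pos, str.isdigit, count, count)
        if result is None:
            return None
        node, pos = result
        parts.append(node)
    if pos != len(tokens):
        return None  # trailing input is not part of the rule
    return Date(*parts)
```

    For example, `parse_date("2020-09-29")` yields a `Date` whose `year.items` is `['2', '0', '2', '0']`, while `"2020-9-29"` is rejected because the month repetition matches only one digit.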

    InDubio: a combinator library to disambiguate ambiguous grammars

    First online: 29 September 2020.
    Inferring an abstract model from source code is one of the main tasks of most software quality analysis methods. Such a model is called an abstract syntax tree (AST), and the inference task is called parsing. A parser is usually generated from a grammar specification of a (programming) language, and it converts source code in that language into an AST. Several techniques then traverse this tree to assess the quality of the code (for example, by computing source code metrics) or to build new data structures (e.g., flow graphs) for further analysis (such as detecting code clones or dead code). Parsing is a well-established technique. In recent years, however, languages have emerged that are inherently ambiguous and can only be fully handled by ambiguous grammars. In this setting, disambiguation rules, usually included as part of the grammar specification of the ambiguous language, need to be defined. This approach has a severe limitation: disambiguation rules are not first-class citizens. Parser generators offer a small set of rules that cannot be extended or changed, so grammar writers can neither manipulate those rules nor define a new rule that their language requires. In this paper we present a tool, named InDubio, that consists of an extensible combinator library of disambiguation filters together with a generalized parser generator for ambiguous grammars. InDubio defines a set of basic disambiguation rules as abstract syntax tree filters that can be combined into more powerful rules. Moreover, the filters are independent of the parser generator and parsing technology, and consequently they can be easily extended and manipulated. This paper presents InDubio in detail, along with our first experimental results.
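    The idea of basic AST filters combined into more powerful rules can be sketched as follows. This is not InDubio's actual API, which the abstract does not show: the tree encoding and every function name (`reject`, `seq`, `left_assoc`, `priority`) are hypothetical, and for simplicity each filter works on an explicit list of candidate trees rather than on a shared parse forest.

```python
from typing import Callable, Iterator, List, Union

# A parse tree is an int leaf or a tuple (operator, left, right).
Tree = Union[int, tuple]
# A filter maps the list of candidate trees to the candidates it keeps.
Filter = Callable[[List[Tree]], List[Tree]]

def nodes(t: Tree) -> Iterator[Tree]:
    """Yield every node of a tree in pre-order."""
    yield t
    if isinstance(t, tuple):
        yield from nodes(t[1])
        yield from nodes(t[2])

def reject(bad: Callable[[Tree], bool]) -> Filter:
    """Basic filter: drop every candidate tree containing a node where `bad` holds."""
    return lambda trees: [t for t in trees if not any(bad(n) for n in nodes(t))]

def seq(*filters: Filter) -> Filter:
    """Combinator: run filters one after another on the surviving candidates."""
    def combined(trees: List[Tree]) -> List[Tree]:
        for f in filters:
            trees = f(trees)
        return trees
    return combined

def is_op(t: Tree, op: str) -> bool:
    return isinstance(t, tuple) and t[0] == op

def left_assoc(op: str) -> Filter:
    """Reject a op (b op c): `op` must not be the right child of `op`."""
    return reject(lambda n: is_op(n, op) and is_op(n[2], op))

def priority(high: str, low: str) -> Filter:
    """Reject trees where a `low` operator is a direct child of a `high` one."""
    return reject(lambda n: is_op(n, high) and (is_op(n[1], low) or is_op(n[2], low)))

# A rule for a small expression language, composed from the basic filters.
arith = seq(priority("*", "+"), left_assoc("+"))
```

    Applied to the two readings of `1 + 2 * 3`, `arith` keeps only `("+", 1, ("*", 2, 3))`. Because a rule is an ordinary function, a grammar writer can add a new one without changing the parser.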

    Expressing disambiguation filters as combinators

    Unlike most conventional programming languages, in which certain symbols are used to keep grammars unambiguous, most recent programming languages allow ambiguity. These ambiguities are resolved using disambiguation rules, which dictate how the software that parses these languages should behave when faced with an ambiguity. Such rules are highly efficient but come with some limitations: they cannot be further modified, their behaviour is hidden, and changing them requires rebuilding the parser. We propose a different approach to disambiguation. A set of disambiguation filters, expressed as combinators, is provided, and disambiguation is achieved by composing these combinators. New combinators can be created, and because the disambiguation step is separated from the parsing step, disambiguation rules can be changed without modifying the parser.
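    The separation of parsing from disambiguation can be illustrated with a deliberately naive sketch (an assumption for illustration, not the authors' implementation): a parser for the ambiguous grammar E := E op E | int returns every tree, and a separately supplied filter picks among them. Swapping the filter changes the result without touching the parser. All names here are hypothetical, and the naive parser's output grows exponentially with input length.

```python
from typing import Callable, Iterator, List, Union

Tree = Union[int, tuple]                      # int leaf or (operator, left, right)
Filter = Callable[[List[Tree]], List[Tree]]

def all_parses(tokens: list) -> List[Tree]:
    """Naive parser for the ambiguous grammar E := E op E | int.
    It returns every parse tree and knows nothing about disambiguation."""
    if len(tokens) == 1:
        return [tokens[0]]
    trees: List[Tree] = []
    for i in range(1, len(tokens) - 1, 2):    # try each operator as the root
        for left in all_parses(tokens[:i]):
            for right in all_parses(tokens[i + 1:]):
                trees.append((tokens[i], left, right))
    return trees

def nodes(t: Tree) -> Iterator[Tree]:
    """Yield every node of a tree in pre-order."""
    yield t
    if isinstance(t, tuple):
        yield from nodes(t[1])
        yield from nodes(t[2])

def forbid_same_op_child(side: int) -> Filter:
    """Reject trees where an operator has the same operator as its child
    on `side` (1 = left child, 2 = right child)."""
    def f(trees: List[Tree]) -> List[Tree]:
        return [t for t in trees
                if not any(isinstance(n, tuple) and isinstance(n[side], tuple)
                           and n[side][0] == n[0] for n in nodes(t))]
    return f

left_assoc = forbid_same_op_child(2)          # rejects a - (b - c)
right_assoc = forbid_same_op_child(1)         # rejects (a - b) - c

def parse(tokens: list, rule: Filter) -> List[Tree]:
    # Parsing and disambiguation are separate steps, so swapping `rule`
    # changes the outcome without modifying all_parses.
    return rule(all_parses(tokens))
```

    With `tokens = [1, "-", 2, "-", 3]`, `parse(tokens, left_assoc)` returns `[("-", ("-", 1, 2), 3)]`, and passing `right_assoc` instead returns `[("-", 1, ("-", 2, 3))]`.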

    Context classification for improved semantic understanding of mathematical formulae

    The correct semantic interpretation of mathematical formulae in electronic mathematical documents is an important prerequisite for advanced tasks such as search, accessibility or computational processing. Especially in advanced maths, the meaning of characters and symbols is highly domain dependent, and only limited information can be gained from considering individual formulae and their structures. Although many approaches have been proposed for the semantic interpretation of mathematical formulae, most of them rely on the limited semantics available in maths representation languages, and very few use the surrounding maths context as a source of information. This thesis presents a novel approach for the principled extraction of semantic information about mathematical formulae from their context in documents. We used different supervised machine learning (SML) techniques (Linear-Chain Conditional Random Fields (CRF), Maximum Entropy (MaxEnt) and Maximum Entropy Markov Models (MEMM), combined with the Rprop- and Rprop+ optimisation algorithms) to detect definitions of simple and compound mathematical expressions and thereby derive their meaning. The learning algorithms require an annotated corpus, whose development is one of the contributions of this thesis. The corpus was built using a novel approach for extracting the desired maths expressions and sub-formulae, and was manually annotated by two independent annotators, with agreement assessed using a standard inter-annotator agreement measure. The thesis further develops a new feature representation based on definition templates extracted from maths documents, to overcome the limitations of conventional window-based features. All contributions were evaluated by various techniques, including the common metrics of precision, recall and the F-measure (their harmonic mean).
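    For reference, the evaluation metrics named above relate as follows. This sketch assumes the balanced F-measure (F1, the harmonic mean of precision and recall); the abstract does not state which weighting the thesis uses, and the counts in the usage note are made-up illustrative numbers, not results from the thesis.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Precision, recall and balanced F-measure (harmonic mean of P and R)
    from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

    With hypothetical counts tp = 8, fp = 2 and fn = 4, precision is 0.8, recall is 2/3, and F1 is 8/11, about 0.727.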

    Extending the BiYacc framework with ambiguous grammars

    Master's dissertation in Computer Science.
    Unlike most conventional programming languages, in which certain symbols are used to keep grammars unambiguous, most recent programming languages allow ambiguity. This creates the need for a generalized parser that can deal with this ambiguity without a significant loss of performance. Currently, there is a GLR parser generator written in Haskell, integrated in the BiYacc system, developed by the Departamento de Informática (DI), Universidade do Minho (UM), Portugal, in collaboration with the National Institute of Informatics, Japan. This thesis addresses that need by developing disambiguation filters for this system that improve its performance, by applying various known optimizations at several levels of the parser generator, and by implementing a parser generator based on the GLL algorithm, which may bring performance advantages over the currently implemented GLR algorithm. Finally, performance tests are used to evaluate the results of the work.