314 research outputs found

    Purely functional GLL parsing

    Get PDF
    Generalised parsing has become increasingly important in the context of software language design and several compiler generators and language workbenches have adopted generalised parsing algorithms such as GLR and GLL. The original GLL parsing algorithms are described in low-level pseudo-code as the output of a parser generator. This paper explains GLL parsing differently, defining the FUN-GLL algorithm as a collection of pure, mathematical functions and focussing on the logic of the algorithm by omitting implementation details. In particular, the data structures are modelled by abstract sets and relations rather than specialised implementations. The description is further simplified by omitting lookahead and adopting the binary subtree representation of derivations to avoid the clerical overhead of graph construction. Conventional parser combinators inherit the drawbacks from the recursive descent algorithms they implement. Based on FUN-GLL, this paper defines generalised parser combinators that overcome these problems. Th

    Exploring and visualizing GLL parsing

    Get PDF

    A Reference GLL Implementation

    Get PDF
    The Generalised-LL (GLL) context-free parsing algorithmwas introduced at the 2009 LDTA workshop, and since then aseries of variant algorithms and implementations have beendescribed. There is a wide variety of optimisations that maybe applied to GLL, some of which were already present inthe originally published form.This paper presents a reference GLL implementation shornof all optimisations as a common baseline for the real-worldcomparison of performance across GLL variants. This baselineversion has particular value for non-specialists, sinceits simple form may be straightforwardly encoded in theimplementer’s preferred programming language.We also describe our approach to low level memory managementof GLL internal data structures. Our evaluation onlarge inputs shows a factor 3–4 speedup over a naïve implementationusing the standard Java APIs and a factor 4–5reduction in heap requirements. We conclude with noteson some algorithm-level optimisations that may be appliedindependently of the internal data representation

    Context-Free Path Querying with Structural Representation of Result

    Full text link
    Graph data model and graph databases are very popular in various areas such as bioinformatics, semantic web, and social networks. One specific problem in the area is a path querying with constraints formulated in terms of formal grammars. The query in this approach is written as grammar, and paths querying is graph parsing with respect to given grammar. There are several solutions to it, but how to provide structural representation of query result which is practical for answer processing and debugging is still an open problem. In this paper we propose a graph parsing technique which allows one to build such representation with respect to given grammar in polynomial time and space for arbitrary context-free grammar and graph. Proposed algorithm is based on generalized LL parsing algorithm, while previous solutions are based mostly on CYK or Earley algorithms, which reduces time complexity in some cases.Comment: Evaluation extende

    Context-Free Path Querying by Matrix Multiplication

    Full text link
    Graph data models are widely used in many areas, for example, bioinformatics, graph databases. In these areas, it is often required to process queries for large graphs. Some of the most common graph queries are navigational queries. The result of query evaluation is a set of implicit relations between nodes of the graph, i.e. paths in the graph. A natural way to specify these relations is by specifying paths using formal grammars over the alphabet of edge labels. An answer to a context-free path query in this approach is usually a set of triples (A, m, n) such that there is a path from the node m to the node n, whose labeling is derived from a non-terminal A of the given context-free grammar. This type of queries is evaluated using the relational query semantics. Another example of path query semantics is the single-path query semantics which requires presenting a single path from the node m to the node n, whose labeling is derived from a non-terminal A for all triples (A, m, n) evaluated using the relational query semantics. There is a number of algorithms for query evaluation which use these semantics but all of them perform poorly on large graphs. One of the most common technique for efficient big data processing is the use of a graphics processing unit (GPU) to perform computations, but these algorithms do not allow to use this technique efficiently. In this paper, we show how the context-free path query evaluation using these query semantics can be reduced to the calculation of the matrix transitive closure. Also, we propose an algorithm for context-free path query evaluation which uses relational query semantics and is based on matrix operations that make it possible to speed up computations by using a GPU.Comment: 9 pages, 11 figures, 2 table

    On the Relation between Context-Free Grammars and Parsing Expression Grammars

    Full text link
    Context-Free Grammars (CFGs) and Parsing Expression Grammars (PEGs) have several similarities and a few differences in both their syntax and semantics, but they are usually presented through formalisms that hinder a proper comparison. In this paper we present a new formalism for CFGs that highlights the similarities and differences between them. The new formalism borrows from PEGs the use of parsing expressions and the recognition-based semantics. We show how one way of removing non-determinism from this formalism yields a formalism with the semantics of PEGs. We also prove, based on these new formalisms, how LL(1) grammars define the same language whether interpreted as CFGs or as PEGs, and also show how strong-LL(k), right-linear, and LL-regular grammars have simple language-preserving translations from CFGs to PEGs

    GLL parsing with flexible combinators

    Get PDF
    At SLE in 2014, Ridge presented the P3 combinator library with which parsers can be developed for left-recursive, non-deterministic and ambiguous grammars. A combinator expression in P3 yields a binarised grammar reflecting the expression's structure. The grammar is given to an underlying, generalised parsing procedure computing all derivations. In this paper we present a combinator library with a similar architecture to P3, adjusting it to avoid grammar binarisation. Avoiding binarisation has a significant positive effect on the running times of the underlying parsing procedure, which we demonstrate using real-world grammars. Binarisation is avoided by restricting the applicability of combinators, resulting in combinator expressions closely resembling BNF fragments. Usability is recovered by defining coercions that automatically convert expressions where necessary. As the underlying parsing procedure, we use a purely functional variant of generalised top-down (GLL) parsing

    Happy-GLL: modular, reusable and complete top-down parsers for parameterized nonterminals

    Full text link
    Parser generators and parser combinator libraries are the most popular tools for producing parsers. Parser combinators use the host language to provide reusable components in the form of higher-order functions with parsers as parameters. Very few parser generators support this kind of reuse through abstraction and even fewer generate parsers that are as modular and reusable as the parts of the grammar for which they are produced. This paper presents a strategy for generating modular, reusable and complete top-down parsers from syntax descriptions with parameterized nonterminals, based on the FUN-GLL variant of the GLL algorithm. The strategy is discussed and demonstrated as a novel back-end for the Happy parser generator. Happy grammars can contain `parameterized nonterminals' in which parameters abstract over grammar symbols, granting an abstraction mechanism to define reusable grammar operators. However, the existing Happy back-ends do not deliver on the full potential of parameterized nonterminals as parameterized nonterminals cannot be reused across grammars. Moreover, the parser generation process may fail to terminate or may result in exponentially large parsers generated in an exponential amount of time. The GLL back-end presented in this paper implements parameterized nonterminals successfully by generating higher-order functions that resemble parser combinators, inheriting all the advantages of top-down parsing. The back-end is capable of generating parsers for the full class of context-free grammars, generates parsers in linear time and generates parsers that find all derivations of the input string. To our knowledge, the presented GLL back-end makes Happy the first parser generator that combines all these features. This paper describes the translation procedure of the GLL back-end and compares it to the LALR and GLR back-ends of Happy in several experiments.Comment: 15 page
    • …
    corecore