343 research outputs found

    Efficient combinator parsing for natural-language.

    Get PDF

    Happy-GLL: modular, reusable and complete top-down parsers for parameterized nonterminals

    Full text link
    Parser generators and parser combinator libraries are the most popular tools for producing parsers. Parser combinators use the host language to provide reusable components in the form of higher-order functions with parsers as parameters. Very few parser generators support this kind of reuse through abstraction and even fewer generate parsers that are as modular and reusable as the parts of the grammar for which they are produced. This paper presents a strategy for generating modular, reusable and complete top-down parsers from syntax descriptions with parameterized nonterminals, based on the FUN-GLL variant of the GLL algorithm. The strategy is discussed and demonstrated as a novel back-end for the Happy parser generator. Happy grammars can contain `parameterized nonterminals' in which parameters abstract over grammar symbols, granting an abstraction mechanism to define reusable grammar operators. However, the existing Happy back-ends do not deliver on the full potential of parameterized nonterminals as parameterized nonterminals cannot be reused across grammars. Moreover, the parser generation process may fail to terminate or may result in exponentially large parsers generated in an exponential amount of time. The GLL back-end presented in this paper implements parameterized nonterminals successfully by generating higher-order functions that resemble parser combinators, inheriting all the advantages of top-down parsing. The back-end is capable of generating parsers for the full class of context-free grammars, generates parsers in linear time and generates parsers that find all derivations of the input string. To our knowledge, the presented GLL back-end makes Happy the first parser generator that combines all these features. This paper describes the translation procedure of the GLL back-end and compares it to the LALR and GLR back-ends of Happy in several experiments.Comment: 15 page

    Formal Languages and Compilation

    Full text link

    Parsing for agile modeling

    Get PDF
    Agile modeling refers to a set of methods that allow for a quick initial development of an importer and its further refinement. These requirements are not met simultaneously by the current parsing technology. Problems with parsing became a bottleneck in our research of agile modeling. In this thesis we introduce a novel approach to specify and build parsers. Our approach allows for expressive, tolerant and composable parsers without sacrificing performance. The approach is based on a context-sensitive extension of parsing expression grammars that allows a grammar engineer to specify complex language restrictions. To insure high parsing performance we automatically analyze a grammar definition and choose different parsing strategies for different parts of the grammar. We show that context-sensitive parsing expression grammars allow for highly composable, tolerant and variable-grained parsers that can be easily refined. Different parsing strategies significantly insure high-performance of parsers without sacrificing expressiveness of the underlying grammars

    Executable Attribute Grammars for Modular and Efficient Natural Language Processing

    Get PDF
    Language-processors that are constructed using top-down recursive-descent with backtracking parsing are highly modular, and are easy to implement and maintain. However, a widely-held inaccurate view is that top-down processors are inherently exponential for ambiguous grammars and cannot accommodate left-recursive syntax rules. It has been known that exponential time and space complexities can be avoided by memoization and compact graph-structured representation, and that left- recursive productions can be accommodated through a variety of techniques. However, until now, memoization, compact representation, and techniques for handling left-recursion have either been presented independently, or else attempts at their integration have compromised modularity and correctness of the resulting parses. Specifying syntax and semantics to describe formal languages using denotational notation of attribute grammars (AGs) has been widely practiced. However, very little work has shown the usefulness of declarative AGs for constructing computational models of natural language. Previous top-down approaches fall short in accommodating ambiguous and general CFGs with arbitrary semantics in one pass as executable specifications. Existing approaches lack in providing a declarative syntax-semantics interface that can take full advantages of dependencies between attributes of syntactic constituents to model linguistically-motivated cases. This thesis solves these shortcomings by proposing a new modular top-down syntactic and semantic analysis system, which is efficient and accommodates all forms of CFGs. Moreover, this system provides notation to declaratively specify semantics by establishing arbitrary dependencies between attributes of syntactic categories to perform linguistically-motivated tasks such as: building directly-executable natural-language query processors, computing meanings of sentences using compositional semantics, performing contextual disambiguation tasks, modelling restrictive classes of languages etc

    Expressing disambiguation filters as combinators

    Get PDF
    Contrarily to most conventional programming languages where certain symbols are used so as to create non-ambiguous grammars, most recent programming languages allow ambiguity. These ambiguities are solved using disambiguation rules, which dictate how the software that parses these languages should behave when faced with ambiguities. Such rules are highly efficient but come with some limitations - they cannot be further modified, their behaviour is hidden, and changing them implies re-building a parser. We propose a different approach for disambiguation. A set of disambiguation filters (expressed as combinators) are provided, and disambiguation can be achieved by composing combinators. New combinators can be created and, by having the disambiguation step separated from the parsing step, disambiguation rules can be changed without modifying the parser.- (undefined

    Security Applications of Formal Language Theory

    Get PDF
    We present an approach to improving the security of complex, composed systems based on formal language theory, and show how this approach leads to advances in input validation, security modeling, attack surface reduction, and ultimately, software design and programming methodology. We cite examples based on real-world security flaws in common protocols representing different classes of protocol complexity. We also introduce a formalization of an exploit development technique, the parse tree differential attack, made possible by our conception of the role of formal grammars in security. These insights make possible future advances in software auditing techniques applicable to static and dynamic binary analysis, fuzzing, and general reverse-engineering and exploit development. Our work provides a foundation for verifying critical implementation components with considerably less burden to developers than is offered by the current state of the art. It additionally offers a rich basis for further exploration in the areas of offensive analysis and, conversely, automated defense tools and techniques. This report is divided into two parts. In Part I we address the formalisms and their applications; in Part II we discuss the general implications and recommendations for protocol and software design that follow from our formal analysis
    corecore