1,021 research outputs found

    An Abstract Machine for Unification Grammars

    Full text link
    This work describes the design and implementation of an abstract machine, Amalia, for the linguistic formalism ALE, which is based on typed feature structures. This formalism is one of the most widely accepted in computational linguistics and has been used for designing grammars in various linguistic theories, most notably HPSG. Amalia is composed of data structures and a set of instructions, augmented by a compiler from the grammatical formalism to the abstract instructions, and a (portable) interpreter of the abstract instructions. The effect of each instruction is defined using a low-level language that can be executed on ordinary hardware. The advantages of the abstract machine approach are twofold. From a theoretical point of view, the abstract machine gives a well-defined operational semantics to the grammatical formalism. This ensures that grammars specified using our system are endowed with well defined meaning. It enables, for example, to formally verify the correctness of a compiler for HPSG, given an independent definition. From a practical point of view, Amalia is the first system that employs a direct compilation scheme for unification grammars that are based on typed feature structures. The use of amalia results in a much improved performance over existing systems. In order to test the machine on a realistic application, we have developed a small-scale, HPSG-based grammar for a fragment of the Hebrew language, using Amalia as the development platform. This is the first application of HPSG to a Semitic language.Comment: Doctoral Thesis, 96 pages, many postscript figures, uses pstricks, pst-node, psfig, fullname and a macros fil

    A Variant of Earley Parsing

    Full text link
    The Earley algorithm is a widely used parsing method in natural language processing applications. We introduce a variant of Earley parsing that is based on a ``delayed'' recognition of constituents. This allows us to start the recognition of a constituent only in cases in which all of its subconstituents have been found within the input string. This is particularly advantageous in several cases in which partial analysis of a constituent cannot be completed and in general in all cases of productions sharing some suffix of their right-hand sides (even for different left-hand side nonterminals). Although the two algorithms result in the same asymptotic time and space complexity, from a practical perspective our algorithm improves the time and space requirements of the original method, as shown by reported experimental results.Comment: 12 pages, 1 Postscript figure, uses psfig.tex and llncs.st

    Parsing Schemata

    Get PDF
    Parsing schemata provide a general framework for specication, analysis and comparison of (sequential and/or parallel) parsing algorithms. A grammar specifies implicitly what the valid parses of a sentence are; a parsing algorithm specifies explicitly how to compute these. Parsing schemata form a well-defined level of abstraction in between grammars and parsing algorithms. A parsing schema specifies the types of intermediate results that can be computed by a parser, and the rules that allow to expand a given set of such results with new results. A parsing schema does not specify the data structures, control structures, and (in case of parallel processing)\ud communication structures that are to be used by a parser.\ud Part I, Exposition, gives a general introduction to the ideas that are worked out in the following parts.\ud Part II, Foundation, unfolds a mathematical theory of parsing schemata. Different kinds of relations between parsing schemata are formally introduced and illustrated with examples drawn from the parsing literature.\ud Part III, Application, discusses a series of applications of parsing schemata.\ud - Feature percolation in unification grammar parsing can be described in an elegant, legible notation.\ud - Because of the absence of algorithmic detail, parsing schemata can be used to get a formal grip on highly complicated algorithms. We give substance to this claim by means of a thorough analysis of Left-Corner and Head-Corner chart parsing.\ud - As an example of structural similarity of parsers, despite differences in form and appearance, we show that the underlying parsing schemata of Earley's algorithm and Tomita's algorithm are virtually identical. Using this structural correspondence we can obtain a novel parallel parser by cross-fertilizing a parallel Earley parser with Tomita's graph-structured stack.\ud - Parsing schemata can be implemented straightforwardly by boolean circuits. This means that, in principle, parsing schemata can be coded directly into hardware.\ud Part IV, Perspective, discusses the prospects for natural language parsing applications and draws some conclusions. An important observation is that the theoretical and practical part of the book reinforce each other. The proposed framework is abstract enough to allow a thorough mathematical treatment and practical enough to allow rewriting a variety of real parsing algorithms (i.e. seriously proposed in the literature, not toy examples)\ud in a clear and coherent way

    Principles and Implementation of Deductive Parsing

    Get PDF
    We present a system for generating parsers based directly on the metaphor of parsing as deduction. Parsing algorithms can be represented directly as deduction systems, and a single deduction engine can interpret such deduction systems so as to implement the corresponding parser. The method generalizes easily to parsers for augmented phrase structure formalisms, such as definite-clause grammars and other logic grammar formalisms, and has been used for rapid prototyping of parsing algorithms for a variety of formalisms including variants of tree-adjoining grammars, categorial grammars, and lexicalized context-free grammars.Comment: 69 pages, includes full Prolog cod

    Recovering Grammar Relationships for the Java Language Specification

    Get PDF
    Grammar convergence is a method that helps discovering relationships between different grammars of the same language or different language versions. The key element of the method is the operational, transformation-based representation of those relationships. Given input grammars for convergence, they are transformed until they are structurally equal. The transformations are composed from primitive operators; properties of these operators and the composed chains provide quantitative and qualitative insight into the relationships between the grammars at hand. We describe a refined method for grammar convergence, and we use it in a major study, where we recover the relationships between all the grammars that occur in the different versions of the Java Language Specification (JLS). The relationships are represented as grammar transformation chains that capture all accidental or intended differences between the JLS grammars. This method is mechanized and driven by nominal and structural differences between pairs of grammars that are subject to asymmetric, binary convergence steps. We present the underlying operator suite for grammar transformation in detail, and we illustrate the suite with many examples of transformations on the JLS grammars. We also describe the extraction effort, which was needed to make the JLS grammars amenable to automated processing. We include substantial metadata about the convergence process for the JLS so that the effort becomes reproducible and transparent
    • …
    corecore