
Type-Oriented Island Parsing

Author: Silkensen Erik Joseph
Publication venue: CU Scholar
Publication date: 01/01/2012

This thesis addresses the problem of specifying and parsing the syntax of domain-specific languages (DSLs) in a modular, user-friendly way. That is, we want to enable the design of composable DSLs that combine the natural syntax of external DSLs with the easy implementation of internal DSLs. The challenge in parsing composable DSLs is that the composition of several (individually unambiguous) languages is likely to contain ambiguities. In this thesis, we present the design of a system that uses a type-oriented variant of island parsing to efficiently parse the syntax of composable DSLs. In particular, we show that type-oriented island parsing is the first parsing algorithm that is constant time with respect to the number of DSLs imported. We also show how to use our tool to implement DSLs on top of a host language such as Typed Racket.

CU Scholar Institutional Repository

An Abstract Machine for Unification Grammars

Author: Wintner Shuly
Publication date: 01/01/1997

This work describes the design and implementation of Amalia, an abstract machine for the linguistic formalism ALE, which is based on typed feature structures. This formalism is one of the most widely accepted in computational linguistics and has been used to design grammars in various linguistic theories, most notably HPSG. Amalia consists of data structures and a set of instructions, augmented by a compiler from the grammatical formalism to the abstract instructions and a (portable) interpreter of the abstract instructions. The effect of each instruction is defined using a low-level language that can be executed on ordinary hardware. The advantages of the abstract machine approach are twofold. From a theoretical point of view, the abstract machine gives a well-defined operational semantics to the grammatical formalism. This ensures that grammars specified using our system are endowed with a well-defined meaning. It makes it possible, for example, to formally verify the correctness of a compiler for HPSG, given an independent definition. From a practical point of view, Amalia is the first system that employs a direct compilation scheme for unification grammars based on typed feature structures. The use of Amalia results in much improved performance over existing systems. To test the machine on a realistic application, we have developed a small-scale, HPSG-based grammar for a fragment of the Hebrew language, using Amalia as the development platform. This is the first application of HPSG to a Semitic language.
Comment: Doctoral Thesis, 96 pages, many PostScript figures, uses pstricks, pst-node, psfig, fullname and a macros file
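
As a rough illustration of the operation that such a machine compiles, the Python sketch below unifies untyped feature structures represented as nested dictionaries. It is a deliberate simplification and not Amalia's design: it ignores types, reentrancy (structure sharing) and compilation to abstract instructions, and the function name `unify` is hypothetical.

```python
from typing import Any, Optional

# Simplified model (an assumption): a feature structure is either an atom (str)
# or a dict mapping feature names to feature structures. No types, no sharing.
FS = Any


def unify(a: FS, b: FS) -> Optional[FS]:
    """Return the combined structure consistent with both a and b, or None on a clash."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # shallow copy so the inputs are not mutated
        for feat, val in b.items():
            if feat in result:
                merged = unify(result[feat], val)
                if merged is None:
                    return None  # conflicting values for the same feature
                result[feat] = merged
            else:
                result[feat] = val  # feature only present in b
        return result
    if isinstance(a, dict) or isinstance(b, dict):
        return None  # atom versus complex structure: clash
    return a if a == b else None  # two atoms unify only if identical


noun_phrase = {"cat": "np", "agr": {"num": "sg"}}
subject = {"agr": {"num": "sg", "per": "3"}}
print(unify(noun_phrase, subject))  # {'cat': 'np', 'agr': {'num': 'sg', 'per': '3'}}
print(unify(noun_phrase, {"agr": {"num": "pl"}}))  # None
```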

arXiv.org e-Print Archive

CiteSeerX

A Variant of Earley Parsing

Author: Nederhof Mark-Jan
Satta Giorgio
Publication date: 01/01/1997

The Earley algorithm is a widely used parsing method in natural language processing applications. We introduce a variant of Earley parsing that is based on a "delayed" recognition of constituents. This allows us to start the recognition of a constituent only in cases in which all of its subconstituents have been found within the input string. This is particularly advantageous in several cases in which a partial analysis of a constituent cannot be completed, and in general in all cases of productions sharing some suffix of their right-hand sides (even for different left-hand side nonterminals). Although the two algorithms have the same asymptotic time and space complexity, from a practical perspective our algorithm improves on the time and space requirements of the original method, as shown by the reported experimental results.
Comment: 12 pages, 1 PostScript figure, uses psfig.tex and llncs.sty
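
For reference, the following Python sketch implements a standard Earley recognizer, i.e. the original algorithm and not the delayed-recognition variant described above. It assumes a grammar with no empty productions, encoded as a dictionary from nonterminals to tuples of symbols; the function name `earley_recognize` and this encoding are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Assumed encoding: nonterminal -> list of right-hand sides (tuples of symbols).
# Symbols that are not grammar keys are terminals. No epsilon productions.
Grammar = Dict[str, List[Tuple[str, ...]]]
Item = Tuple[str, Tuple[str, ...], int, int]  # (lhs, rhs, dot position, origin)


def earley_recognize(grammar: Grammar, start: str, tokens: List[str]) -> bool:
    n = len(tokens)
    chart: List[Set[Item]] = [set() for _ in range(n + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))

    for i in range(n + 1):
        agenda = list(chart[i])
        while agenda:
            lhs, rhs, dot, origin = agenda.pop()
            if dot < len(rhs):
                sym = rhs[dot]
                if sym in grammar:
                    # Predictor: expect every production of the next nonterminal.
                    for prod in grammar[sym]:
                        new = (sym, prod, 0, i)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)
                elif i < n and tokens[i] == sym:
                    # Scanner: the expected terminal matches the next token.
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
            else:
                # Completer: advance items in chart[origin] that were waiting for lhs.
                for (l2, r2, d2, o2) in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        new = (l2, r2, d2 + 1, o2)
                        if new not in chart[i]:
                            chart[i].add(new)
                            agenda.append(new)

    return any(lhs == start and dot == len(rhs) and origin == 0
               for (lhs, rhs, dot, origin) in chart[n])


grammar = {"E": [("E", "+", "n"), ("n",)]}  # left-recursive toy grammar
print(earley_recognize(grammar, "E", ["n", "+", "n"]))  # True
```

Duplicate detection in each chart column is what lets the predictor terminate on left-recursive rules such as `E -> E + n`.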

arXiv.org e-Print Archive

CiteSeerX

University of Groningen Digital Archive

Archivio istituzionale della ricerca - Università di Padova

Parsing Schemata

Author: Sikkel Nicolaas
Publication venue: University of Twente
Publication date: 01/01/1993

Parsing schemata provide a general framework for the specification, analysis and comparison of (sequential and/or parallel) parsing algorithms. A grammar specifies implicitly what the valid parses of a sentence are; a parsing algorithm specifies explicitly how to compute these. Parsing schemata form a well-defined level of abstraction between grammars and parsing algorithms. A parsing schema specifies the types of intermediate results that can be computed by a parser, and the rules that allow a given set of such results to be expanded with new results. A parsing schema does not specify the data structures, control structures, and (in the case of parallel processing) communication structures that are to be used by a parser.
Part I, Exposition, gives a general introduction to the ideas that are worked out in the following parts.
Part II, Foundation, unfolds a mathematical theory of parsing schemata. Different kinds of relations between parsing schemata are formally introduced and illustrated with examples drawn from the parsing literature.
Part III, Application, discusses a series of applications of parsing schemata:
- Feature percolation in unification grammar parsing can be described in an elegant, legible notation.
- Because of the absence of algorithmic detail, parsing schemata can be used to get a formal grip on highly complicated algorithms. We give substance to this claim by means of a thorough analysis of Left-Corner and Head-Corner chart parsing.
- As an example of structural similarity of parsers despite differences in form and appearance, we show that the underlying parsing schemata of Earley's algorithm and Tomita's algorithm are virtually identical. Using this structural correspondence, we can obtain a novel parallel parser by cross-fertilizing a parallel Earley parser with Tomita's graph-structured stack.
- Parsing schemata can be implemented straightforwardly by Boolean circuits. This means that, in principle, parsing schemata can be coded directly into hardware.
Part IV, Perspective, discusses the prospects for natural language parsing applications and draws some conclusions. An important observation is that the theoretical and practical parts of the book reinforce each other. The proposed framework is abstract enough to allow a thorough mathematical treatment and practical enough to allow a variety of real parsing algorithms (i.e., algorithms seriously proposed in the literature, not toy examples) to be rewritten in a clear and coherent way.
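
To make "types of intermediate results" and "rules that expand a set of results" concrete, here is a sketch of a CYK-style parsing schema for a grammar G = (N, Σ, P, S) in Chomsky normal form and an input a_1 ... a_n. The notation is an approximation of the style commonly used for parsing schemata, not a quotation from the thesis. An item [A, i, j] records that A derives a_{i+1} ... a_j; H is the set of hypotheses built from the input, and D^Init and D^Comp are the deduction steps.

```latex
\begin{align*}
\mathcal{I}_{\mathrm{CYK}} &= \{\, [A, i, j] \mid A \in N,\ 0 \le i < j \le n \,\} \\
H &= \{\, [a_i, i-1, i] \mid 1 \le i \le n \,\} \\
D^{\mathrm{Init}} &= \{\, [a, i-1, i] \vdash [A, i-1, i] \mid A \to a \in P \,\} \\
D^{\mathrm{Comp}} &= \{\, [B, i, j],\ [C, j, k] \vdash [A, i, k] \mid A \to B\,C \in P \,\}
\end{align*}
```

The input is recognized if the item [S, 0, n] is derivable. Whether items are stored in a table, a list or a parallel network, and in which order steps fire, is deliberately left open.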

CiteSeerX

University of Twente Research Information

Principles and Implementation of Deductive Parsing

Author: Pereira Fernando C. N.
Schabes Yves
Shieber Stuart M.
Publication date: 01/01/1994

We present a system for generating parsers based directly on the metaphor of parsing as deduction. Parsing algorithms can be represented directly as deduction systems, and a single deduction engine can interpret such deduction systems so as to implement the corresponding parser. The method generalizes easily to parsers for augmented phrase structure formalisms, such as definite-clause grammars and other logic grammar formalisms, and has been used for rapid prototyping of parsing algorithms for a variety of formalisms, including variants of tree-adjoining grammars, categorial grammars, and lexicalized context-free grammars.
Comment: 69 pages, includes full Prolog code
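
The parsing-as-deduction idea can be sketched in Python as a generic agenda-driven engine with one deduction system plugged into it. This is a hypothetical illustration, not the paper's Prolog code: the engine `deduce`, the CKY rule set `cky_rules`, and the item encoding `(A, i, j)` are assumptions made for the example.

```python
from collections import deque
from typing import Callable, Iterable, List, Set, Tuple

Item = Tuple  # items are hashable tuples, here (category, start, end)


def deduce(axioms: Iterable[Item],
           infer: Callable[[Item, Set[Item]], Iterable[Item]],
           goal: Item) -> bool:
    """Agenda-driven closure: derive items until the goal appears or nothing new remains."""
    chart: Set[Item] = set()
    agenda = deque(axioms)
    while agenda:
        item = agenda.popleft()
        if item in chart:
            continue  # already derived
        chart.add(item)
        if item == goal:
            return True
        # Combine the new trigger item with every item already in the chart.
        for consequence in infer(item, chart):
            if consequence not in chart:
                agenda.append(consequence)
    return False


def cky_rules(lexical: Set[Tuple[str, str]], binary: Set[Tuple[str, str, str]]):
    """Hypothetical CKY deduction system for a grammar in Chomsky normal form.

    lexical: pairs (A, word) for rules A -> word
    binary:  triples (A, B, C) for rules A -> B C
    """
    def axioms(words: List[str]) -> List[Item]:
        return [(a, i, i + 1) for i, w in enumerate(words)
                for (a, word) in lexical if word == w]

    def infer(item: Item, chart: Set[Item]):
        b, i, j = item
        for (a, left, right) in binary:
            for (c, start, end) in list(chart):
                # New item as left child: combine with (right, j, end).
                if left == b and c == right and start == j:
                    yield (a, i, end)
                # New item as right child: combine with (left, start, i).
                if right == b and c == left and end == i:
                    yield (a, start, j)

    return axioms, infer


lexical = {("NP", "she"), ("V", "eats"), ("NP", "fish")}
binary = {("S", "NP", "VP"), ("VP", "V", "NP")}
axioms, infer = cky_rules(lexical, binary)
print(deduce(axioms(["she", "eats", "fish"]), infer, ("S", 0, 3)))  # True
```

Swapping in a different `axioms`/`infer` pair (for example, Earley-style items) would reuse the same engine unchanged, which is the point of the deductive framing.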

arXiv.org e-Print Archive

CiteSeerX

Elsevier - Publisher Connector

Harvard University - DASH

Recovering Grammar Relationships for the Java Language Specification

Author: A. Dubey
C. A. R. Hoare
D. A. Thomas
D. Barnard
E. Bouwers
H. H. Do
M. Di Penta
R. Lämmel
Ralf Lämmel
T. Dean
Vadim Zaytsev
Publication venue: Springer Science and Business Media LLC
Publication date: 24/08/2010

Grammar convergence is a method that helps discover relationships between different grammars of the same language or of different language versions. The key element of the method is the operational, transformation-based representation of those relationships. The input grammars for convergence are transformed until they are structurally equal. The transformations are composed from primitive operators; properties of these operators and of the composed chains provide quantitative and qualitative insight into the relationships between the grammars at hand. We describe a refined method for grammar convergence, and we use it in a major study, where we recover the relationships between all the grammars that occur in the different versions of the Java Language Specification (JLS). The relationships are represented as grammar transformation chains that capture all accidental or intended differences between the JLS grammars. This method is mechanized and driven by nominal and structural differences between pairs of grammars that are subject to asymmetric, binary convergence steps. We present the underlying operator suite for grammar transformation in detail, and we illustrate the suite with many examples of transformations on the JLS grammars. We also describe the extraction effort that was needed to make the JLS grammars amenable to automated processing. We include substantial metadata about the convergence process for the JLS so that the effort becomes reproducible and transparent.
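
The idea of a transformation chain can be sketched in Python with two toy operators: a nominal one that renames a nonterminal and a structural one that inlines a single-alternative nonterminal. The operators `rename` and `inline`, the helper `apply_chain`, the grammar encoding and the example grammars are all hypothetical and far simpler than the operator suite the paper describes.

```python
from typing import Dict, FrozenSet, Tuple

# Toy encoding (an assumption): each nonterminal maps to a frozenset of
# alternatives, and each alternative is a tuple of symbols.
Grammar = Dict[str, FrozenSet[Tuple[str, ...]]]


def rename(g: Grammar, old: str, new: str) -> Grammar:
    """Hypothetical nominal operator: rename symbol `old` to `new` everywhere."""
    def sub(s: str) -> str:
        return new if s == old else s
    return {sub(lhs): frozenset(tuple(sub(s) for s in alt) for alt in alts)
            for lhs, alts in g.items()}


def inline(g: Grammar, nt: str) -> Grammar:
    """Hypothetical structural operator: replace a one-alternative nonterminal by its body."""
    (body,) = g[nt]  # ValueError unless `nt` has exactly one alternative

    def expand(alt: Tuple[str, ...]) -> Tuple[str, ...]:
        out = []
        for s in alt:
            out.extend(body if s == nt else (s,))
        return tuple(out)

    return {lhs: frozenset(expand(a) for a in alts)
            for lhs, alts in g.items() if lhs != nt}


def apply_chain(g: Grammar, chain) -> Grammar:
    """Apply a transformation chain: a list of (operator, extra_args) steps, in order."""
    for op, args in chain:
        g = op(g, *args)
    return g


# Two toy grammars that differ in naming and in one intermediate nonterminal.
g1 = {
    "Block": frozenset({("{", "Body", "}")}),
    "Body": frozenset({("Stmts",)}),
    "Stmts": frozenset({("Stmt",), ("Stmts", "Stmt")}),
}
g2 = {
    "Block": frozenset({("{", "BlockStatements", "}")}),
    "BlockStatements": frozenset({("BlockStatement",),
                                  ("BlockStatements", "BlockStatement")}),
}
chain = [(inline, ("Body",)),
         (rename, ("Stmts", "BlockStatements")),
         (rename, ("Stmt", "BlockStatement"))]
print(apply_chain(g1, chain) == g2)  # True
```

Here the chain itself (one structural step, two nominal steps) is the recorded relationship between the two grammars, in the spirit of the transformation chains the abstract describes.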

arXiv.org e-Print Archive

CiteSeerX

Crossref

VU Research Portal

CWI's Institutional Repository

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot