
    Improved Left-Corner Chart Parsing for Large Context-Free Grammars

    We develop an improved form of left-corner chart parsing for large context-free grammars, introducing improvements that result in significant speed-ups compared to previously known variants of left-corner parsing. We also compare our method to several other major parsing approaches, and find that our improved left-corner parsing method outperforms each of these across a range of grammars. Finally, we also describe a new technique for minimizing the extra information needed to efficiently recover parses from the data structures built in the course of parsing.

    Parsing Schemata

    Parsing schemata provide a general framework for specification, analysis and comparison of (sequential and/or parallel) parsing algorithms. A grammar specifies implicitly what the valid parses of a sentence are; a parsing algorithm specifies explicitly how to compute these. Parsing schemata form a well-defined level of abstraction in between grammars and parsing algorithms. A parsing schema specifies the types of intermediate results that can be computed by a parser, and the rules that allow a given set of such results to be expanded with new results. A parsing schema does not specify the data structures, control structures, and (in case of parallel processing) communication structures that are to be used by a parser.
    Part I, Exposition, gives a general introduction to the ideas that are worked out in the following parts.
    Part II, Foundation, unfolds a mathematical theory of parsing schemata. Different kinds of relations between parsing schemata are formally introduced and illustrated with examples drawn from the parsing literature.
    Part III, Application, discusses a series of applications of parsing schemata.
    - Feature percolation in unification grammar parsing can be described in an elegant, legible notation.
    - Because of the absence of algorithmic detail, parsing schemata can be used to get a formal grip on highly complicated algorithms. We give substance to this claim by means of a thorough analysis of Left-Corner and Head-Corner chart parsing.
    - As an example of structural similarity of parsers, despite differences in form and appearance, we show that the underlying parsing schemata of Earley's algorithm and Tomita's algorithm are virtually identical. Using this structural correspondence we can obtain a novel parallel parser by cross-fertilizing a parallel Earley parser with Tomita's graph-structured stack.
    - Parsing schemata can be implemented straightforwardly by boolean circuits. This means that, in principle, parsing schemata can be coded directly into hardware.
    Part IV, Perspective, discusses the prospects for natural language parsing applications and draws some conclusions. An important observation is that the theoretical and practical parts of the book reinforce each other. The proposed framework is abstract enough to allow a thorough mathematical treatment and practical enough to allow rewriting a variety of real parsing algorithms (i.e. seriously proposed in the literature, not toy examples) in a clear and coherent way.

    Computer-Assisted Language Learning and the Revolution in Computational Linguistics

    For a long period, Computational Linguistics (CL) and Computer-Assisted Language Learning (CALL) have developed almost entirely independently of each other. A brief historical survey shows that the main reason for this state of affairs was the long preoccupation in CL with the general problem of Natural Language Understanding (NLU). As a consequence, much effort was directed to fields such as Machine Translation (MT), which were perceived as incorporating and testing NLU. CALL does not fit this model very well, so it was hardly considered worth pursuing in CL. In the 1990s the realization that products could not live up to expectations, even in the domain of MT, led to a crisis. After this crisis the dominant approach to CL has become much more problem-oriented. From this perspective, many of the earlier differences disadvantaging CALL with respect to MT have now disappeared. Therefore the revolution in CL offers promising perspectives for CALL.

    Automatic error recovery for LR parsers in theory and practice

    This thesis argues the need for good syntax error handling schemes in language translation systems such as compilers, and for the automatic incorporation of such schemes into parser-generators. Syntax errors are studied in a theoretical framework and practical methods for handling syntax errors are presented. The theoretical framework consists of a model for syntax errors based on the concept of a minimum prefix-defined error correction, a sentence obtainable from an erroneous string by performing edit operations at prefix-defined (parser-defined) errors. It is shown that for an arbitrary context-free language, it is undecidable whether a better than arbitrary choice of edit operations can be made at a prefix-defined error. For common programming languages, it is shown that minimum-distance errors and prefix-defined errors do not necessarily coincide, and that there exists an infinite number of programs that differ in a single symbol only; sets of equivalent insertions are exhibited. Two methods for syntax error recovery are presented. The methods are language independent and suitable for automatic generation. The first method consists of two stages: local repair followed, if necessary, by phrase-level repair. The second method consists of a single stage in which a locally minimum-distance repair is computed. Both methods are developed for use in the practical LR parser-generator yacc, requiring no additional specifications from the user. A scheme for the automatic generation of diagnostic messages in terms of the source input is presented. Performance of the methods in practice is evaluated using a formal method based on minimum-distance and prefix-defined error correction. The methods compare favourably with existing methods for error recovery.

    Application of stochastic grammars to understanding action

    Thesis (M.S.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1998. Includes bibliographical references (leaves 69-72). By Yuri A. Ivanov.

    Computing order-independent statistical characteristics of stochastic context-free languages

    Parallel Parsing of Context-Free Languages on an Array of Processors

    Kosaraju [Kosaraju 69] and, independently ten years later, Guibas, Kung and Thompson [Guibas 79] devised an algorithm (K-GKT) for solving on an array of processors a class of dynamic programming problems of which general context-free language (CFL) recognition is a member. I introduce an extension to K-GKT which allows parsing as well as recognition. The basic idea of the extension is to add counters to the processors. These act as pointers to other processors. The extended algorithm consists of three phases which I call the recognition phase, the marking phase and the parse output phase. I first consider the case of unambiguous grammars. I show that in that case, the algorithm has O(n^2 log n) space complexity and a linear time complexity. To obtain these results I rely on a counter implementation that allows the execution in constant time of each of the operations: set to zero, test if zero, increment by 1 and decrement by 1. I provide a proof of correctness of this implementation. I introduce the concept of efficient grammars. One factor in the multiplicative constant hidden behind the O(n^2 log n) space complexity measure for the algorithm is related to the number of non-terminals in the (unambiguous) grammar used. I say that a grammar is k-efficient if it allows the processors to store not more than k pointer pairs. I call a 1-efficient grammar an efficient grammar. I show that two properties that I call nt-disjunction and rhs-disjunction together with unambiguity are sufficient but not necessary conditions for grammar efficiency. I also show that unambiguity itself is not a necessary condition for efficiency. I then consider the case of ambiguous grammars. I present two methods for outputting multiple parses. Both output each parse in linear time. One method has O(n^3 log n) space complexity while the other has O(n^2 log n) space complexity. I then address the issue of problem decomposition. I show how part of my extension can be adapted, using a standard technique, to process inputs that would be too large for an array of some fixed size. I then discuss briefly some issues related to implementation. I report on an actual implementation on the I.C.L. DAP. Finally, I show how another systolic CFL parsing algorithm, by Chang, Ibarra and Palis [Chang 87], can be generalized to output parses in preorder and inorder.
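
    As a point of reference for the abstract above, general CFL recognition can be phrased as a dynamic program over substrings. The sketch below is the textbook sequential CYK recognizer for a grammar in Chomsky normal form, not the K-GKT processor-array algorithm or the counter-based extension the abstract describes. The function name, the `unary`/`binary` rule encoding, and the example grammar are all assumptions made for illustration.

```python
from itertools import product


def cyk_recognize(words, unary, binary, start="S"):
    """Sequential CYK recognition for a grammar in Chomsky normal form.

    unary:  dict mapping a terminal to the set of non-terminals A with A -> terminal
    binary: dict mapping a pair (B, C) to the set of non-terminals A with A -> B C
    (Both encodings are illustrative assumptions, not taken from the abstract.)
    """
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the non-terminals that derive words[i..j] inclusive.
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][i] = set(unary.get(word, ()))
    # Fill in longer spans from shorter ones.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # split point: words[i..k] and words[k+1..j]
                for b, c in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= binary.get((b, c), set())
    return start in table[0][n - 1]


# Hypothetical example grammar for { a^n b^n : n >= 1 }:
#   S -> A B | A X,  X -> S B,  A -> a,  B -> b
UNARY = {"a": {"A"}, "b": {"B"}}
BINARY = {("A", "B"): {"S"}, ("A", "X"): {"S"}, ("S", "B"): {"X"}}
```

    Each table cell depends only on cells for shorter spans, which is the kind of dependency structure that makes this class of dynamic programs a candidate for evaluation on an array of processors.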

    Ein heuristisch gesteuerter Chart-Parser für attributierte Graph-Grammatiken [A Heuristically Guided Chart Parser for Attributed Graph Grammars]
