2,995 research outputs found

    An Efficient Implementation of the Head-Corner Parser

    Get PDF
    This paper describes an efficient and robust implementation of a bi-directional, head-driven parser for constraint-based grammars. This parser is developed for the OVIS system: a Dutch spoken dialogue system in which information about public transport can be obtained by telephone. After a review of the motivation for head-driven parsing strategies, and head-corner parsing in particular, a non-deterministic version of the head-corner parser is presented. A memoization technique is applied to obtain a fast parser. A goal-weakening technique is introduced which greatly improves average case efficiency, both in terms of speed and space requirements. I argue in favor of such a memoization strategy with goal-weakening in comparison with ordinary chart-parsers because such a strategy can be applied selectively and therefore enormously reduces the space requirements of the parser, while no practical loss in time-efficiency is observed. On the contrary, experiments are described in which head-corner and left-corner parsers implemented with selective memoization and goal weakening outperform `standard' chart parsers. The experiments include the grammar of the OVIS system and the Alvey NL Tools grammar. Head-corner parsing is a mix of bottom-up and top-down processing. Certain approaches towards robust parsing require purely bottom-up processing. Therefore, it seems that head-corner parsing is unsuitable for such robust parsing techniques. However, it is shown how underspecification (which arises very naturally in a logic programming environment) can be used in the head-corner parser to allow such robust parsing techniques. A particular robust parsing model is described which is implemented in OVIS.Comment: 31 pages, uses cl.st

    Dependency parsing of Turkish

    Get PDF
    The suitability of different parsing methods for different languages is an important topic in syntactic parsing. Especially lesser-studied languages, typologically different from the languages for which methods have originally been developed, poses interesting challenges in this respect. This article presents an investigation of data-driven dependency parsing of Turkish, an agglutinative free constituent order language that can be seen as the representative of a wider class of languages of similar type. Our investigations show that morphological structure plays an essential role in finding syntactic relations in such a language. In particular, we show that employing sublexical representations called inflectional groups, rather than word forms, as the basic parsing units improves parsing accuracy. We compare two different parsing methods, one based on a probabilistic model with beam search, the other based on discriminative classifiers and a deterministic parsing strategy, and show that the usefulness of sublexical units holds regardless of parsing method.We examine the impact of morphological and lexical information in detail and show that, properly used, this kind of information can improve parsing accuracy substantially. Applying the techniques presented in this article, we achieve the highest reported accuracy for parsing the Turkish Treebank

    An Efficient Probabilistic Context-Free Parsing Algorithm that Computes Prefix Probabilities

    Full text link
    We describe an extension of Earley's parser for stochastic context-free grammars that computes the following quantities given a stochastic context-free grammar and an input string: a) probabilities of successive prefixes being generated by the grammar; b) probabilities of substrings being generated by the nonterminals, including the entire string being generated by the grammar; c) most likely (Viterbi) parse of the string; d) posterior expected number of applications of each grammar production, as required for reestimating rule probabilities. (a) and (b) are computed incrementally in a single left-to-right pass over the input. Our algorithm compares favorably to standard bottom-up parsing methods for SCFGs in that it works efficiently on sparse grammars by making use of Earley's top-down control structure. It can process any context-free rule format without conversion to some normal form, and combines computations for (a) through (d) in a single algorithm. Finally, the algorithm has simple extensions for processing partially bracketed inputs, and for finding partial parses and their likelihoods on ungrammatical inputs.Comment: 45 pages. Slightly shortened version to appear in Computational Linguistics 2

    Generalizing input-driven languages: theoretical and practical benefits

    Get PDF
    Regular languages (RL) are the simplest family in Chomsky's hierarchy. Thanks to their simplicity they enjoy various nice algebraic and logic properties that have been successfully exploited in many application fields. Practically all of their related problems are decidable, so that they support automatic verification algorithms. Also, they can be recognized in real-time. Context-free languages (CFL) are another major family well-suited to formalize programming, natural, and many other classes of languages; their increased generative power w.r.t. RL, however, causes the loss of several closure properties and of the decidability of important problems; furthermore they need complex parsing algorithms. Thus, various subclasses thereof have been defined with different goals, spanning from efficient, deterministic parsing to closure properties, logic characterization and automatic verification techniques. Among CFL subclasses, so-called structured ones, i.e., those where the typical tree-structure is visible in the sentences, exhibit many of the algebraic and logic properties of RL, whereas deterministic CFL have been thoroughly exploited in compiler construction and other application fields. After surveying and comparing the main properties of those various language families, we go back to operator precedence languages (OPL), an old family through which R. Floyd pioneered deterministic parsing, and we show that they offer unexpected properties in two fields so far investigated in totally independent ways: they enable parsing parallelization in a more effective way than traditional sequential parsers, and exhibit the same algebraic and logic properties so far obtained only for less expressive language families

    Z2SAL: a translation-based model checker for Z

    No full text
    Despite being widely known and accepted in industry, the Z formal specification language has not so far been well supported by automated verification tools, mostly because of the challenges in handling the abstraction of the language. In this paper we discuss a novel approach to building a model-checker for Z, which involves implementing a translation from Z into SAL, the input language for the Symbolic Analysis Laboratory, a toolset which includes a number of model-checkers and a simulator. The Z2SAL translation deals with a number of important issues, including: mapping unbounded, abstract specifications into bounded, finite models amenable to a BDD-based symbolic checker; converting a non-constructive and piecemeal style of functional specification into a deterministic, automaton-based style of specification; and supporting the rich set-based vocabulary of the Z mathematical toolkit. This paper discusses progress made towards implementing as complete and faithful a translation as possible, while highlighting certain assumptions, respecting certain limitations and making use of available optimisations. The translation is illustrated throughout with examples; and a complete working example is presented, together with performance data

    A Neural Network Architecture for Syntax Analysis

    Get PDF
    Artificial neural networks (ANNs), due to their inherent parallelism and potential fault tolerance, offer an attractive paradigm for robust and efficient implementations of syntax analyzers. This paper proposes a modular neural network architecture for syntax analysis on continuous input stream of characters. The components of the proposed architecture include neural network designs for a stack, a lexical analyzer, a grammar parser and a parse tree construction module. The proposed NN stack allows simulation of a stack of large depth, needs no training, and hence is not application-specific. The proposed NN lexical analyzer provides a relatively efficient and high performance alternative to current computer systems for lexical analysis especially in natural language processing applications. The proposed NN parser generates parse trees by parsing strings from widely used subsets of deterministic context-free languages (generated by LR grammars). The estimated performance of the proposed neural network architecture (based on current CMOS VLSI technology) for syntax analysis is compared with that of commonly used approaches to syntax analysis in current computer systems. The results of this performance comparison suggest that the proposed neural network architecture offers an attractive approach for syntax analysis in a wide range of practical applications such as programming language compilation and natural language processing

    The CoNLL 2007 shared task on dependency parsing

    Get PDF
    The Conference on Computational Natural Language Learning features a shared task, in which participants train and test their learning systems on the same data sets. In 2007, as in 2006, the shared task has been devoted to dependency parsing, this year with both a multilingual track and a domain adaptation track. In this paper, we define the tasks of the different tracks and describe how the data sets were created from existing treebanks for ten languages. In addition, we characterize the different approaches of the participating systems, report the test results, and provide a first analysis of these results

    A Robust and Efficient Parser for Non-Canonical Inputs

    No full text
    International audienceWe present in this paper a parser relying on a constraint-based formalism called Property Grammar. We show how constraints constitute an efficient solution in parsing non canonical material such as spoken language transcription or e-mails. This technique, provided that it is implemented with some control mechanisms, is very efficient. Some results are presented, from the French parsing evaluation campaign EASy
    corecore