274 research outputs found

    An Efficient Implementation of the Head-Corner Parser

    Get PDF
    This paper describes an efficient and robust implementation of a bi-directional, head-driven parser for constraint-based grammars. This parser is developed for the OVIS system: a Dutch spoken dialogue system in which information about public transport can be obtained by telephone. After a review of the motivation for head-driven parsing strategies, and head-corner parsing in particular, a non-deterministic version of the head-corner parser is presented. A memoization technique is applied to obtain a fast parser. A goal-weakening technique is introduced which greatly improves average case efficiency, both in terms of speed and space requirements. I argue in favor of such a memoization strategy with goal-weakening in comparison with ordinary chart-parsers because such a strategy can be applied selectively and therefore enormously reduces the space requirements of the parser, while no practical loss in time-efficiency is observed. On the contrary, experiments are described in which head-corner and left-corner parsers implemented with selective memoization and goal weakening outperform `standard' chart parsers. The experiments include the grammar of the OVIS system and the Alvey NL Tools grammar. Head-corner parsing is a mix of bottom-up and top-down processing. Certain approaches towards robust parsing require purely bottom-up processing. Therefore, it seems that head-corner parsing is unsuitable for such robust parsing techniques. However, it is shown how underspecification (which arises very naturally in a logic programming environment) can be used in the head-corner parser to allow such robust parsing techniques. A particular robust parsing model is described which is implemented in OVIS.Comment: 31 pages, uses cl.st

    Transducers from Rewrite Rules with Backreferences

    Full text link
    Context sensitive rewrite rules have been widely used in several areas of natural language processing, including syntax, morphology, phonology and speech processing. Kaplan and Kay, Karttunen, and Mohri & Sproat have given various algorithms to compile such rewrite rules into finite-state transducers. The present paper extends this work by allowing a limited form of backreferencing in such rules. The explicit use of backreferencing leads to more elegant and general solutions.Comment: 8 pages, EACL 1999 Berge

    Constraint-Based Categorial Grammar

    Full text link
    We propose a generalization of Categorial Grammar in which lexical categories are defined by means of recursive constraints. In particular, the introduction of relational constraints allows one to capture the effects of (recursive) lexical rules in a computationally attractive manner. We illustrate the linguistic merits of the new approach by showing how it accounts for the syntax of Dutch cross-serial dependencies and the position and scope of adjuncts in such constructions. Delayed evaluation is used to process grammars containing recursive constraints.Comment: 8 pages, LaTe

    MoNoise: Modeling Noise Using a Modular Normalization System

    Get PDF
    We propose MoNoise: a normalization model focused on generalizability and efficiency, it aims at being easily reusable and adaptable. Normalization is the task of translating texts from a non- canonical domain to a more canonical domain, in our case: from social media data to standard language. Our proposed model is based on a modular candidate generation in which each module is responsible for a different type of normalization action. The most important generation modules are a spelling correction system and a word embeddings module. Depending on the definition of the normalization task, a static lookup list can be crucial for performance. We train a random forest classifier to rank the candidates, which generalizes well to all different types of normaliza- tion actions. Most features for the ranking originate from the generation modules; besides these features, N-gram features prove to be an important source of information. We show that MoNoise beats the state-of-the-art on different normalization benchmarks for English and Dutch, which all define the task of normalization slightly different.Comment: Source code: https://bitbucket.org/robvanderg/monois

    Treatment of Epsilon-Moves in Subset Construction

    Full text link
    The paper discusses the problem of determinising finite-state automata containing large numbers of epsilon-moves. Experiments with finite-state approximations of natural language grammars often give rise to very large automata with a very large number of epsilon-moves. The paper identifies three subset construction algorithms which treat epsilon-moves. A number of experiments has been performed which indicate that the algorithms differ considerably in practice. Furthermore, the experiments suggest that the average number of epsilon-moves per state can be used to predict which algorithm is likely to perform best for a given input automaton

    AlpinoGraph:A Graph-based Search Engine for Flexible and Efficient Treebank Search

    Get PDF

    Semantic Mapping for Lexical Sparseness Reduction in Parsing

    Get PDF
    Bilexical information is known to be helpful inparse disambiguation, but the benefit is limitedbecause of lexical sparseness. An approach us-ing word classes can reduce sparseness and po-tentially leads to more accurate parsing. Firstly,we describe a method identifying the depen-dency types of the Alpino parser for Dutchto which we would like to apply generaliza-tion. These are the types which are most likelyto reduce the sparseness and positively affectparsing at the same time. Secondly, we providepreliminary results for enhancement of depen-dency types with semantic classes derived froma WordNet-like inventory for Dutch. Classesof varying degrees of generality are appliedto three dependency types: nominal conjunc-tion, modification of adjective and modificationof noun. We observe improvements in someconcrete cases, whereas the overall parsing ac-curacy either remains unchanged or decreases.We identify drawbacks of human-built senseinventories, which provides motivation for adistributional semantic approach
    • …
    corecore