An Efficient Implementation of the Head-Corner Parser
This paper describes an efficient and robust implementation of a
bi-directional, head-driven parser for constraint-based grammars. This parser
is developed for the OVIS system: a Dutch spoken dialogue system in which
information about public transport can be obtained by telephone.
After a review of the motivation for head-driven parsing strategies, and
head-corner parsing in particular, a non-deterministic version of the
head-corner parser is presented. A memoization technique is applied to obtain a
fast parser. A goal-weakening technique is introduced which greatly improves
average case efficiency, both in terms of speed and space requirements.
I argue in favor of such a memoization strategy with goal-weakening over
ordinary chart parsers, because the strategy can be applied selectively and
therefore enormously reduces the space requirements of the parser, while no
practical loss in time efficiency is observed. On the
contrary, experiments are described in which head-corner and left-corner
parsers implemented with selective memoization and goal weakening outperform
`standard' chart parsers. The experiments include the grammar of the OVIS
system and the Alvey NL Tools grammar.
Head-corner parsing is a mix of bottom-up and top-down processing. Certain
approaches towards robust parsing require purely bottom-up processing.
Therefore, it seems that head-corner parsing is unsuitable for such robust
parsing techniques. However, it is shown how underspecification (which arises
very naturally in a logic programming environment) can be used in the
head-corner parser to allow such robust parsing techniques. A particular robust
parsing model is described which is implemented in OVIS.
Comment: 31 pages, uses cl.st
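The selective memoization idea described above can be sketched in miniature. The recognizer below memoizes parse goals so each goal is solved at most once; the grammar and lexicon are invented for illustration (not the OVIS grammar), and the full technique additionally weakens a goal's feature constraints before table lookup so one entry serves many specific goals, which is omitted here for brevity.

```python
# Toy grammar and lexicon (hypothetical; not the OVIS or Alvey grammars).
GRAMMAR = {
    "S":  [("NP", "VP")],
    "NP": [("det", "noun")],
    "VP": [("verb", "NP"), ("verb",)],
}
LEXICON = {"the": "det", "dog": "noun", "cat": "noun", "sees": "verb"}

def parse(words):
    tags = [LEXICON[w] for w in words]
    memo = {}  # one table entry per goal (category, start position)

    def goals(cat, start):
        """All end positions from which `cat` can be recognized at `start`."""
        key = (cat, start)
        if key in memo:            # memoization: reuse earlier solutions
            return memo[key]
        ends = set()
        memo[key] = ends
        if start < len(tags) and tags[start] == cat:
            ends.add(start + 1)
        for rhs in GRAMMAR.get(cat, []):
            frontier = {start}
            for sym in rhs:        # thread positions through the rule body
                frontier = {e for s in frontier for e in goals(sym, s)}
            ends |= frontier
        return ends

    return len(words) in goals("S", 0)
```

Because the table is keyed per goal, memoization can be switched off for categories that never pay off, which is the selectivity the abstract contrasts with all-or-nothing chart parsing.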
Transducers from Rewrite Rules with Backreferences
Context sensitive rewrite rules have been widely used in several areas of
natural language processing, including syntax, morphology, phonology and speech
processing. Kaplan and Kay, Karttunen, and Mohri & Sproat have given various
algorithms to compile such rewrite rules into finite-state transducers. The
present paper extends this work by allowing a limited form of backreferencing
in such rules. The explicit use of backreferencing leads to more elegant and
general solutions.
Comment: 8 pages, EACL 1999 Berge
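A small example may clarify what a rewrite rule with a backreference looks like. The rule below doubles any consonant that occurs between two vowels; the replacement refers back to the matched segment. This is only an illustration of the rule format using regex substitution, not the finite-state transducer compilation the paper describes.

```python
import re

# Rewrite rule with a backreference: double a consonant between two vowels
# ("aba" -> "abba").  Left and right contexts are zero-width assertions, and
# the replacement \1\1 backreferences the matched consonant.
RULE = re.compile(r"(?<=[aeiou])([bcdfghjklmnpqrstvwxyz])(?=[aeiou])")

def apply_rule(s):
    return RULE.sub(r"\1\1", s)
```

Without backreferencing, a separate rule would be needed for every consonant; the backreference collapses them into one statement, which is the elegance the abstract refers to.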
Constraint-Based Categorial Grammar
We propose a generalization of Categorial Grammar in which lexical categories
are defined by means of recursive constraints. In particular, the introduction
of relational constraints allows one to capture the effects of (recursive)
lexical rules in a computationally attractive manner. We illustrate the
linguistic merits of the new approach by showing how it accounts for the syntax
of Dutch cross-serial dependencies and the position and scope of adjuncts in
such constructions. Delayed evaluation is used to process grammars containing
recursive constraints.
Comment: 8 pages, LaTe
MoNoise: Modeling Noise Using a Modular Normalization System
We propose MoNoise: a normalization model focused on generalizability and
efficiency that aims to be easily reusable and adaptable. Normalization is
the task of translating texts from a non-canonical domain to a more canonical
domain; in our case, from social media data to standard language. Our proposed
model is based on a modular candidate generation in which each module is
responsible for a different type of normalization action. The most important
generation modules are a spelling correction system and a word embeddings
module. Depending on the definition of the normalization task, a static lookup
list can be crucial for performance. We train a random forest classifier to
rank the candidates, which generalizes well to all different types of
normalization actions. Most features for the ranking originate from the
generation modules; besides these features, N-gram features prove to be an
important source of information. We show that MoNoise beats the
state-of-the-art on different normalization benchmarks for English and Dutch,
which all define the normalization task slightly differently.
Comment: Source code: https://bitbucket.org/robvanderg/monois
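The modular design described above can be sketched as follows. Each module proposes candidates together with features recording where they came from, and a ranker scores the merged pool. The modules, lookup table, and scoring function here are hypothetical stand-ins; the real system uses a spelling-correction module, word embeddings, and a trained random forest ranker.

```python
# Static lookup module (hypothetical entries, as in a lookup-list module).
LOOKUP = {"u": "you", "ur": "your", "pls": "please"}

def lookup_module(word):
    return [(LOOKUP[word], {"lookup": 1.0})] if word in LOOKUP else []

def identity_module(word):
    # Leaving the word unchanged is always a candidate.
    return [(word, {"identity": 1.0})]

MODULES = [lookup_module, identity_module]

def generate(word):
    """Merge candidates from all modules; features record their origin."""
    candidates = {}
    for module in MODULES:
        for cand, feats in module(word):
            candidates.setdefault(cand, {}).update(feats)
    return candidates

def normalize(word, rank=lambda feats: feats.get("lookup", 0.0)):
    # `rank` stands in for the trained random forest classifier.
    cands = generate(word)
    return max(cands, key=lambda c: rank(cands[c]))
```

New normalization actions are added by appending a module to `MODULES`, which is the reusability the abstract emphasizes.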
Treatment of Epsilon-Moves in Subset Construction
The paper discusses the problem of determinising finite-state automata
containing large numbers of epsilon-moves. Experiments with finite-state
approximations of natural language grammars often give rise to very large
automata with a very large number of epsilon-moves. The paper identifies three
subset construction algorithms which treat epsilon-moves. A number of
experiments have been performed which indicate that the algorithms differ
considerably in practice. Furthermore, the experiments suggest that the average
number of epsilon-moves per state can be used to predict which algorithm is
likely to perform best for a given input automaton.
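One standard way to treat epsilon-moves is to take the epsilon-closure of every subset as it is built, as sketched below. This is a minimal sketch of one variant only (the paper compares three), and the automaton encoding as dictionaries of transition sets is my own choice.

```python
from collections import deque

def eps_closure(states, eps):
    """All states reachable from `states` via epsilon-moves alone."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def determinize(start, delta, eps, finals):
    """Subset construction, closing each new subset under epsilon-moves."""
    start_set = eps_closure({start}, eps)
    dstates = {start_set: 0}          # subset -> DFA state number
    queue = deque([start_set])
    dtrans, dfinals = {}, set()
    while queue:
        S = queue.popleft()
        if S & finals:
            dfinals.add(dstates[S])
        by_sym = {}                   # collect moves per input symbol
        for s in S:
            for sym, targets in delta.get(s, {}).items():
                by_sym.setdefault(sym, set()).update(targets)
        for sym, T in by_sym.items():
            T = eps_closure(T, eps)
            if T not in dstates:
                dstates[T] = len(dstates)
                queue.append(T)
            dtrans[(dstates[S], sym)] = dstates[T]
    return dtrans, dfinals
```

When the average number of epsilon-moves per state is high, the closure step dominates, which is why the variants the paper compares differ in where and how often the closure is computed.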
Semantic Mapping for Lexical Sparseness Reduction in Parsing
Bilexical information is known to be helpful in parse disambiguation, but the
benefit is limited because of lexical sparseness. An approach using word
classes can reduce sparseness and potentially leads to more accurate parsing.
Firstly, we describe a method identifying the dependency types of the Alpino
parser for Dutch to which we would like to apply generalization. These are
the types which are most likely to reduce the sparseness and positively
affect parsing at the same time. Secondly, we provide preliminary results for
enhancement of dependency types with semantic classes derived from a
WordNet-like inventory for Dutch. Classes of varying degrees of generality
are applied to three dependency types: nominal conjunction, modification of
adjective and modification of noun. We observe improvements in some concrete
cases, whereas the overall parsing accuracy either remains unchanged or
decreases. We identify drawbacks of human-built sense inventories, which
provides motivation for a distributional semantic approach.
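The generalization step described above amounts to backing off from word forms to semantic classes inside a bilexical feature. A minimal sketch, with an invented class inventory standing in for the WordNet-like resource for Dutch:

```python
# Hypothetical class inventory (the paper derives classes from a
# WordNet-like resource for Dutch; these entries are invented).
SEM_CLASS = {"appel": "FOOD", "peer": "FOOD", "hond": "ANIMAL", "kat": "ANIMAL"}

def bilexical_feature(dep_type, head, dependent, generalize=False):
    """Back off from word forms to semantic classes to reduce sparseness."""
    if generalize:
        head = SEM_CLASS.get(head, head)
        dependent = SEM_CLASS.get(dependent, dependent)
    return (dep_type, head, dependent)
```

Two conjunctions never seen together in training, such as "appel en peer" and "appel en hond", can then share statistics through their class-level features, which is how the sparseness reduction is meant to work.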