An Efficient Implementation of the Head-Corner Parser
This paper describes an efficient and robust implementation of a
bi-directional, head-driven parser for constraint-based grammars. This parser
is developed for the OVIS system: a Dutch spoken dialogue system in which
information about public transport can be obtained by telephone.
After a review of the motivation for head-driven parsing strategies, and
head-corner parsing in particular, a non-deterministic version of the
head-corner parser is presented. A memoization technique is applied to obtain a
fast parser. A goal-weakening technique is introduced which greatly improves
average case efficiency, both in terms of speed and space requirements.
I argue in favor of such a memoization strategy with goal-weakening over
ordinary chart parsers, because the strategy can be applied selectively and
therefore enormously reduces the space requirements of the parser, while no
practical loss in time efficiency is observed. On the
contrary, experiments are described in which head-corner and left-corner
parsers implemented with selective memoization and goal weakening outperform
`standard' chart parsers. The experiments include the grammar of the OVIS
system and the Alvey NL Tools grammar.
Head-corner parsing is a mix of bottom-up and top-down processing. Certain
approaches towards robust parsing require purely bottom-up processing.
Therefore, it seems that head-corner parsing is unsuitable for such robust
parsing techniques. However, it is shown how underspecification (which arises
very naturally in a logic programming environment) can be used in the
head-corner parser to allow such robust parsing techniques. A particular robust
parsing model is described which is implemented in OVIS.
Comment: 31 pages, uses cl.st
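The selective memoization idea described above can be sketched in miniature. The recognizer below memoizes parse goals so each goal is solved at most once; the grammar and lexicon are invented for illustration (not the OVIS grammar), and the full technique additionally weakens a goal's feature constraints before table lookup so one entry serves many specific goals, which is omitted here for brevity.

```python
# Toy grammar and lexicon (hypothetical; not the OVIS or Alvey grammars).
GRAMMAR = {
    "S":  [("NP", "VP")],
    "NP": [("det", "noun")],
    "VP": [("verb", "NP"), ("verb",)],
}
LEXICON = {"the": "det", "dog": "noun", "cat": "noun", "sees": "verb"}

def parse(words):
    tags = [LEXICON[w] for w in words]
    memo = {}  # one table entry per goal (category, start position)

    def goals(cat, start):
        """All end positions from which `cat` can be recognized at `start`."""
        key = (cat, start)
        if key in memo:            # memoization: reuse earlier solutions
            return memo[key]
        ends = set()
        memo[key] = ends
        if start < len(tags) and tags[start] == cat:
            ends.add(start + 1)
        for rhs in GRAMMAR.get(cat, []):
            frontier = {start}
            for sym in rhs:        # thread positions through the rule body
                frontier = {e for s in frontier for e in goals(sym, s)}
            ends |= frontier
        return ends

    return len(words) in goals("S", 0)
```

Because the table is keyed per goal, memoization can be switched off for categories that never pay off, which is the selectivity the abstract contrasts with all-or-nothing chart parsing.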
Transducers from Rewrite Rules with Backreferences
Context sensitive rewrite rules have been widely used in several areas of
natural language processing, including syntax, morphology, phonology and speech
processing. Kaplan and Kay, Karttunen, and Mohri & Sproat have given various
algorithms to compile such rewrite rules into finite-state transducers. The
present paper extends this work by allowing a limited form of backreferencing
in such rules. The explicit use of backreferencing leads to more elegant and
general solutions.
Comment: 8 pages, EACL 1999 Berge
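A small example may clarify what a rewrite rule with a backreference looks like. The rule below doubles any consonant that occurs between two vowels; the replacement refers back to the matched segment. This is only an illustration of the rule format using regex substitution, not the finite-state transducer compilation the paper describes.

```python
import re

# Rewrite rule with a backreference: double a consonant between two vowels
# ("aba" -> "abba").  Left and right contexts are zero-width assertions, and
# the replacement \1\1 backreferences the matched consonant.
RULE = re.compile(r"(?<=[aeiou])([bcdfghjklmnpqrstvwxyz])(?=[aeiou])")

def apply_rule(s):
    return RULE.sub(r"\1\1", s)
```

Without backreferencing, a separate rule would be needed for every consonant; the backreference collapses them into one statement, which is the elegance the abstract refers to.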
Constraint-Based Categorial Grammar
We propose a generalization of Categorial Grammar in which lexical categories
are defined by means of recursive constraints. In particular, the introduction
of relational constraints allows one to capture the effects of (recursive)
lexical rules in a computationally attractive manner. We illustrate the
linguistic merits of the new approach by showing how it accounts for the syntax
of Dutch cross-serial dependencies and the position and scope of adjuncts in
such constructions. Delayed evaluation is used to process grammars containing
recursive constraints.
Comment: 8 pages, LaTe
MoNoise: Modeling Noise Using a Modular Normalization System
We propose MoNoise: a normalization model focused on generalizability and
efficiency that aims to be easily reusable and adaptable. Normalization is
the task of translating texts from a non-canonical domain to a more canonical
domain; in our case, from social media data to standard language. Our proposed
model is based on a modular candidate generation in which each module is
responsible for a different type of normalization action. The most important
generation modules are a spelling correction system and a word embeddings
module. Depending on the definition of the normalization task, a static lookup
list can be crucial for performance. We train a random forest classifier to
rank the candidates, which generalizes well to all different types of
normalization actions. Most features for the ranking originate from the
generation modules; besides these features, N-gram features prove to be an
important source of information. We show that MoNoise beats the
state-of-the-art on different normalization benchmarks for English and Dutch,
which all define the normalization task slightly differently.
Comment: Source code: https://bitbucket.org/robvanderg/monois
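The modular design described above can be sketched as follows. Each module proposes candidates together with features recording where they came from, and a ranker scores the merged pool. The modules, lookup table, and scoring function here are hypothetical stand-ins; the real system uses a spelling-correction module, word embeddings, and a trained random forest ranker.

```python
# Static lookup module (hypothetical entries, as in a lookup-list module).
LOOKUP = {"u": "you", "ur": "your", "pls": "please"}

def lookup_module(word):
    return [(LOOKUP[word], {"lookup": 1.0})] if word in LOOKUP else []

def identity_module(word):
    # Leaving the word unchanged is always a candidate.
    return [(word, {"identity": 1.0})]

MODULES = [lookup_module, identity_module]

def generate(word):
    """Merge candidates from all modules; features record their origin."""
    candidates = {}
    for module in MODULES:
        for cand, feats in module(word):
            candidates.setdefault(cand, {}).update(feats)
    return candidates

def normalize(word, rank=lambda feats: feats.get("lookup", 0.0)):
    # `rank` stands in for the trained random forest classifier.
    cands = generate(word)
    return max(cands, key=lambda c: rank(cands[c]))
```

New normalization actions are added by appending a module to `MODULES`, which is the reusability the abstract emphasizes.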
Treatment of Epsilon-Moves in Subset Construction
The paper discusses the problem of determinising finite-state automata
containing large numbers of epsilon-moves. Experiments with finite-state
approximations of natural language grammars often give rise to very large
automata with a very large number of epsilon-moves. The paper identifies three
subset construction algorithms which treat epsilon-moves. A number of
experiments have been performed which indicate that the algorithms differ
considerably in practice. Furthermore, the experiments suggest that the average
number of epsilon-moves per state can be used to predict which algorithm is
likely to perform best for a given input automaton.
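One standard way to treat epsilon-moves is to take the epsilon-closure of every subset as it is built, as sketched below. This is a minimal sketch of one variant only (the paper compares three), and the automaton encoding as dictionaries of transition sets is my own choice.

```python
from collections import deque

def eps_closure(states, eps):
    """All states reachable from `states` via epsilon-moves alone."""
    closure, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)

def determinize(start, delta, eps, finals):
    """Subset construction, closing each new subset under epsilon-moves."""
    start_set = eps_closure({start}, eps)
    dstates = {start_set: 0}          # subset -> DFA state number
    queue = deque([start_set])
    dtrans, dfinals = {}, set()
    while queue:
        S = queue.popleft()
        if S & finals:
            dfinals.add(dstates[S])
        by_sym = {}                   # collect moves per input symbol
        for s in S:
            for sym, targets in delta.get(s, {}).items():
                by_sym.setdefault(sym, set()).update(targets)
        for sym, T in by_sym.items():
            T = eps_closure(T, eps)
            if T not in dstates:
                dstates[T] = len(dstates)
                queue.append(T)
            dtrans[(dstates[S], sym)] = dstates[T]
    return dtrans, dfinals
```

When the average number of epsilon-moves per state is high, the closure step dominates, which is why the variants the paper compares differ in where and how often the closure is computed.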
Semantic Mapping for Lexical Sparseness Reduction in Parsing
Bilexical information is known to be helpful in parse disambiguation, but the
benefit is limited because of lexical sparseness. An approach using word
classes can reduce sparseness and potentially leads to more accurate parsing.
Firstly, we describe a method identifying the dependency types of the Alpino
parser for Dutch to which we would like to apply generalization. These are
the types which are most likely to reduce the sparseness and positively
affect parsing at the same time. Secondly, we provide preliminary results for
enhancement of dependency types with semantic classes derived from a
WordNet-like inventory for Dutch. Classes of varying degrees of generality
are applied to three dependency types: nominal conjunction, modification of
adjective and modification of noun. We observe improvements in some concrete
cases, whereas the overall parsing accuracy either remains unchanged or
decreases. We identify drawbacks of human-built sense inventories, which
provides motivation for a distributional semantic approach.
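The generalization step described above amounts to backing off from word forms to semantic classes inside a bilexical feature. A minimal sketch, with an invented class inventory standing in for the WordNet-like resource for Dutch:

```python
# Hypothetical class inventory (the paper derives classes from a
# WordNet-like resource for Dutch; these entries are invented).
SEM_CLASS = {"appel": "FOOD", "peer": "FOOD", "hond": "ANIMAL", "kat": "ANIMAL"}

def bilexical_feature(dep_type, head, dependent, generalize=False):
    """Back off from word forms to semantic classes to reduce sparseness."""
    if generalize:
        head = SEM_CLASS.get(head, head)
        dependent = SEM_CLASS.get(dependent, dependent)
    return (dep_type, head, dependent)
```

Two conjunctions never seen together in training, such as "appel en peer" and "appel en hond", can then share statistics through their class-level features, which is how the sparseness reduction is meant to work.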