2,261 research outputs found
Robust Processing of Natural Language
Previous approaches to robustness in natural language processing usually
treat deviant input by relaxing grammatical constraints whenever a successful
analysis cannot be provided by ``normal'' means. This schema implies, that
error detection always comes prior to error handling, a behaviour which hardly
can compete with its human model, where many erroneous situations are treated
without even noticing them.
The paper analyses the necessary preconditions for achieving a higher degree
of robustness in natural language processing and suggests a quite different
approach based on a procedure for structural disambiguation. It not only offers
the possibility to cope with robustness issues in a more natural way but
eventually might be suited to accommodate quite different aspects of robust
behaviour within a single framework.Comment: 16 pages, LaTeX, uses pstricks.sty, pstricks.tex, pstricks.pro,
pst-node.sty, pst-node.tex, pst-node.pro. To appear in: Proc. KI-95, 19th
German Conference on Artificial Intelligence, Bielefeld (Germany), Lecture
Notes in Computer Science, Springer 199
MUSE CSP: An Extension to the Constraint Satisfaction Problem
This paper describes an extension to the constraint satisfaction problem
(CSP) called MUSE CSP (MUltiply SEgmented Constraint Satisfaction Problem).
This extension is especially useful for those problems which segment into
multiple sets of partially shared variables. Such problems arise naturally in
signal processing applications including computer vision, speech processing,
and handwriting recognition. For these applications, it is often difficult to
segment the data in only one way given the low-level information utilized by
the segmentation algorithms. MUSE CSP can be used to compactly represent
several similar instances of the constraint satisfaction problem. If multiple
instances of a CSP have some common variables which have the same domains and
constraints, then they can be combined into a single instance of a MUSE CSP,
reducing the work required to apply the constraints. We introduce the concepts
of MUSE node consistency, MUSE arc consistency, and MUSE path consistency. We
then demonstrate how MUSE CSP can be used to compactly represent lexically
ambiguous sentences and the multiple sentence hypotheses that are often
generated by speech recognition algorithms so that grammar constraints can be
used to provide parses for all syntactically correct sentences. Algorithms for
MUSE arc and path consistency are provided. Finally, we discuss how to create a
MUSE CSP from a set of CSPs which are labeled to indicate when the same
variable is shared by more than a single CSP.Comment: See http://www.jair.org/ for any accompanying file
A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena
Word reordering is one of the most difficult aspects of statistical machine
translation (SMT), and an important factor of its quality and efficiency.
Despite the vast amount of research published to date, the interest of the
community in this problem has not decreased, and no single method appears to be
strongly dominant across language pairs. Instead, the choice of the optimal
approach for a new translation task still seems to be mostly driven by
empirical trials. To orientate the reader in this vast and complex research
area, we present a comprehensive survey of word reordering viewed as a
statistical modeling challenge and as a natural language phenomenon. The survey
describes in detail how word reordering is modeled within different
string-based and tree-based SMT frameworks and as a stand-alone task, including
systematic overviews of the literature in advanced reordering modeling. We then
question why some approaches are more successful than others in different
language pairs. We argue that, besides measuring the amount of reordering, it
is important to understand which kinds of reordering occur in a given language
pair. To this end, we conduct a qualitative analysis of word reordering
phenomena in a diverse sample of language pairs, based on a large collection of
linguistic knowledge. Empirical results in the SMT literature are shown to
support the hypothesis that a few linguistic facts can be very useful to
anticipate the reordering characteristics of a language pair and to select the
SMT framework that best suits them.Comment: 44 pages, to appear in Computational Linguistic
Concurrent Lexicalized Dependency Parsing: The ParseTalk Model
A grammar model for concurrent, object-oriented natural language parsing is
introduced. Complete lexical distribution of grammatical knowledge is achieved
building upon the head-oriented notions of valency and dependency, while
inheritance mechanisms are used to capture lexical generalizations. The
underlying concurrent computation model relies upon the actor paradigm. We
consider message passing protocols for establishing dependency relations and
ambiguity handling.Comment: 90kB, 7pages Postscrip
Learning Language from a Large (Unannotated) Corpus
A novel approach to the fully automated, unsupervised extraction of
dependency grammars and associated syntax-to-semantic-relationship mappings
from large text corpora is described. The suggested approach builds on the
authors' prior work with the Link Grammar, RelEx and OpenCog systems, as well
as on a number of prior papers and approaches from the statistical language
learning literature. If successful, this approach would enable the mining of
all the information needed to power a natural language comprehension and
generation system, directly from a large, unannotated corpus.Comment: 29 pages, 5 figures, research proposa
UsingWomb Grammars for Inducing the Grammar of a Subset of Yorùbá Noun Phrases
We address the problem of inducing the grammar of an under-resourced language,Yorùbá, from the grammar of English using an efficient and, linguistically savvy, constraintsolving model of grammar induction –Womb Grammars (WG). Our proposed methodologyadapts WG for parsing a subset of noun phrases of the target language Yorùbá, from thegrammar of the source language English, which is described as properties between pairs ofconstituents. Our model is implemented in CHRG (Constraint Handling Rule Grammar) and,it has been used for inducing the grammar of a useful subset of Yorùbá Noun Phrases. Interestingextensions to the original Womb Grammar model are presented, motivated by the specificneeds of Yorùbá and, similar tone languages
- …