2,261 research outputs found

    Robust Processing of Natural Language

    Full text link
    Previous approaches to robustness in natural language processing usually treat deviant input by relaxing grammatical constraints whenever a successful analysis cannot be provided by ``normal'' means. This schema implies, that error detection always comes prior to error handling, a behaviour which hardly can compete with its human model, where many erroneous situations are treated without even noticing them. The paper analyses the necessary preconditions for achieving a higher degree of robustness in natural language processing and suggests a quite different approach based on a procedure for structural disambiguation. It not only offers the possibility to cope with robustness issues in a more natural way but eventually might be suited to accommodate quite different aspects of robust behaviour within a single framework.Comment: 16 pages, LaTeX, uses pstricks.sty, pstricks.tex, pstricks.pro, pst-node.sty, pst-node.tex, pst-node.pro. To appear in: Proc. KI-95, 19th German Conference on Artificial Intelligence, Bielefeld (Germany), Lecture Notes in Computer Science, Springer 199

    MUSE CSP: An Extension to the Constraint Satisfaction Problem

    Full text link
    This paper describes an extension to the constraint satisfaction problem (CSP) called MUSE CSP (MUltiply SEgmented Constraint Satisfaction Problem). This extension is especially useful for those problems which segment into multiple sets of partially shared variables. Such problems arise naturally in signal processing applications including computer vision, speech processing, and handwriting recognition. For these applications, it is often difficult to segment the data in only one way given the low-level information utilized by the segmentation algorithms. MUSE CSP can be used to compactly represent several similar instances of the constraint satisfaction problem. If multiple instances of a CSP have some common variables which have the same domains and constraints, then they can be combined into a single instance of a MUSE CSP, reducing the work required to apply the constraints. We introduce the concepts of MUSE node consistency, MUSE arc consistency, and MUSE path consistency. We then demonstrate how MUSE CSP can be used to compactly represent lexically ambiguous sentences and the multiple sentence hypotheses that are often generated by speech recognition algorithms so that grammar constraints can be used to provide parses for all syntactically correct sentences. Algorithms for MUSE arc and path consistency are provided. Finally, we discuss how to create a MUSE CSP from a set of CSPs which are labeled to indicate when the same variable is shared by more than a single CSP.Comment: See http://www.jair.org/ for any accompanying file

    A Survey of Word Reordering in Statistical Machine Translation: Computational Models and Language Phenomena

    Get PDF
    Word reordering is one of the most difficult aspects of statistical machine translation (SMT), and an important factor of its quality and efficiency. Despite the vast amount of research published to date, the interest of the community in this problem has not decreased, and no single method appears to be strongly dominant across language pairs. Instead, the choice of the optimal approach for a new translation task still seems to be mostly driven by empirical trials. To orientate the reader in this vast and complex research area, we present a comprehensive survey of word reordering viewed as a statistical modeling challenge and as a natural language phenomenon. The survey describes in detail how word reordering is modeled within different string-based and tree-based SMT frameworks and as a stand-alone task, including systematic overviews of the literature in advanced reordering modeling. We then question why some approaches are more successful than others in different language pairs. We argue that, besides measuring the amount of reordering, it is important to understand which kinds of reordering occur in a given language pair. To this end, we conduct a qualitative analysis of word reordering phenomena in a diverse sample of language pairs, based on a large collection of linguistic knowledge. Empirical results in the SMT literature are shown to support the hypothesis that a few linguistic facts can be very useful to anticipate the reordering characteristics of a language pair and to select the SMT framework that best suits them.Comment: 44 pages, to appear in Computational Linguistic

    Concurrent Lexicalized Dependency Parsing: The ParseTalk Model

    Full text link
    A grammar model for concurrent, object-oriented natural language parsing is introduced. Complete lexical distribution of grammatical knowledge is achieved building upon the head-oriented notions of valency and dependency, while inheritance mechanisms are used to capture lexical generalizations. The underlying concurrent computation model relies upon the actor paradigm. We consider message passing protocols for establishing dependency relations and ambiguity handling.Comment: 90kB, 7pages Postscrip

    Learning Language from a Large (Unannotated) Corpus

    Full text link
    A novel approach to the fully automated, unsupervised extraction of dependency grammars and associated syntax-to-semantic-relationship mappings from large text corpora is described. The suggested approach builds on the authors' prior work with the Link Grammar, RelEx and OpenCog systems, as well as on a number of prior papers and approaches from the statistical language learning literature. If successful, this approach would enable the mining of all the information needed to power a natural language comprehension and generation system, directly from a large, unannotated corpus.Comment: 29 pages, 5 figures, research proposa

    UsingWomb Grammars for Inducing the Grammar of a Subset of Yorùbá Noun Phrases

    Get PDF
    We address the problem of inducing the grammar of an under-resourced language,Yorùbá, from the grammar of English using an efficient and, linguistically savvy, constraintsolving model of grammar induction –Womb Grammars (WG). Our proposed methodologyadapts WG for parsing a subset of noun phrases of the target language Yorùbá, from thegrammar of the source language English, which is described as properties between pairs ofconstituents. Our model is implemented in CHRG (Constraint Handling Rule Grammar) and,it has been used for inducing the grammar of a useful subset of Yorùbá Noun Phrases. Interestingextensions to the original Womb Grammar model are presented, motivated by the specificneeds of Yorùbá and, similar tone languages
    corecore