2,610 research outputs found

    Natural language processing

    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.

    Japanese/English Cross-Language Information Retrieval: Exploration of Query Translation and Transliteration

    Cross-language information retrieval (CLIR), where queries and documents are in different languages, has of late become one of the major topics within the information retrieval community. This paper proposes a Japanese/English CLIR system that combines query translation and retrieval modules. We currently target the retrieval of technical documents, and therefore the performance of our system is highly dependent on the quality of the translation of technical terms. However, technical term translation is still problematic in that technical terms are often compound words, and thus new terms are progressively created by combining existing base words. In addition, Japanese often represents loanwords using its special phonograms. Consequently, existing dictionaries find it difficult to achieve sufficient coverage. To counter the first problem, we produce a Japanese/English dictionary for base words, and translate compound words on a word-by-word basis. We also use a probabilistic method to resolve translation ambiguity. For the second problem, we use a transliteration method, which maps words unlisted in the base word dictionary to their phonetic equivalents in the target language. We evaluate our system using a test collection for CLIR, and show that both the compound word translation and transliteration methods improve the system performance.
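The pipeline described in the abstract above (word-by-word compound translation, probabilistic disambiguation, and a transliteration fallback for unlisted words) can be sketched roughly as follows. This is a toy illustration, not the paper's system: the dictionary entries, the bigram probabilities, the romanized transliteration table, and all function names are hypothetical, and a simple bigram score stands in for whatever probabilistic model the authors actually use.

```python
from itertools import product

# Hypothetical base-word dictionary: romanized Japanese -> English candidates.
BASE_DICT = {
    "jouhou": ["information", "intelligence"],
    "kensaku": ["retrieval", "search"],
}

# Hypothetical target-language bigram probabilities used to rank candidates.
BIGRAM_PROB = {
    ("information", "retrieval"): 0.6,
    ("information", "search"): 0.2,
    ("intelligence", "retrieval"): 0.01,
    ("intelligence", "search"): 0.05,
}

# Hypothetical phonetic rewrite table (romanized chunks -> English letters).
TRANSLIT = {"dee": "da", "ta": "ta"}


def transliterate(word):
    """Greedy longest-match rewrite of a word not found in the dictionary."""
    out, i = [], 0
    while i < len(word):
        for size in (3, 2, 1):
            chunk = word[i:i + size]
            if chunk in TRANSLIT:
                out.append(TRANSLIT[chunk])
                i += len(chunk)
                break
        else:
            # No rule applies: copy the character unchanged.
            out.append(word[i])
            i += 1
    return "".join(out)


def score(seq):
    """Product of bigram probabilities, with a small floor for unseen pairs."""
    p = 1.0
    for pair in zip(seq, seq[1:]):
        p *= BIGRAM_PROB.get(pair, 1e-6)
    return p


def translate_compound(base_words):
    """Translate each base word, then keep the most probable combination."""
    candidates = [BASE_DICT.get(w) or [transliterate(w)] for w in base_words]
    return list(max(product(*candidates), key=score))


print(translate_compound(["jouhou", "kensaku"]))
print(translate_compound(["deeta", "kensaku"]))
```

Enumerating every candidate combination is only practical for short compounds; a real system would likely need a more scalable search and a properly estimated language model.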

    Navigating the Phonology-Syntax Interface and Tri-P Mapping

    While it is widely acknowledged that phonological processes may be restricted to certain domains, appearing in a particular location or spanning some - but not all - junctures within (morpho-)syntactic structure, debate centers on how to derive phonological domains. There are three main models in the current literature: Relational Mapping, Syntax-Driven Mapping, and Syntactic-Spell Out. Comparisons between specific approaches have been made, but the only side-by-side test of all three approaches using the same data is found in Miller 2018. As part of that study, extreme morpho-syntactic complexity or "polysynthesis" is argued to be the crucial test for any interface model. A side-by-side test using data from Kiowa and Saulteaux Ojibwe shows that no current model is entirely successful. Building on those results, this paper introduces the foundations for Tri-P Mapping, a new model of the phonology-syntax interface.

    Derivational Trapping And The Morphosyntax Of Inflectionlessness

    The broad objective of this dissertation is to advance our understanding of how grammatical operations are formulated in the postsyntactic module of the grammar. To that end, the dissertation examines the distribution of agreement morphemes, and especially the distribution of exceptionally inflectionless elements, whose lack of agreement morphology can affect other operations such as postsyntactic movement, in some cases interfering with these operations, yielding ungrammaticality. The dissertation pursues a serial rule-based approach within the Distributed Morphology (DM) framework (Halle and Marantz 1993; Embick and Noyer 2001, 2007; Arregi and Nevins 2012; Harley 2014; a.o.), focusing chiefly on postsyntactic operations that produce and refer to agreement morphology (‘node-sprouting’) and postsyntactic operations that displace heads onto neighboring elements. The key innovation of the current model is that postsyntactic operations distinguish between their triggering environments and the actual execution of a change. A theoretical consequence of making this distinction is that a derivation can crash when the conditions for application of an operation are satisfied but the change itself cannot be executed, yielding ungrammaticality. This state of affairs is referred to as derivational trapping. The evidence that bears on the theory of how postsyntactic rules are formulated comes from exceptionally inflectionless (EI) elements in various languages, including Bulgarian, Bosnian/Croatian/Serbian (BCS), German, Greek, Latin, Icelandic, Italian, and Russian. These EI elements belong to some syntactic category – such as adjective – whose members are specified to bear agreement morphology, while EI elements lack this morphology. 
The distributional properties of these elements are important for our understanding not only of the representation of inflectionlessness, but also of postsyntactic movement, the separation between the narrow syntax and the postsyntactic module, and the ways in which crashes in the postsyntactic module arise. Beyond the evidence from inflectionlessness for derivational trapping, the dissertation also examines other phenomena that motivate this approach, including lexical gaps, coordination, and other forms of postsyntactic movement. Chapter 1 defines derivational trapping and articulates a model of the postsyntax, with special attention paid to two types of postsyntactic operations: i) node-sprouting, the operation which produces dissociated morphology such as agreement morphemes, and ii) postsyntactic movement. This chapter motivates an account of node-sprouting in which the operation may target a terminal node, a morphological word (MWd) (in the sense of Embick and Noyer 2001), or a phrase, and argues that node-sprouting at the MWd occurs prior to linearly defined movement operations. It also synthesizes various case studies from the literature to motivate an account of postsyntactic movement, whose locality is argued to be restricted by adjacency, in a way defined by the stage of linearization at which the operation is specified. In Chapter 2, I claim that exceptional inflectionlessness is (often) a morphological fact that is encoded postsyntactically. Consequently, given the modularity of narrow syntax and the postsyntactic module, it is predicted that inflectionlessness can affect postsyntactic processes but not the narrow syntax. I evaluate this hypothesis by examining how the absence of agreement morphology affects postsyntactic movement and other operations in Latin, Icelandic, Bulgarian, Bosnian/Croatian/Serbian (BCS), Italian, and Russian. For Bulgarian and BCS, I offer a derivational trapping account to capture patterns of ungrammaticality.
Chapter 3 investigates German adjectival inflection, and demonstrates that its distribution is best stated in linear terms, thereby supporting its postsyntactic status, and it also demonstrates that the distribution of inflection supports the hypothesis that node-sprouting can happen at the phrasal level. I also demonstrate how exceptional inflectionlessness among adjectives is sensitive to linear order, and offer a derivational trapping account of the inability of such adjectives to be stranded by noun phrase ellipsis. Chapter 4 extends the account of derivational trapping to three other phenomena beyond agreement morphology: lexical gaps, postsyntactic movement into coordinate structures, and the (postsyntactic) formation of English possessive pronouns. I connect the stride gap (Yang 2016) to the feature structure and morphophonology of participles and preterites, showing how lexical gaps can give rise to derivational trapping due to the structure of morphophonological rules. I also argue on the basis of coordination data from various Romance languages for a derivational trapping account of postsyntactic ATB violations, with a refinement of the ATB constraint that permits certain types of attested putative violations. Lastly, I argue that derivational trapping can occur in the production of English possessive pronouns; the account captures surprising patterns of ungrammaticality that arise when an internally complex possessor contains a pronoun. Chapter 5 summarizes the findings of the dissertation, pointing to limitations of the current study as well as to directions for future work.
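The core idea of derivational trapping in the abstract above (an operation's triggering environment is checked separately from the execution of its change, and the derivation crashes when the first holds but the second is impossible) can be pictured with a small sketch. Everything below is a hypothetical toy: the `Item` and `Rule` types, the placeholder forms, and the single rule are invented for illustration and are not the dissertation's formalism or data.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class DerivationCrash(Exception):
    """Raised when a rule is triggered but its change cannot be carried out."""


@dataclass
class Item:
    form: str
    category: str
    agr: Optional[str]  # None models an exceptionally inflectionless element


@dataclass
class Rule:
    name: str
    triggered: Callable[[Item], bool]   # is the triggering environment met?
    executable: Callable[[Item], bool]  # can the change actually be made?
    execute: Callable[[Item], Item]     # the change itself


def derive(item, rules):
    """Apply rules serially; a triggered but unexecutable rule traps the derivation."""
    for rule in rules:
        if not rule.triggered(item):
            continue  # rule is simply inapplicable, derivation continues
        if not rule.executable(item):
            raise DerivationCrash(f"{rule.name} trapped on {item.form!r}")
        item = rule.execute(item)
    return item


# Hypothetical rule: triggered by any adjective, but it needs agreement
# morphology to act on (an assumption made only for this sketch).
attach_agr = Rule(
    name="attach-agreement",
    triggered=lambda it: it.category == "A",
    executable=lambda it: it.agr is not None,
    execute=lambda it: Item(it.form + it.agr, it.category, it.agr),
)

print(derive(Item("ADJ", "A", "-AGR"), [attach_agr]).form)
try:
    derive(Item("ADJ", "A", None), [attach_agr])
except DerivationCrash as exc:
    print("crash:", exc)
```

The point of the two separate predicates is that an untriggered rule is harmless, while a triggered rule that cannot execute yields ungrammaticality rather than being silently skipped.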

    Lexical and Grammar Resource Engineering for Runyankore & Rukiga: A Symbolic Approach

    Current research in computational linguistics and natural language processing (NLP) requires the existence of language resources. Whereas these resources are available for a few well-resourced languages, there are many languages that have been neglected. Among the neglected and/or under-resourced languages are Runyankore and Rukiga (henceforth referred to as Ry/Rk). Recently, the NLP community has started to acknowledge that resources for under-resourced languages should also be given priority. Why? One reason is that, as far as language typology is concerned, the few well-resourced languages do not represent the structural diversity of the remaining languages. The central focus of this thesis is enabling the computational analysis and generation of utterances in Ry/Rk. Ry/Rk are two closely related languages spoken by about 3.4 and 2.4 million people respectively. They belong to the Nyoro-Ganda (JE10) language zone of the Great Lakes, Narrow Bantu of the Niger-Congo language family. The computational processing of these languages is achieved by formalising the grammars of these two languages using Grammatical Framework (GF) and its Resource Grammar Library (RGL). In addition to the grammar, a general-purpose computational lexicon for the two languages is developed. Although we utilise the lexicon to tremendously increase the lexical coverage of the grammars, the lexicon can be used for other NLP tasks. In this thesis a symbolic / rule-based approach is taken because the lack of adequate language resources makes the use of data-driven NLP approaches unsuitable for these languages.