254 research outputs found

    Introduction to the special issue on cross-language algorithms and applications

    Get PDF
    With the increasingly global nature of our everyday interactions, the need for multilingual technologies to support efficient and efective information access and communication cannot be overemphasized. Computational modeling of language has been the focus of Natural Language Processing, a subdiscipline of Artificial Intelligence. One of the current challenges for this discipline is to design methodologies and algorithms that are cross-language in order to create multilingual technologies rapidly. The goal of this JAIR special issue on Cross-Language Algorithms and Applications (CLAA) is to present leading research in this area, with emphasis on developing unifying themes that could lead to the development of the science of multi- and cross-lingualism. In this introduction, we provide the reader with the motivation for this special issue and summarize the contributions of the papers that have been included. The selected papers cover a broad range of cross-lingual technologies including machine translation, domain and language adaptation for sentiment analysis, cross-language lexical resources, dependency parsing, information retrieval and knowledge representation. We anticipate that this special issue will serve as an invaluable resource for researchers interested in topics of cross-lingual natural language processing.Postprint (published version

    A Multilingual Phonological Resource Toolkit for Ubiquitous Speech Technology

    Get PDF
    This paper outlines the generation process of a specifi computational linguistic representation termed the Multilingual Time Map, conceptually a multi-tape finit state transducer encoding linguistic data at different levels of granularity. The fi st component acquires phonological data from syllable labeled speech data, the second component define feature profiles the third component generates feature hierarchies and augments the acquired data with the define feature profiles and the fourth component displays the Multilingual Time Map as a graph

    Modeling Language Variation and Universals: A Survey on Typological Linguistics for Natural Language Processing

    Get PDF
    Linguistic typology aims to capture structural and semantic variation across the world's languages. A large-scale typology could provide excellent guidance for multilingual Natural Language Processing (NLP), particularly for languages that suffer from the lack of human labeled resources. We present an extensive literature survey on the use of typological information in the development of NLP techniques. Our survey demonstrates that to date, the use of information in existing typological databases has resulted in consistent but modest improvements in system performance. We show that this is due to both intrinsic limitations of databases (in terms of coverage and feature granularity) and under-employment of the typological features included in them. We advocate for a new approach that adapts the broad and discrete nature of typological categories to the contextual and continuous nature of machine learning algorithms used in contemporary NLP. In particular, we suggest that such approach could be facilitated by recent developments in data-driven induction of typological knowledge

    Handbook of Lexical Functional Grammar

    Get PDF
    Lexical Functional Grammar (LFG) is a nontransformational theory of linguistic structure, first developed in the 1970s by Joan Bresnan and Ronald M. Kaplan, which assumes that language is best described and modeled by parallel structures representing different facets of linguistic organization and information, related by means of functional correspondences. This volume has five parts. Part I, Overview and Introduction, provides an introduction to core syntactic concepts and representations. Part II, Grammatical Phenomena, reviews LFG work on a range of grammatical phenomena or constructions. Part III, Grammatical modules and interfaces, provides an overview of LFG work on semantics, argument structure, prosody, information structure, and morphology. Part IV, Linguistic disciplines, reviews LFG work in the disciplines of historical linguistics, learnability, psycholinguistics, and second language learning. Part V, Formal and computational issues and applications, provides an overview of computational and formal properties of the theory, implementations, and computational work on parsing, translation, grammar induction, and treebanks. Part VI, Language families and regions, reviews LFG work on languages spoken in particular geographical areas or in particular language families. The final section, Comparing LFG with other linguistic theories, discusses LFG work in relation to other theoretical approaches

    Lexical and Grammar Resource Engineering for Runyankore & Rukiga: A Symbolic Approach

    Get PDF
    Current research in computational linguistics and natural language processing (NLP) requires the existence of language resources. Whereas these resources are available for a few well-resourced languages, there are many languages that have been neglected. Among the neglected and / or under-resourced languages are Runyankore and Rukiga (henceforth referred to as Ry/Rk). Recently, the NLP community has started to acknowledge that resources for under-resourced languages should also be given priority. Why? One reason being that as far as language typology is concerned, the few well-resourced languages do not represent the structural diversity of the remaining languages. The central focus of this thesis is about enabling the computational analysis and generation of utterances in Ry/Rk. Ry/Rk are two closely related languages spoken by about 3.4 and 2.4 million people respectively. They belong to the Nyoro-Ganda (JE10) language zone of the Great Lakes, Narrow Bantu of the Niger-Congo language family.The computational processing of these languages is achieved by formalising the grammars of these two languages using Grammatical Framework (GF) and its Resource Grammar Library (RGL). In addition to the grammar, a general-purpose computational lexicon for the two languages is developed. Although we utilise the lexicon to tremendously increase the lexical coverage of the grammars, the lexicon can be used for other NLP tasks.In this thesis a symbolic / rule-based approach is taken because the lack of adequate languages resources makes the use of data-driven NLP approaches unsuitable for these languages
    • …
    corecore