17 research outputs found

    Pluralizing Nouns across Agglutinating Bantu Languages

    Text generation may require the pluralization of nouns, such as in context-sensitive user interfaces and in natural language generation more broadly. While this has been solved for the widely used languages, it remains a challenge for the languages in the Bantu language family. Pluralization results obtained for isiZulu and Runyankore showed there were similarities in approach, including the need to combine morphology with syntax and semantics, despite the two languages belonging to different language zones. This suggests that bootstrapping and generalizability might be feasible. We investigated this systematically for seven languages across three different Guthrie language zones. The first outcome is that Meinhof’s 1948 specification of the noun classes is indeed inadequate for computational purposes for all examined languages, due to non-determinism in prefixes, and we thus redefined the characteristic noun class tables of 29 noun classes into 53. The second main result is that the generic pluralizer achieved over 93% accuracy in coverage testing and over 94% on a random sample. This is comparable to the language-specific isiZulu and Runyankore pluralizers.

    Ontology verbalization in agglutinating Bantu languages: a study of Runyankore and its generalizability

    Natural Language Generation (NLG) systems have been developed to generate text in multiple domains, including personalized patient information. However, their application is limited in Africa because they generate text in English, yet indigenous languages are still predominantly spoken throughout the continent, especially in rural areas. The existing healthcare NLG systems cannot be reused for Bantu languages due to the complex grammatical structure, nor can the generated text be used in machine translation systems for Bantu languages because they are computationally under-resourced. This research aimed to verbalize ontologies in agglutinating Bantu languages. We had four research objectives: (1) noun pluralization and verb conjugation in Runyankore; (2) Runyankore verbalization patterns for the selected description logic constructors; (3) combining the pluralization, conjugation, and verbalization components to form a Runyankore grammar engine; and (4) generalizing the Runyankore and isiZulu approaches to ontology verbalization to other agglutinating Bantu languages. We used an approach that combines morphology with syntax and semantics to develop a noun pluralizer for Runyankore, and used Context-Free Grammars (CFGs) for verb conjugation. We developed verbalization algorithms for eight constructors in a description logic. We then combined these components into a grammar engine developed as a Protégé 5.x plugin. The investigation into generalizability used the bootstrap approach, and investigated bootstrapping for languages in the same language zone (intra-zone bootstrappability) and languages across language zones (inter-zone bootstrappability). We obtained verbalization patterns for Luganda and isiXhosa, in the same zones as Runyankore and isiZulu respectively, and chiShona, Kikuyu, and Kinyarwanda from different zones, and used the bootstrap metric that we developed to identify the most efficient source-target bootstrap pair.
By regrouping Meinhof’s noun class system we were able to eliminate non-determinism during computation, and this led to the development of a generic noun pluralizer. We also showed that CFGs can conjugate verbs in the five additional languages. Finally, we proposed the architecture for an API that could be used to generate text in agglutinating Bantu languages. Our research provides a method for surface realization for an under-resourced and grammatically complex family of languages, Bantu languages. We leave the development of a complete NLG system based on the Runyankore grammar engine and of the API as areas for future work.

    Initial vowel agglutination in the Gulf of Guinea creoles

    Reinterpretation of morpheme boundaries is a well-attested phenomenon in contact linguistics and language-internal diachronic change. Examples of agglutination have been noted in a wide array of creole languages (e.g. Holm 1988: 97; Parkvall 2000: 81-3) and especially in French-based creoles (e.g. Baker 1984, Grant 1995). This paper focuses on the Gulf of Guinea creoles (GGCs), where a number of etymologically consonant-initial words in the lexifier language, Portuguese, exhibit an agglutinated vowel lacking a morphological function. This property is particularly common in Lung’ie (Principense Creole). My aim is to answer the following interrelated questions: (i) Is there evidence for diachronic layering of agglutination in the GGCs? (ii) What are the workings that underlie agglutination in the GGCs? (iii) What are the origins of agglutination in the GGCs?

    Towards a Resource Grammar for Runyankore and Rukiga

    Currently, there is a lack of computational grammar resources for many under-resourced languages, which limits the ability to develop Natural Language Processing (NLP) tools and applications such as Multilingual Document Authoring, Computer-Assisted Language Learning (CALL) and Low-Coverage Machine Translation (MT) for these languages. In this paper, we present our attempt to formalise the grammar of two such languages: Runyankore and Rukiga. For this formalisation we use the Grammatical Framework (GF) and its Resource Grammar Library (GF-RGL).

    Lexical and Grammar Resource Engineering for Runyankore & Rukiga: A Symbolic Approach

    Current research in computational linguistics and natural language processing (NLP) requires the existence of language resources. Whereas these resources are available for a few well-resourced languages, there are many languages that have been neglected. Among the neglected and/or under-resourced languages are Runyankore and Rukiga (henceforth referred to as Ry/Rk). Recently, the NLP community has started to acknowledge that resources for under-resourced languages should also be given priority. Why? One reason is that, as far as language typology is concerned, the few well-resourced languages do not represent the structural diversity of the remaining languages. The central focus of this thesis is enabling the computational analysis and generation of utterances in Ry/Rk. Ry/Rk are two closely related languages spoken by about 3.4 and 2.4 million people respectively. They belong to the Nyoro-Ganda (JE10) language zone of the Great Lakes, Narrow Bantu of the Niger-Congo language family. The computational processing of these languages is achieved by formalising the grammars of these two languages using Grammatical Framework (GF) and its Resource Grammar Library (RGL). In addition to the grammar, a general-purpose computational lexicon for the two languages is developed. Although we utilise the lexicon to tremendously increase the lexical coverage of the grammars, the lexicon can be used for other NLP tasks. In this thesis a symbolic/rule-based approach is taken because the lack of adequate language resources makes the use of data-driven NLP approaches unsuitable for these languages.

    A linguistic analysis of Rukiga personal names

    The goal of the paper is to provide a linguistic description of the structure of personal names in a lesser-studied Bantu language of Uganda, Rukiga (JE14). Data show that Rukiga personal names are presented as lexical entities but with underlying elaborate grammatical structures derived from the syntax, morphology, phonology and the lexicon of the language. Personal names in Rukiga form a special category of nouns derived from nouns, adjectives, verbs, phrases, clauses and full sentences. This study establishes that truncation, affixal derivation, and lexicalization of phrases, clauses and sentences are employed in name-formation. The study further reveals that the socio-cultural context influences the semantics and structure of names in Rukiga. Data for this study were collected in Kabale district in western Uganda through interviewing older persons and reviewing religious documents and tax collection registers. The study mirrors personal names as a part of the grammar of Rukiga reflecting the general complex linguistic system of the language. Data from this study are envisaged to contribute to typological and theoretical analyses of personal names which have internal morphosyntactic properties.

    Diteng tsa ditlhopha tsa maina a Bantu: ntlhathakanelo e le mo Setswanang: “The semantics of Bantu noun classes: a focus on Setswana”

    The present study investigated the semantic classification of the Setswana noun class system. This enquiry falls under the broad area of the noun classification system in Bantu languages, psycholinguistics and lexicography. Specifically, it explores the basis of noun classification in Setswana, indicating that Setswana noun classification is based on a partial semantic classification. Data for the study were drawn from the Setswana Oxford Dictionary. Sixty Setswana nouns, from classes 1, 3, 5, and 7, were selected and analysed and then grouped into semantic categories (i.e., PERSON, DEROGATION, TRANSPORTATION and so forth). The study adopted Kgukutli’s (1994) semantic classification in performing the dictionary analysis. The rest of the data were drawn from the intuitions of thirty-nine contemporary speakers of Setswana, with the aid of a linguistic test which was fashioned according to Selvik’s (2001) psycholinguistic test. The language test required participants to match the predetermined Setswana definitions with hypothetical Setswana nouns with selected class prefixes attached to them. The results from the empirical study showed that speakers were associating prefixes with certain semantic values, suggesting that each noun class had specific semantic content that was unique to that class. The semantic categories created through the dictionary analysis were then compared to those given by the thirty-nine Setswana speakers, to analyse whether there were any similarities in the semantic classification of the noun classes. The findings of the dictionary analysis and linguistic test revealed that there were certain semantic characteristics that each class was associated with that seemed to be unique to the class. However, there were various semantic overlaps in the semantic categories associated with the different noun classes, which brings into question whether a semantic classification is viable in the classing of nouns.
The study suggests that prior classifications of Setswana nouns are not precise enough and that additional semantic categories are needed to offer a more precise classification of nouns in this language.

    A comparative study of syllables and morphemes as literacy processing units in word recognition: IsiXhosa and SeTswana

    Word recognition is a core foundation of reading (Invernizzi and Hayes 2010) and involves interactions of language skills, metalinguistic skills and orthography. The extent of their interaction with one another in reading has yet to be fully explored, especially in the Southern-Bantu languages. This comparative study of isiXhosa and Setswana explores this three-way interaction between language skills (effect of Language of Learning and Teaching (LoLT)), metalinguistic skills (Phonological and Morphological Awareness) and orthography (conjunctivism vs. disjunctivism). This thesis is novel in three respects: (a) a set of linguistically informed reading measures was developed in isiXhosa and Setswana for the first time, (b) to my knowledge, the comparisons made and the study of Morphological Awareness in the Southern-Bantu languages have never been done, and (c) the use of d-prime as a way of testing for grain size in reading is an innovative approach. Grade 3 and Grade 4 learners were tested on four independent linguistic tasks: an open-ended decomposition task, a Phonological Awareness task, a Morphological Awareness task and an independent reading measure. These tasks were administered to determine the grain size unit (Ziegler and Goswami 2005, Ziegler et al. 2001) which learners use in word recognition, with the grain sizes of syllables and morphemes being studied. Results showed that syllables were the dominant grain size in both isiXhosa and Setswana, with morphemes as secondary grains in isiXhosa. Grain size differed slightly between the two orthographies. These results are reflected in the scores on the metalinguistic tasks. LoLT was not shown to have a significant impact on word recognition in first-language reading. The Psycholinguistic Grain Size Theory (PGST) was found to be the most applicable model of word recognition to the Southern-Bantu languages, as opposed to the Dual-Route Cascade Model and Orthographic Depth Hypothesis.
This thesis concludes with suggested adaptations to this theory in order to allow for morpheme grain size to be included. This study has implications for teaching practice and curriculum design, and contributes to a broader understanding of literacy in the foundation phase in the Southern-Bantu languages.

    Proper Names and Common Nouns Dissociation: Exploring Differences in Linguistic Processing and Memory Retrieval

    Master's thesis, Cognitive Science, 2022, Universidade de Lisboa, Faculdade de Ciências. Philosophy and linguistics suggest that proper names and common nouns are dissociated lexicosemantic categories. Evidence from psychology and neuropsychology honours this distinction, as it provides indications that they may activate different neuro-functional systems. Nevertheless, there are still some gaps in the literature that must be filled. There are mixed findings about the temporal pole involvement in proper names retrieval. Furthermore, to our knowledge, no study has yet investigated the dissociation of proper names vs. common nouns in light of the well-documented oscillatory dissociation of episodic theta and semantic alpha as reflecting the distinct declarative memory requirements. Besides, no study has explored the brain-based dissociation between the two categories using images as stimuli. Our naming task showed that there is a dissociation, with the retrieval of proper names being more demanding and resource-consuming compared to common nouns. Also, oscillation patterns revealed a more pronounced evoked theta power in the proper names retrieval condition in comparison to the common nouns condition. For the alpha wave, we did not obtain differences between the categories. These results sustain the claim of the existence of functionally and anatomically distinct retrieval pathways for the categories of proper and common names, and thus, a dissociation between proper names and common nouns.

    Formal and semantic properties of the Gujjolaay Eegimaa (a.k.a. Banjal) nominal classification system.

    Gujjolaay Eegimaa (G.E.), an Atlantic language of the Niger-Congo phylum spoken in the Basse-Casamance area in Senegal, exhibits a system of nominal classification known as a "gender/noun class system". In this type of nominal classification system, which is prevalent in Niger-Congo languages, there is controversy as to whether the obligatory classification of all nouns into a finite number of classes has semantic motivations. In addition to the disputed issue of the semantic basis of the nominal classification, the formal criteria for assigning nouns into classes are also disputed in Joola languages and in G.E. In this PhD thesis, I propose an investigation of the formal and semantic properties of the nominal classification system of Gujjolaay Eegimaa (G.E.). Based on cross-linguistic and language-specific research, I propose formal criteria whose application led to the discovery of fifteen noun classes in G.E. Here, I argue that the G.E. noun class system has semantic motivations. I show that some nouns in this language may be classified or categorized on the basis of shared properties as stipulated in the classical theory of categorization. However, most of the classification of the G.E. nouns is based on prototypicality and extension of such prototypes by family resemblance, chaining process, metaphor and metonymy, as argued in the prototype theory from cognitive semantics. The parameters of categorization that fruitfully account for the semantic basis of the G.E. nominal classification system are both universal and culture-specific. Primary data constitute the material used in this research and include lexical (including loanwords), textual, and experimental data using picture stimuli. The collected data comprise different types of communicative events recorded in audio and video formats and also in written format through participant observation.