286 research outputs found

    Finite-State Computational Morphology: An Analyzer Prototype for Zulu

    As one of the largest of the 11 official languages of South Africa, Zulu is spoken by approximately 9 million people. It forms part of a language family which is characterized by rich agglutinating morphological structures. This paper discusses a prototype of a computational morphological analyzer for Zulu, built by means of the Xerox finite state tools, in particular lexc and xfst. In addition to considering both the morphotactics and the morphophonological alternation rules that apply, the focus is on implementation and other issues that need to be resolved in order to produce a useful software artefact for automated morphological analysis. The current status of the prototype is alluded to by providing morphological scope, that is, the various word categories (parts of speech) that may be handled, and the lexical coverage in terms of the number of different Zulu roots that are included in the embedded lexicon of the analyzer. Preliminary testing and validation procedures are briefly discussed.

    Corpus-driven Bantu Lexicography, part 2 : lemmatisation and rulers for Lusoga

    This article is the second in a trilogy that deals with corpus-driven Bantu lexicography, which is illustrated for Lusoga. The focus here is on the macrostructure and in particular on the building of a lemmatised frequency list directly within a dictionary-writing system. The programming code for the parts of the lemmatisation that may be automated is included as addenda. A second focus is on the embedded part-of-speech and alphabetical rulers, for which it is shown how these may be used to plan the actual compilation of the dictionary entries.

    Grammar rules for the isiZulu complex verb

    The isiZulu verb is known for its morphological complexity, which is a subject of ongoing linguistic research, as well as for prospects of computational use, such as controlled natural language interfaces, machine translation, and spellcheckers. To this end, we seek to answer the question as to what the precise grammar rules for the isiZulu complex verb are (and, by extension, the Bantu verb morphology). We iteratively specify the grammar as a Context Free Grammar, and evaluate it computationally. The grammar presented in this paper covers the subject and object concords, negation, present tense, aspect, mood, and the causative, applicative, stative, and reciprocal verbal extensions, politeness, the wh-question modifiers, and aspect doubling, ensuring their correct order as they appear in verbs. The grammar conforms to specification.

    Selected papers from the 49th Annual Conference on African Linguistics

    Descriptive and Theoretical Approaches to African Linguistics contains a selection of revised and peer-reviewed papers from the 49th Annual Conference on African Linguistics, held at Michigan State University in 2018. The contributions from both students and more senior scholars, based in North America, Africa and other parts of the world, provide a glimpse of the breadth and quality of current research in African linguistics from both descriptive and theoretical perspectives. Fields of interest range from phonetics, phonology, morphology, syntax and semantics to sociolinguistics, historical linguistics, discourse analysis, language documentation, computational linguistics and beyond. The articles reflect both the typological and genetic diversity of languages in Africa and the wide range of research areas covered by presenters at ACAL conferences.

    UniMorph 4.0: Universal Morphology

    The Universal Morphology (UniMorph) project is a collaborative effort providing broad-coverage instantiated normalized morphological inflection tables for hundreds of diverse world languages. The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema. This paper presents the expansions and improvements made on several fronts over the last couple of years (since McCarthy et al. (2020)). Collaborative efforts by numerous linguists have added 67 new languages, including 30 endangered languages. We have implemented several improvements to the extraction pipeline to tackle some issues, e.g., missing gender and macron information. We have also amended the schema to use a hierarchical structure that is needed for morphological phenomena like multiple-argument agreement and case stacking, while adding some missing morphological features to make the schema more inclusive. In light of the last UniMorph release, we also augmented the database with morpheme segmentation for 16 languages. Lastly, this new release makes a push towards inclusion of derivational morphology in UniMorph by enriching the data and annotation schema with instances representing derivational processes from MorphyNet.
