6 research outputs found

    Basics for a grammar engine to verbalize logical theories in isiZulu

    The language isiZulu is the largest in South Africa by number of first-language speakers, yet it is still an under-resourced language. In this paper, we approach the grammar piecemeal from a natural language generation perspective, with the verbalization of OWL ontologies as a tangible use case. The elaborate rules of the grammar show that a grammar engine and a dictionary are essential even for basic verbalizations in OWL 2 EL. This is mainly due to the 17 noun classes with embedded semantics and the agglutinative nature of isiZulu. The verbalization of basic constructs requires merging a prefix with a noun and distinguishing between an 'and' that joins items in a list and an 'and' that links clauses.

    Contextualising Levels of Language Resourcedness affecting Digital Processing of Text

    Application domains such as digital humanities and tools like chatbots involve some form of natural language processing, from digitising hardcopies to speech generation. The language of the content is typically characterised as either a low resource language (LRL) or a high resource language (HRL), also known as resource-scarce and well-resourced languages, respectively. African languages have been characterised as resource-scarce languages (Bosch et al. 2007; Pretorius & Bosch 2003; Keet & Khumalo 2014), and English is by far the most well-resourced language. Varied language resources are used to develop software systems for these languages to accomplish a wide range of tasks. In this paper we argue that the dichotomous typology of LRL and HRL for all languages is problematic. Through a clear understanding of language resources situated in a society, a matrix is developed that characterises languages as Very LRL, LRL, RL, HRL and Very HRL. The characterisation is based on the typology of contextual features for each category, rather than on counting tools, and motivation is provided for each feature and each characterisation. The contextualisation of resourcedness, with a focus on African languages in this paper, and an increased understanding of where the language used in a project sits on the scale may assist in, among others, better planning of research and implementation projects. We thus argue that characterising language resources within a given scale is an indispensable component of a project, particularly in the context of low-resourced languages.

    Grammar rules for the isiZulu complex verb

    The isiZulu verb is known for its morphological complexity, which is a subject of ongoing linguistics research, as well as for prospects of computational use, such as controlled natural language interfaces, machine translation, and spellcheckers. To this end, we seek to answer the question of what the precise grammar rules for the isiZulu complex verb are (and, by extension, those of Bantu verb morphology). We iteratively specify the grammar as a context-free grammar and evaluate it computationally. The grammar presented in this paper covers the subject and object concords, negation, present tense, aspect, mood, the causative, applicative, stative, and reciprocal verbal extensions, politeness, the wh-question modifiers, and aspect doubling, ensuring their correct order as they appear in verbs. The grammar conforms to specification

    Basics for a Grammar Engine to Verbalize Logical Theories in isiZulu

    No full text

    Grammars for generating isiXhosa and isiZulu weather bulletin verbs

    The Met Office has investigated the use of natural language generation (NLG) technologies to streamline the production of weather forecasts. Their approach would be of great benefit in South Africa because there is no fast, large-scale producer, automated or otherwise, of textual weather summaries for Nguni languages. This is due to, among other things, the complexity of Nguni languages. The structure of these languages is very different from that of Indo-European languages, and therefore we cannot reuse existing technologies that were developed for the latter group. Traditional NLG techniques such as templates are not compatible with 'Bantu' languages, and existing works that document scaled-down 'Bantu' language grammars are also not sufficient to generate weather text. To generate weather text in isiXhosa and isiZulu, we restricted our text to verbs only in order to ensure a manageable scope. In particular, we developed a corpus of weather sentences in order to determine verb features. We then created context-free verbal grammar rules using an incremental approach. The quality of these rules was evaluated by two linguists. We then investigated the grammatical similarity of isiZulu verbs with their isiXhosa counterparts, and the extent to which a single merged set of grammar rules can be used to produce correct verbs for both languages. The similarity analysis of the two languages was done through the developed rules' parse trees, and by applying binary similarity measures to the sets of verbs generated by the rules. The parse trees show that the differences between the verb's components are minor, and the similarity measures indicate that the verb sets are at most 59.5% similar (Driver-Kroeber metric). We also examined the importance of the phonological conditioning process by developing functions that calculate the ratio of verbs that will require conditioning out of the total strings that can be generated. We found that the phonological conditioning process affects at least 45% of strings for isiXhosa and at least 67% of strings for isiZulu, depending on the type of verb root that is used. Overall, this work shows that the differences between isiXhosa and isiZulu verbs are minor. However, these similarities cannot be exploited to create a unified rule set for both languages without significant maintainability compromises, because there are dependencies between the verb's 'modules' that exist in one language and not the other. Furthermore, the phonological conditioning process should be implemented in order to improve the generated text, given the high ratio of verbs it affects.

    Ontological Model for Xhosa Beadwork in Marginalised Rural Communities: A Case of the Eastern Cape

    In South Africa, computational ontologies have gained traction and are increasingly viewed as one of the viable solutions to the fragmented and unstructured nature of indigenous knowledge (IK), particularly in marginalized rural communities. The continued existence of IK in tacit form has impeded its use as a potential resource that can catalyze socio-economic and cultural development in South Africa. This study was, therefore, designed to address part of this challenge by developing a Xhosa Beadwork Ontology (XBO) with the goal of structuring the domain knowledge into a reusable body of knowledge. Such a reusable body of knowledge promotes efficient sharing of a common understanding of Xhosa Beadwork in a computational form. The XBO is in OWL 2 DL. The development of the XBO was informed by the NeOn methodology and the iterative-incremental ontology development life cycle within the ambit of Action Research (AR). The XBO was developed around personal ornamentation Xhosa Beadwork consisting of Necklace, Headband, Armlet, Waistband, Bracelet, and Anklet. In this study, the XBO was evaluated to ascertain that the created ontology is a comprehensive representation of Xhosa Beadwork and is of the required standard. In addition, the XBO was documented as a human-understandable and readable resource and was published. The outcome of the study indicates that the XBO is an adequate, shareable and reusable semantic artifact that can indeed support the formalization and preservation of IK in the domain of Xhosa Beadwork.