
4 research outputs found

ToCT: A task ontology to manage complex templates

Authors: Keet CM, Mahlaza Z
Publication date: 01/01/2021

Natural language interfaces are a well-known approach to grant non-experts access to semantic web technologies. A number of such systems use simple templates to achieve that for English and more elaborate solutions for other languages. They keep being designed from scratch in an ad hoc manner, since there is no shared conceptualisation of simple templates and there is no model that is formalised using a Semantic Web language to apply the techniques to itself. We aim to address this by proposing a general-purpose solution in the form of a novel model for templates, formalised as a task ontology in OWL, called ToCT. We used it to develop an ontology-driven text generator for isiZulu, a morphologically-rich language, to test its capabilities. The generator verbalises the TBox of an ontology as validation questions. This evaluation showed that the task ontology is sufficiently expressive for the template design, which was subsequently verified through user evaluations in which the texts were judged positively.

UCT Computer Science Research Document Archive

Lexical and Grammar Resource Engineering for Runyankore & Rukiga: A Symbolic Approach

Author: Bamutura David
Publication date: 01/01/2021

Current research in computational linguistics and natural language processing (NLP) requires the existence of language resources. Whereas these resources are available for a few well-resourced languages, there are many languages that have been neglected. Among the neglected and/or under-resourced languages are Runyankore and Rukiga (henceforth referred to as Ry/Rk). Recently, the NLP community has started to acknowledge that resources for under-resourced languages should also be given priority. Why? One reason is that, as far as language typology is concerned, the few well-resourced languages do not represent the structural diversity of the remaining languages. The central focus of this thesis is enabling the computational analysis and generation of utterances in Ry/Rk. Ry/Rk are two closely related languages spoken by about 3.4 and 2.4 million people respectively. They belong to the Nyoro-Ganda (JE10) language zone of the Great Lakes, Narrow Bantu of the Niger-Congo language family. The computational processing of these languages is achieved by formalising the grammars of these two languages using Grammatical Framework (GF) and its Resource Grammar Library (RGL). In addition to the grammar, a general-purpose computational lexicon for the two languages is developed. Although we utilise the lexicon to tremendously increase the lexical coverage of the grammars, the lexicon can be used for other NLP tasks. In this thesis a symbolic / rule-based approach is taken because the lack of adequate language resources makes the use of data-driven NLP approaches unsuitable for these languages.

Chalmers Research

Grammars for generating isiXhosa and isiZulu weather bulletin verbs

Author: Mahlaza Zola
Publication venue: Department of Computer Science
Publication date: 01/01/2018

The Met Office has investigated the use of natural language generation (NLG) technologies to streamline the production of weather forecasts. Their approach would be of great benefit in South Africa because there is no fast and large scale producer, automated or otherwise, of textual weather summaries for Nguni languages. This is because of, among other things, the complexity of Nguni languages. The structure of these languages is very different from Indo-European languages, and therefore we cannot reuse existing technologies that were developed for the latter group. Traditional NLG techniques such as templates are not compatible with 'Bantu' languages, and existing works that document scaled-down 'Bantu' language grammars are also not sufficient to generate weather text. In pursuit of generating weather text in isiXhosa and isiZulu, we restricted our text to verbs only in order to ensure a manageable scope. In particular, we have developed a corpus of weather sentences in order to determine verb features. We then created context-free verbal grammar rules using an incremental approach. The quality of these rules was evaluated by two linguists. We then investigated the grammatical similarity of isiZulu verbs with their isiXhosa counterparts, and the extent to which a single merged set of grammar rules can be used to produce correct verbs for both languages. The similarity analysis of the two languages was done through the developed rules' parse trees, and by applying binary similarity measures on the sets of verbs generated by the rules. The parse trees show that the differences between the verb's components are minor, and the similarity measures indicate that the verb sets are at most 59.5% similar (Driver-Kroeber metric). We also examined the importance of the phonological conditioning process by developing functions that calculate the ratio of verbs that will require conditioning out of the total strings that can be generated.
We have found that the phonological conditioning process affects at least 45% of strings for isiXhosa, and at least 67% of strings for isiZulu, depending on the type of verb root that is used. Overall, this work shows that the differences between isiXhosa and isiZulu verbs are minor; however, these similarities cannot be exploited to create a unified rule set for both languages without significant maintainability compromises, because there are dependencies between the verb's 'modules' that exist in one language and not the other. Furthermore, the phonological conditioning process should be implemented in order to improve generated text, because of the high ratio of verbs it affects.
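As an illustration of what "binary similarity measures on the sets of verbs" can look like, the following is a minimal sketch and not the thesis's implementation. It assumes each rule set yields a plain set of generated strings and compares two such sets through the usual counts a (shared), b (only in the first) and c (only in the second). The abstract does not give the formula behind its Driver-Kroeber figure, and surveys attach that name to more than one expression, so the sketch shows two commonly cited overlap forms alongside Jaccard. All function names and the placeholder strings `v1` to `v5` are hypothetical.

```python
from math import sqrt


def contingency(x: set, y: set) -> tuple:
    """Return (a, b, c): shared items, items only in x, items only in y."""
    a = len(x & y)
    b = len(x - y)
    c = len(y - x)
    return a, b, c


def jaccard(x: set, y: set) -> float:
    """Jaccard similarity: a / (a + b + c)."""
    a, b, c = contingency(x, y)
    total = a + b + c
    return a / total if total else 0.0


def arithmetic_overlap(x: set, y: set) -> float:
    """Arithmetic mean of the two coverage ratios: (a/2) * (1/(a+b) + 1/(a+c)).

    Assumption: one of the forms that surveys list under the Driver-Kroeber name.
    """
    a, b, c = contingency(x, y)
    if a + b == 0 or a + c == 0:
        return 0.0
    return (a / 2) * (1 / (a + b) + 1 / (a + c))


def geometric_overlap(x: set, y: set) -> float:
    """Geometric mean of the two coverage ratios: a / sqrt((a+b) * (a+c)).

    Assumption: the other form sometimes attributed to Driver and Kroeber.
    """
    a, b, c = contingency(x, y)
    if a + b == 0 or a + c == 0:
        return 0.0
    return a / sqrt((a + b) * (a + c))


# Placeholder strings standing in for generated verbs (not real data).
first_language = {"v1", "v2", "v3", "v4"}
second_language = {"v3", "v4", "v5"}

print(contingency(first_language, second_language))
print(jaccard(first_language, second_language))
print(arithmetic_overlap(first_language, second_language))
print(geometric_overlap(first_language, second_language))
```

All three measures depend only on a, b and c, so strings that neither rule set generates play no role in the score.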

Cape Town University OpenUCT