
    Grammars for generating isiXhosa and isiZulu weather bulletin verbs

    The Met Office has investigated the use of natural language generation (NLG) technologies to streamline the production of weather forecasts. Its approach would be of great benefit in South Africa, where there is no fast, large-scale producer, automated or otherwise, of textual weather summaries for Nguni languages. This is due, among other things, to the complexity of Nguni languages: their structure is very different from that of Indo-European languages, so existing technologies developed for the latter group cannot be reused. Traditional NLG techniques such as templates are not compatible with 'Bantu' languages, and existing works that document scaled-down 'Bantu' language grammars are not sufficient to generate weather text. To keep the scope of generating weather text in isiXhosa and isiZulu manageable, we restricted our text to verbs only. We first developed a corpus of weather sentences to determine verb features, and then created context-free grammar rules for verbs using an incremental approach. The quality of these rules was evaluated by two linguists. We then investigated the grammatical similarity of isiZulu verbs to their isiXhosa counterparts, and the extent to which a single merged set of grammar rules can produce correct verbs for both languages. The similarity analysis was done through the parse trees of the developed rules, and by applying binary similarity measures to the sets of verbs generated by the rules. The parse trees show that the differences between verb components are minor, and the similarity measures indicate that the verb sets are at most 59.5% similar (Driver-Kroeber metric). We also examined the importance of the phonological conditioning process by developing functions that calculate the proportion of all strings the rules can generate that require conditioning.
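    The two analysis steps above, generating verb sets from context-free rules and comparing them with a binary similarity measure, can be sketched as follows. This is a minimal sketch, not the authors' implementation: the toy grammars and their placeholder morphemes (`s1`, `t1`, `r1`, ...) are hypothetical, and the grammars are assumed to be finite (non-recursive) so that every string can be enumerated. For two sets, the Driver-Kroeber measure is |A ∩ B| / √(|A|·|B|), which equals a/√((a+b)(a+c)) on binary presence vectors.

```python
from itertools import product
from math import sqrt

# A grammar maps each non-terminal to its alternative right-hand sides.
# Any symbol that is not a key is treated as a terminal morpheme.
Grammar = dict[str, list[list[str]]]

# Hypothetical toy grammars; the morphemes are placeholders, not real
# isiXhosa or isiZulu forms. The two differ only in their tense slot (TAM).
LANG_A: Grammar = {
    "Verb": [["SC", "Root", "FV"], ["SC", "TAM", "Root", "FV"]],
    "SC": [["s1"], ["s2"]],
    "TAM": [["t1"]],
    "Root": [["r1"], ["r2"]],
    "FV": [["a"]],
}
LANG_B: Grammar = {**LANG_A, "TAM": [["t2"]]}


def expand(grammar: Grammar, symbol: str) -> set[str]:
    """Return every string derivable from `symbol` (grammar must be non-recursive)."""
    if symbol not in grammar:
        return {symbol}  # terminal morpheme
    strings: set[str] = set()
    for rhs in grammar[symbol]:
        # Expand each right-hand-side symbol, then concatenate every combination.
        for combo in product(*(expand(grammar, s) for s in rhs)):
            strings.add("".join(combo))
    return strings


def driver_kroeber(a: set[str], b: set[str]) -> float:
    """Driver-Kroeber similarity of two sets: |A & B| / sqrt(|A| * |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


verbs_a = expand(LANG_A, "Verb")
verbs_b = expand(LANG_B, "Verb")
print(len(verbs_a), len(verbs_b), driver_kroeber(verbs_a, verbs_b))  # 8 8 0.5
```

    Because the toy grammars differ only in one slot, half of each generated set is shared and the score is 0.5; the 59.5% figure in the abstract comes from the authors' real rule sets, not from this sketch.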
    We found that the phonological conditioning process affects at least 45% of strings for isiXhosa and at least 67% of strings for isiZulu, depending on the type of verb root used. Overall, this work shows that the differences between isiXhosa and isiZulu verbs are minor. However, these similarities cannot be exploited to create a unified rule set for both languages without significant maintainability compromises, because some dependencies between the verb's 'modules' exist in one language but not the other. Furthermore, the phonological conditioning process should be implemented to improve the generated text, given the high proportion of verbs it affects.
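    The conditioning ratio described above can be approximated by brute-force enumeration, as in the sketch below. It assumes a deliberately simplified trigger (a vowel-final morpheme directly followed by a vowel-initial one, i.e. vowel hiatus) rather than the full set of conditioning rules, and the morpheme slots are hypothetical placeholders. It only illustrates why the ratio depends on the type of verb root; it does not reproduce the 45% and 67% figures.

```python
from fractions import Fraction
from itertools import product

VOWELS = frozenset("aeiou")


def needs_conditioning(morphemes: tuple[str, ...]) -> bool:
    """Simplified trigger: a vowel-final morpheme directly followed by a vowel-initial one.

    Assumes every morpheme is a non-empty string.
    """
    return any(
        left[-1] in VOWELS and right[0] in VOWELS
        for left, right in zip(morphemes, morphemes[1:])
    )


def conditioning_ratio(slots: list[list[str]]) -> Fraction:
    """Fraction of all slot combinations that would need phonological conditioning."""
    combos = list(product(*slots))
    affected = sum(needs_conditioning(combo) for combo in combos)
    return Fraction(affected, len(combos))


# Hypothetical slots: subject concord, verb root, final vowel (placeholder morphemes).
subject_concords = ["ta", "ke", "m"]
consonant_roots = ["mb", "nd"]
vowel_roots = ["ik", "os"]
final_vowels = ["a"]

print(conditioning_ratio([subject_concords, consonant_roots, final_vowels]))  # 0
print(conditioning_ratio([subject_concords, vowel_roots, final_vowels]))      # 2/3
```

    With these placeholder slots, no combination with a consonant-initial root triggers conditioning, while 4 of the 6 combinations with a vowel-initial root do. The authors' functions may compute the ratio analytically rather than by enumerating strings.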

    Interlingua in Machine Translation Systems for Sign Languages

    The article reviews machine translation systems for sign languages that are based on an intermediate semantic language (interlingua). It considers the principles used to construct the interlingua in two systems: ZARDOZ (a multilingual system for translating spoken language into several sign languages, in particular Irish, American, and Japanese) and Multi-path (a system with multiple processing pathways for translating spoken English into American Sign Language). The architecture of these systems and the requirements imposed by the peculiarities of sign languages are discussed at a conceptual level, without going into mathematical and technical details. The purpose of the article is to contribute to a better understanding of the problems and strategies of formalizing the semantics of sign languages within machine translation systems.

    Hybrid discourse modeling and summarization for a speech-to-speech translation system

    The thesis discusses two components developed for VerbMobil, a speech-to-speech translation system for spontaneous spoken language: the dialogue model and one of its applications, multilingual summary generation. Two topics in the dialogue model are of special interest: (a) the use of a default unification operation called overlay, formalized in the thesis, as the fundamental operation for dialogue management; and (b) an intentional model that describes intentions in dialogue on five levels in a language-independent way. Besides the generation algorithm itself, we present a comprehensive evaluation of the summarization functionality. In addition to precision and recall, we define a new measure, confabulation, which gives a more precise characterization of the performance of complex natural language processing systems.
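    The idea behind overlay, the default unification operation mentioned above, can be illustrated with a simplified sketch: newer (covering) information overrides the older (background) discourse state, and the background fills in whatever the covering structure leaves unspecified. This sketch assumes untyped feature structures represented as nested Python dicts; the formal definition in the thesis may handle types and other details that are omitted here. The feature names in the example are hypothetical.

```python
from typing import Any


def overlay(covering: Any, background: Any) -> Any:
    """Simplified default unification: covering values win, background fills the gaps."""
    if isinstance(covering, dict) and isinstance(background, dict):
        # Start from a shallow copy of the background (default) information.
        result = dict(background)
        for feature, value in covering.items():
            if feature in background:
                # Feature present in both: recurse so nested defaults survive.
                result[feature] = overlay(value, background[feature])
            else:
                # Feature only in the covering structure: take it as-is.
                result[feature] = value
        return result
    # Atomic values or a structure/atom mismatch: the covering information wins.
    return covering


# Hypothetical discourse state and a new utterance that revises it.
background = {"intention": "suggest", "appointment": {"day": "monday", "time": "10:00"}}
covering = {"intention": "reject", "appointment": {"time": "14:00"}}

print(overlay(covering, background))
# {'intention': 'reject', 'appointment': {'day': 'monday', 'time': '14:00'}}
```

    Unlike strict unification, which would fail on the conflicting `time` values, this default-style merge keeps the newer value and still inherits the unmentioned `day` from the background.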