55 research outputs found

    Bootstrapping Lexical Choice via Multiple-Sequence Alignment

    An important component of any generation system is the mapping dictionary, a lexicon of elementary semantic expressions and corresponding natural language realizations. Typically, labor-intensive knowledge-based methods are used to construct the dictionary. We instead propose to acquire it automatically via a novel multiple-pass algorithm employing multiple-sequence alignment, a technique commonly used in bioinformatics. Crucially, our method leverages latent information contained in multi-parallel corpora -- datasets that supply several verbalizations of the corresponding semantics rather than just one. We used our techniques to generate natural language versions of computer-generated mathematical proofs, with good results on both a per-component and overall-output basis. For example, in evaluations involving a dozen human judges, our system produced output whose readability and faithfulness to the semantic input rivaled that of a traditional generation system. Comment: 8 pages; to appear in the proceedings of EMNLP-200

    Ontology verbalization in agglutinating Bantu languages: a study of Runyankore and its generalizability

    Natural Language Generation (NLG) systems have been developed to generate text in multiple domains, including personalized patient information. However, their application is limited in Africa because they generate text in English, yet indigenous languages are still predominantly spoken throughout the continent, especially in rural areas. The existing healthcare NLG systems cannot be reused for Bantu languages due to the complex grammatical structure, nor can the generated text be used in machine translation systems for Bantu languages because they are computationally under-resourced. This research aimed to verbalize ontologies in agglutinating Bantu languages. We had four research objectives: (1) noun pluralization and verb conjugation in Runyankore; (2) Runyankore verbalization patterns for the selected description logic constructors; (3) combining the pluralization, conjugation, and verbalization components to form a Runyankore grammar engine; and (4) generalizing the Runyankore and isiZulu approaches to ontology verbalization to other agglutinating Bantu languages. We used an approach that combines morphology with syntax and semantics to develop a noun pluralizer for Runyankore, and used Context-Free Grammars (CFGs) for verb conjugation. We developed verbalization algorithms for eight constructors in a description logic. We then combined these components into a grammar engine developed as a Protégé5X plugin. The investigation into generalizability used the bootstrap approach, and investigated bootstrapping for languages in the same language zone (intra-zone bootstrappability) and languages across language zones (inter-zone bootstrappability). We obtained verbalization patterns for Luganda and isiXhosa, in the same zones as Runyankore and isiZulu respectively, and chiShona, Kikuyu, and Kinyarwanda from different zones, and used the bootstrap metric that we developed to identify the most efficient source-target bootstrap pair.
    By regrouping Meinhof’s noun class system we were able to eliminate non-determinism during computation, and this led to the development of a generic noun pluralizer. We also showed that CFGs can conjugate verbs in the five additional languages. Finally, we proposed the architecture for an API that could be used to generate text in agglutinating Bantu languages. Our research provides a method for surface realization for an under-resourced and grammatically complex family of languages, Bantu languages. We leave the development of a complete NLG system based on the Runyankore grammar engine and of the API as areas for future work.

    Three Approaches to Generating Texts in Different Styles

    Natural Language Generation (NLG) systems generate texts in English and other human languages from non-linguistic input data. Usually there are a large number of possible texts that can communicate the input data, and NLG systems must choose one of these. We argue that style can be used by NLG systems to choose between possible texts, and explore how this can be done by (1) explicit stylistic parameters, (2) imitating a genre style, and (3) imitating an individual’s style.

    User Interfaces to the Web of Data based on Natural Language Generation

    We explore how Virtual Research Environments based on Semantic Web technologies support research interactions with RDF data in various stages of corpus-based analysis, analyze the Web of Data in terms of human readability, derive labels from variables in SPARQL queries, apply Natural Language Generation to improve user interfaces to the Web of Data by verbalizing SPARQL queries and RDF graphs, and present a method to automatically induce RDF graph verbalization templates via distant supervision

    DFKI Workshop on Natural Language Generation

    On the Saarbrücken campus sites as well as at DFKI, many research activities are pursued in the field of Natural Language Generation (NLG). We felt that too little is known about the total of these activities and decided to organize a workshop in order to share ideas and promote the results. This DFKI workshop brought together local researchers working on NLG. Several papers are co-authored by international researchers. Although not all NLG activities are covered in the present document, the papers reviewed for this workshop clearly demonstrate that Saarbrücken counts among the important NLG sites in the world.

    The use of data-mining for the automatic formation of tactics

    This paper discusses the use of data-mining for the automatic formation of tactics. It was presented at the Workshop on Computer-Supported Mathematical Theory Development held at IJCAR in 2004. The aim of this project is to evaluate the applicability of data-mining techniques to the automatic formation of tactics from large corpora of proofs. We data-mine information from large proof corpora to find commonly occurring patterns. These patterns are then evolved into tactics using genetic programming techniques.

    Natural Language Generation Requirements for Social Robots in Sub-Saharan Africa

    Robots are deployed in Africa mainly in manufacturing, yet they may assist in society as future-oriented technologies as well. They may ameliorate, e.g., service delivery issues and skills shortages. In this discussion paper, several uses and use cases relevant to Sub-Saharan Africa are described and requirements identified. We zoom in on human-robot interaction in Niger-Congo B (‘Bantu’) languages. Use cases for healthcare and education elucidate specific requirements for the natural language generation component of robots in society. In contrast to typical generation systems, this component demands i) combining data-to-text and knowledge-to-text in one system, ii) generating different types of sentences so as to switch between written and spoken language, and iii) processing non-trivial numbers.