17 research outputs found

    Textual Economy through Close Coupling of Syntax and Semantics

    Get PDF
    We focus on the production of efficient descriptions of objects, actions and events. We define a type of efficiency, textual economy, that exploits the hearer's recognition of inferential links to material elsewhere within a sentence. Textual economy leads to efficient descriptions because the material that supports such inferences has been included to satisfy independent communicative goals, and is therefore overloaded in Pollack's sense. We argue that achieving textual economy imposes strong requirements on the representation and reasoning used in generating sentences. The representation must support the generator's simultaneous consideration of syntax and semantics. Reasoning must enable the generator to assess quickly and reliably at any stage how the hearer will interpret the current sentence, with its (incomplete) syntax and semantics. We show that these representational and reasoning requirements are met in the SPUD system for sentence planning and realization.Comment: 10 pages, uses QobiTree.te

    Individual and Domain Adaptation in Sentence Planning for Dialogue

    Full text link
    One of the biggest challenges in the development and deployment of spoken dialogue systems is the design of the spoken language generation module. This challenge arises from the need for the generator to adapt to many features of the dialogue domain, user population, and dialogue context. A promising approach is trainable generation, which uses general-purpose linguistic knowledge that is automatically adapted to the features of interest, such as the application domain, individual user, or user group. In this paper we present and evaluate a trainable sentence planner for providing restaurant information in the MATCH dialogue system. We show that trainable sentence planning can produce complex information presentations whose quality is comparable to the output of a template-based generator tuned to this domain. We also show that our method easily supports adapting the sentence planner to individuals, and that the individualized sentence planners generally perform better than models trained and tested on a population of individuals. Previous work has documented and utilized individual preferences for content selection, but to our knowledge, these results provide the first demonstration of individual preferences for sentence planning operations, affecting the content order, discourse structure and sentence structure of system responses. Finally, we evaluate the contribution of different feature sets, and show that, in our application, n-gram features often do as well as features based on higher-level linguistic representations

    On the Effectiveness of Neural Text Generation based Data Augmentation for Recognition of Morphologically Rich Speech

    Full text link
    Advanced neural network models have penetrated Automatic Speech Recognition (ASR) in recent years, however, in language modeling many systems still rely on traditional Back-off N-gram Language Models (BNLM) partly or entirely. The reason for this are the high cost and complexity of training and using neural language models, mostly possible by adding a second decoding pass (rescoring). In our recent work we have significantly improved the online performance of a conversational speech transcription system by transferring knowledge from a Recurrent Neural Network Language Model (RNNLM) to the single pass BNLM with text generation based data augmentation. In the present paper we analyze the amount of transferable knowledge and demonstrate that the neural augmented LM (RNN-BNLM) can help to capture almost 50% of the knowledge of the RNNLM yet by dropping the second decoding pass and making the system real-time capable. We also systematically compare word and subword LMs and show that subword-based neural text augmentation can be especially beneficial in under-resourced conditions. In addition, we show that using the RNN-BNLM in the first pass followed by a neural second pass, offline ASR results can be even significantly improved.Comment: 8 pages, 2 figures, accepted for publication at TSD 202

    Sentence-Level Content Planning and Style Specification for Neural Text Generation

    Full text link
    Building effective text generation systems requires three critical components: content selection, text planning, and surface realization, and traditionally they are tackled as separate problems. Recent all-in-one style neural generation models have made impressive progress, yet they often produce outputs that are incoherent and unfaithful to the input. To address these issues, we present an end-to-end trained two-step generation model, where a sentence-level content planner first decides on the keyphrases to cover as well as a desired language style, followed by a surface realization decoder that generates relevant and coherent text. For experiments, we consider three tasks from domains with diverse topics and varying language styles: persuasive argument construction from Reddit, paragraph generation for normal and simple versions of Wikipedia, and abstract generation for scientific articles. Automatic evaluation shows that our system can significantly outperform competitive comparisons. Human judges further rate our system generated text as more fluent and correct, compared to the generations by its variants that do not consider language style.Comment: Accepted as a long paper to EMNLP 201

    Un dictionnaire de régimes verbaux en mandarin

    Full text link
    Ce mémoire s’insère dans le projet GenDR, un réalisateur de texte profond multilingue qui modélise l’interface sémantique-syntaxe pour la génération automatique de texte (GAT). Dans le cadre de la GAT, les ressources lexicales sont de première nécessité pour que le système puisse transformer des données nonlinguistiques en langage naturel. Ces ressources lexicales déterminent dans une certaine mesure la précision et la flexibilité des phrases générées. En raison de l’imprévisibilité du régime des verbes et du rôle central que les verbes jouent dans un énoncé, une ressource lexicale qui décrit le régime des verbes revêt une importance particulière pour générer du texte le plus précis et le plus naturel possible. Nous avons tenté de créer un dictionnaire de régimes verbaux en mandarin. Ce genre de ressource lexicale est toujours une lacune dans le domaine de la GAT en mandarin. En nous basant sur la base de données Mandarin VerbNet, nous avons eu recours à Python pour extraire les adpositions régies et créer notre dictionnaire. Il s’agit d’un dictionnaire dynamique, dont le contenu peut être paramétré en fonction des objectifs de l’utilisateur.This work fits into the GenDR project, a multilingual deep realizer which models the semantics-syntax interface for natural language generation (NLG). In NLG, lexical resources are essential to transform non-linguistic data into natural language. To a certain extent, the lexical resources used determine the accuracy and flexibility of the sentences generated by a realizer. Due to the unpredictability of verbs’ syntactic behaviour and the central role that verbs play in an utterance, a lexical resource which describes the government patterns of verbs is key to generating the most precise and natural text possible. We aim to create a dictionary of verbs’ government patterns in Mandarin. This kind of lexical resource is still missing for NLG in Mandarin. Based on the Mandarin VerbNet database, we used Python to extract information about adpositions and to create our dictionary. This is a dynamic dictionary whose content can be parameterized according to the user’s needs

    Microplanning with Communicative Intentions: The SPUD System

    Get PDF
    The process of microplanning in Natural Language Generation (NLG) encompasses a range of problems in which a generator must bridge underlying domain-specific representations and general linguistic representations. These problems include constructing linguistic referring expressions to identify domain objects, selecting lexical items to express domain concepts, and using complex linguistic constructions to concisely convey related domain facts. In this paper, we argue that such problems are best solved through a uniform, comprehensive, declarative process. In our approach, the generator directly explores a search space for utterances described by a linguistic grammar. At each stage of search, the generator uses a model of interpretation, which characterizes the potential links between the utterance and the domain and context, to assess its progress in conveying domain-specific representations. We further address the challenges for implementation and knowledge representation in this approach. We show how to implement this approach effectively by using the lexicalized tree-adjoining grammar formalism (LTAG) to connect structure to meaning and using modal logic programming to connect meaning to context. We articulate a detailed methodology for designing grammatical and conceptua

    Language generation module for conversational systems

    Get PDF
    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2000.Includes bibliographical references (p. 127-132).by Lauren M. Baptist.Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2000
    corecore