Textual Economy through Close Coupling of Syntax and Semantics
We focus on the production of efficient descriptions of objects, actions and
events. We define a type of efficiency, textual economy, that exploits the
hearer's recognition of inferential links to material elsewhere within a
sentence. Textual economy leads to efficient descriptions because the material
that supports such inferences has been included to satisfy independent
communicative goals, and is therefore overloaded in Pollack's sense. We argue
that achieving textual economy imposes strong requirements on the
representation and reasoning used in generating sentences. The representation
must support the generator's simultaneous consideration of syntax and
semantics. Reasoning must enable the generator to assess quickly and reliably
at any stage how the hearer will interpret the current sentence, with its
(incomplete) syntax and semantics. We show that these representational and
reasoning requirements are met in the SPUD system for sentence planning and
realization. Comment: 10 pages, uses QobiTree.te
Individual and Domain Adaptation in Sentence Planning for Dialogue
One of the biggest challenges in the development and deployment of spoken
dialogue systems is the design of the spoken language generation module. This
challenge arises from the need for the generator to adapt to many features of
the dialogue domain, user population, and dialogue context. A promising
approach is trainable generation, which uses general-purpose linguistic
knowledge that is automatically adapted to the features of interest, such as
the application domain, individual user, or user group. In this paper we
present and evaluate a trainable sentence planner for providing restaurant
information in the MATCH dialogue system. We show that trainable sentence
planning can produce complex information presentations whose quality is
comparable to the output of a template-based generator tuned to this domain. We
also show that our method easily supports adapting the sentence planner to
individuals, and that the individualized sentence planners generally perform
better than models trained and tested on a population of individuals. Previous
work has documented and utilized individual preferences for content selection,
but to our knowledge, these results provide the first demonstration of
individual preferences for sentence planning operations, affecting the content
order, discourse structure and sentence structure of system responses. Finally,
we evaluate the contribution of different feature sets, and show that, in our
application, n-gram features often do as well as features based on higher-level
linguistic representations.
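The trainable approach described above can be caricatured as ranking candidate sentence plans with a linear model over n-gram features. The sketch below is a minimal toy, not the MATCH system: the feature weights, candidate sentences, and restaurant names are all invented for illustration, standing in for weights a real planner would learn from user ratings.

```python
from collections import Counter

def ngram_features(text, n=2):
    """Bag of word n-grams -- the simple feature set the abstract finds competitive."""
    tokens = text.lower().split()
    return Counter(zip(*[tokens[i:] for i in range(n)]))

def score(features, weights):
    """Linear score; in a real system the weights come from training on user feedback."""
    return sum(weights.get(f, 0.0) * count for f, count in features.items())

def rank_candidates(candidates, weights):
    """Pick the candidate realization the trained model prefers."""
    return max(candidates, key=lambda c: score(ngram_features(c), weights))

# Hypothetical weights, e.g. adapted to one individual's preferences
weights = {
    ("great", "food"): 1.5,
    ("food", "and"): 0.4,
    ("decor", "is"): -0.5,
}

candidates = [
    "Babbo has great food and the decor is mediocre",
    "The decor is mediocre but Babbo has great food",
]
best = rank_candidates(candidates, weights)
```

Adapting the planner to an individual would then amount to re-estimating the weight vector on that user's judgments, which is why per-user models can outperform one trained on the whole population.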
On the Effectiveness of Neural Text Generation based Data Augmentation for Recognition of Morphologically Rich Speech
Advanced neural network models have penetrated Automatic Speech Recognition
(ASR) in recent years, however, in language modeling many systems still rely on
traditional Back-off N-gram Language Models (BNLM) partly or entirely. The
reason for this is the high cost and complexity of training and using neural
language models, whose use is mostly only possible in a second decoding pass (rescoring).
In our recent work we have significantly improved the online performance of a
conversational speech transcription system by transferring knowledge from a
Recurrent Neural Network Language Model (RNNLM) to the single pass BNLM with
text generation based data augmentation. In the present paper we analyze the
amount of transferable knowledge and demonstrate that the neural augmented LM
(RNN-BNLM) can capture almost 50% of the knowledge of the RNNLM while
dropping the second decoding pass and making the system real-time capable. We
also systematically compare word and subword LMs and show that subword-based
neural text augmentation can be especially beneficial in under-resourced
conditions. In addition, we show that by using the RNN-BNLM in the first pass
and adding a neural second pass, offline ASR results can be further significantly
improved. Comment: 8 pages, 2 figures, accepted for publication at TSD 202
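The core of the augmentation idea is simple: sample text from the trained neural LM and pour it into the corpus from which the back-off n-gram model is estimated. The sketch below shows only that data flow; `sample_from_rnnlm` is a hypothetical stub standing in for a real trained RNNLM sampler, and the toy sentences are invented.

```python
from collections import Counter

def bigram_counts(corpus):
    """N-gram statistics of the kind a back-off model (BNLM) is estimated from."""
    counts = Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        counts.update(zip(tokens, tokens[1:]))
    return counts

def sample_from_rnnlm(n_sentences):
    """Stand-in for sampling from a trained RNNLM (hypothetical stub);
    a real generator would produce novel, in-domain word sequences."""
    return ["good evening everyone", "good morning to you all"][:n_sentences]

# Original training text vs. the neurally augmented version
original = ["good morning everyone"]
augmented = original + sample_from_rnnlm(2)

base_counts = bigram_counts(original)
aug_counts = bigram_counts(augmented)
```

The augmented counts contain n-grams the original corpus never attested (here the bigram "good evening"), which is the sense in which knowledge is transferred from the neural model into the single-pass BNLM.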
Sentence-Level Content Planning and Style Specification for Neural Text Generation
Building effective text generation systems requires three critical
components: content selection, text planning, and surface realization, and
traditionally they are tackled as separate problems. Recent all-in-one style
neural generation models have made impressive progress, yet they often produce
outputs that are incoherent and unfaithful to the input. To address these
issues, we present an end-to-end trained two-step generation model, where a
sentence-level content planner first decides on the keyphrases to cover as well
as a desired language style, followed by a surface realization decoder that
generates relevant and coherent text. For experiments, we consider three tasks
from domains with diverse topics and varying language styles: persuasive
argument construction from Reddit, paragraph generation for normal and simple
versions of Wikipedia, and abstract generation for scientific articles.
Automatic evaluation shows that our system can significantly outperform
competitive comparisons. Human judges further rate our system generated text as
more fluent and correct, compared to the generations by its variants that do
not consider language style. Comment: Accepted as a long paper to EMNLP 201
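The two-step decomposition described above can be sketched as a pipeline: a planner first commits to keyphrases and a style tag, and a realizer then turns the plan into text. In the toy below, a fixed rule stands in for the trained planner and string templates stand in for the neural decoder; all names and rules are invented for illustration.

```python
def content_planner(input_keyphrases, target_style):
    """Step 1: decide which keyphrases each sentence covers, plus a style tag.
    A trained planner would predict these; here a toy rule keeps the first two."""
    selected = input_keyphrases[:2]
    return {"keyphrases": selected, "style": target_style}

def surface_realizer(plan):
    """Step 2: realize the plan as text (templates stand in for the neural decoder)."""
    kp = plan["keyphrases"]
    if plan["style"] == "simple":
        return ". ".join(f"This is about {k}" for k in kp) + "."
    return f"Concerning {kp[0]}, it is closely tied to {kp[1]}."

plan = content_planner(["climate change", "sea levels", "policy"], "simple")
text = surface_realizer(plan)
```

Factoring generation this way makes the planner's decisions inspectable: one can see which keyphrases were selected and in what order before any text is produced, which is exactly the coherence and faithfulness lever the abstract argues for.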
A Dictionary of Verb Government Patterns in Mandarin
This work fits into the GenDR project, a multilingual deep realizer which models the semantics-syntax interface for natural language generation (NLG). In NLG, lexical resources are essential for transforming non-linguistic data into natural language. To a certain extent, these lexical resources determine the accuracy and flexibility of the sentences generated by a realizer. Due to the unpredictability of verbs' syntactic behaviour and the central role that verbs play in an utterance, a lexical resource describing the government patterns of verbs is key to generating the most precise and natural text possible.
We aim to create a dictionary of verbs' government patterns in Mandarin. This kind of lexical resource is still missing for NLG in Mandarin. Based on the Mandarin VerbNet database, we used Python to extract the governed adpositions and build our dictionary. It is a dynamic dictionary whose content can be parameterized according to the user's needs.
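The extraction step the abstract mentions (using Python to pull governed adpositions out of verb frames) might look roughly as follows. The record format below is invented for illustration and is not the actual Mandarin VerbNet schema; the sample verbs and frames are simplified textbook-style examples.

```python
# A small closed list of adpositions for the sketch (not exhaustive)
ADPOSITIONS = {"在", "到", "从", "给"}

def extract_government(frames):
    """Map each verb to the set of adpositions its frames govern."""
    lexicon = {}
    for record in frames:
        governed = [tok for tok in record["frame"].split() if tok in ADPOSITIONS]
        lexicon.setdefault(record["verb"], set()).update(governed)
    return lexicon

# Toy frame records loosely imitating a VerbNet-style resource
frames = [
    {"verb": "放", "frame": "NP V NP 在 NP"},  # fàng 'put (somewhere)'
    {"verb": "送", "frame": "NP V NP 到 NP"},  # sòng 'deliver (to a place)'
    {"verb": "送", "frame": "NP V NP 给 NP"},  # sòng 'give (to someone)'
]
lexicon = extract_government(frames)
```

A realizer like GenDR could then consult such a lexicon to know, for instance, that a locative argument of 放 must be introduced by 在, which is precisely the unpredictable, verb-specific information a government dictionary encodes.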
Microplanning with Communicative Intentions: The SPUD System
The process of microplanning in Natural Language Generation (NLG) encompasses a range of problems in which a generator must bridge underlying domain-specific representations and general linguistic representations. These problems include constructing linguistic referring expressions to identify domain objects, selecting lexical items to express domain concepts, and using complex linguistic constructions to concisely convey related domain facts. In this paper, we argue that such problems are best solved through a uniform, comprehensive, declarative process. In our approach, the generator directly explores a search space for utterances described by a linguistic grammar. At each stage of search, the generator uses a model of interpretation, which characterizes the potential links between the utterance and the domain and context, to assess its progress in conveying domain-specific representations. We further address the challenges for implementation and knowledge representation in this approach. We show how to implement this approach effectively by using the lexicalized tree-adjoining grammar formalism (LTAG) to connect structure to meaning and using modal logic programming to connect meaning to context. We articulate a detailed methodology for designing grammatical and conceptual resources.
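The search loop the abstract describes (incrementally extend the utterance, checking at each step how much of the communicative intent has been conveyed) can be caricatured as greedy cover over lexical items. The sketch below is entirely a toy: it replaces SPUD's LTAG derivations and modal-logic interpretation model with plain sets of domain facts, and the lexicon entries are invented.

```python
def microplan(goals, lexicon):
    """Greedy toy model of SPUD's loop: repeatedly add the lexical item whose
    interpretation conveys the most still-unconveyed domain facts."""
    utterance, remaining = [], set(goals)
    while remaining:
        item, facts = max(lexicon.items(), key=lambda kv: len(kv[1] & remaining))
        if not facts & remaining:
            break  # no item makes progress; goals unreachable with this lexicon
        utterance.append(item)
        remaining -= facts
    return utterance, remaining

# Invented lexicon: each entry maps a surface item to the facts it conveys
lexicon = {
    "rabbit": {"type:rabbit"},
    "white": {"color:white"},
    "white rabbit": {"type:rabbit", "color:white"},  # two facts at once
}
utterance, unmet = microplan({"type:rabbit", "color:white"}, lexicon)
```

Even this caricature exhibits the textual economy discussed elsewhere in these abstracts: the search prefers "white rabbit" because a single construction discharges two communicative goals at once.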
Language generation module for conversational systems
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2000. Includes bibliographical references (p. 127-132). By Lauren M. Baptist.