Learning Features that Predict Cue Usage
Our goal is to identify the features that predict the occurrence and
placement of discourse cues in tutorial explanations in order to aid in the
automatic generation of explanations. Previous attempts to devise rules for
text generation were based on intuition or small numbers of constructed
examples. We apply a machine learning program, C4.5, to induce decision trees
for cue occurrence and placement from a corpus of data coded for a variety of
features previously thought to affect cue usage. Our experiments enable us to
identify the features with most predictive power, and show that machine
learning can be used to induce decision trees useful for text generation.
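The induction step described in this abstract can be sketched in miniature. C4.5 grows a decision tree by repeatedly splitting on the feature with the highest information gain; the snippet below shows that core computation on toy data (the feature names and codings are hypothetical illustrations, not the features from the original corpus):

```python
# Sketch of the split-selection idea behind C4.5: pick the feature whose
# split yields the largest information gain for predicting cue occurrence.
# Data and feature names below are illustrative, not from the real corpus.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy reduction from splitting on one binary feature."""
    gain = entropy(labels)
    for value in (0, 1):
        subset = [l for r, l in zip(rows, labels) if r[feature] == value]
        if subset:
            gain -= len(subset) / len(labels) * entropy(subset)
    return gain

# Each row codes one clause with hypothetical binary features:
# [same_intentional_segment, segment_embedded, causal_relation]
rows = [(1, 0, 1), (1, 1, 1), (0, 0, 0), (0, 1, 0), (1, 0, 1), (0, 1, 1)]
cues = [1, 1, 0, 0, 1, 1]   # did a discourse cue occur in this clause?

gains = {f: information_gain(rows, cues, f) for f in range(3)}
best = max(gains, key=gains.get)
print(best, round(gains[best], 3))  # feature 2 separates the toy data perfectly
```

On this toy data the causal-relation feature has the highest gain, so it would become the root of the induced tree; C4.5 then recurses on each branch.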
A corpus analysis of discourse relations for Natural Language Generation
We are developing a Natural Language Generation (NLG) system that generates texts tailored for the reading ability of individual readers. As part of building the system, GIRL (Generator for Individual Reading Levels), we carried out an analysis of the RST Discourse Treebank Corpus to find out how human writers linguistically realise discourse relations. The goal of the analysis was (a) to create a model of the choices that need to be made when realising discourse relations, and (b) to understand how these choices were typically made for "normal" readers, for a variety of discourse relations. We present our results for discourse relations: concession, condition, elaboration additional, evaluation, example, reason and restatement. We discuss the results and how they were used in GIRL.
Discourse Structure and Anaphora: An Empirical Study
One of the main motivations for studying discourse structure is its effect on the search for the antecedents of anaphoric expressions. We tested the predictions in this regard of theories assuming that the structure of a discourse depends on its intentional structure, such as Grosz and Sidner's theory. We used a corpus of tutorial dialogues independently annotated according to Relational Discourse Analysis (RDA), a theory of discourse structure merging ideas from Grosz and Sidner's theory with proposals from Rhetorical Structure Theory (RST). Using as our metrics the accessibility of anaphoric antecedents and the reduction in ambiguity brought about by a particular theory, we found support for Moser and Moore's proposal that among the units of discourse assumed by an RST-like theory, only those expressing an intentional "core" (in the RDA sense) should be viewed as constraining the search for antecedents; units only expressing informational relations should not introduce separate focus spaces. We also found that the best compromise between accessibility and ambiguity ("perplexity") reduction is a model in which the focus spaces associated with embedded cores and embedded contributors remain on the stack until the RDA-segment in which they occur is completed, and discuss the implications of this finding for a stack-based theory.
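The stack-based model this abstract argues for can be illustrated with a minimal sketch: only "core" units push a focus space, embedded spaces are popped only when their enclosing segment closes, and antecedent search is limited to entities in spaces still on the stack. The event encoding and entities below are hypothetical illustrations, not the RDA annotation scheme itself:

```python
# Minimal sketch of a stack-based focus-space model: core units open
# focus spaces, which stay on the stack until their segment is closed.
# Segment structure and entities are illustrative, not from the corpus.

def accessible_antecedents(events, pronoun):
    """Return the entities on the focus stack when `pronoun` is reached.

    events: list of ("push_core", entities), ("close_segment", None),
            or ("mention", token) tuples in discourse order.
    """
    stack = []
    for kind, payload in events:
        if kind == "push_core":
            stack.append(list(payload))   # new focus space for a core unit
        elif kind == "close_segment":
            stack.pop()                   # embedded spaces popped only now
        elif kind == "mention" and payload == pronoun:
            # antecedent search is constrained to spaces still on the stack
            return [e for space in stack for e in space]
    return []

discourse = [
    ("push_core", ["the pump", "the valve"]),   # outer core segment
    ("push_core", ["the gauge"]),               # embedded core
    ("mention", "it"),                          # pronoun inside embedded core
]
print(accessible_antecedents(discourse, "it"))
```

Because both segments are still open at the pronoun, all three entities are accessible; a purely informational unit would simply not push a space, shrinking the search set.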
Individual and Domain Adaptation in Sentence Planning for Dialogue
One of the biggest challenges in the development and deployment of spoken
dialogue systems is the design of the spoken language generation module. This
challenge arises from the need for the generator to adapt to many features of
the dialogue domain, user population, and dialogue context. A promising
approach is trainable generation, which uses general-purpose linguistic
knowledge that is automatically adapted to the features of interest, such as
the application domain, individual user, or user group. In this paper we
present and evaluate a trainable sentence planner for providing restaurant
information in the MATCH dialogue system. We show that trainable sentence
planning can produce complex information presentations whose quality is
comparable to the output of a template-based generator tuned to this domain. We
also show that our method easily supports adapting the sentence planner to
individuals, and that the individualized sentence planners generally perform
better than models trained and tested on a population of individuals. Previous
work has documented and utilized individual preferences for content selection,
but to our knowledge, these results provide the first demonstration of
individual preferences for sentence planning operations, affecting the content
order, discourse structure and sentence structure of system responses. Finally,
we evaluate the contribution of different feature sets, and show that, in our
application, n-gram features often do as well as features based on higher-level
linguistic representations.
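The n-gram features mentioned in the final sentence can be sketched as a simple ranker: score each candidate realisation by how many of its bigrams appear in reference text, and keep the highest-scoring one. The reference sentence and candidates below are toy illustrations, not MATCH system output:

```python
# Sketch: ranking candidate realisations with surface bigram features,
# the kind of n-gram feature the evaluation found competitive with
# higher-level linguistic representations. Data is illustrative.
from collections import Counter

reference = "the restaurant has good food and good service".split()
bigram_counts = Counter(zip(reference, reference[1:]))

def ngram_score(candidate):
    """Sum of reference-corpus counts for the candidate's bigrams."""
    toks = candidate.split()
    return sum(bigram_counts[b] for b in zip(toks, toks[1:]))

candidates = [
    "the restaurant has good service",
    "good has the restaurant service",
]
best = max(candidates, key=ngram_score)
print(best)  # the fluent ordering wins on bigram overlap
```

A real trainable sentence planner would learn weights over many such features from rated examples, but the scoring step has this same shape.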
Research on Phraseology Across Continents
The second volume of the IDP series contains papers by phraseologists from five continents: Europe, Australia, North America, South America and Asia, which were written within the framework of the project Intercontinental Dialogue on Phraseology, prepared and coordinated by Joanna Szerszunowicz, conducted by the University of Bialystok in cooperation with Kwansei Gakuin University in Japan. The book consists of the following parts: Dialogue on Phraseology, General and Corpus Linguistics & Phraseology, Lexicography & Phraseology, Contrastive Linguistics, Translation & Phraseology, Literature, Cultural Studies, Education & Phraseology. Dialogue contains two papers written by widely recognised phraseologists: professor Anita Naciscione from Latvia and professor Irine Goshkheteliani. The volume has been financed by the Philological Department of the University of Bialystok.