856 research outputs found

A Reference Architecture for Natural Language Generation Systems

Author: Cahill Lynn
Evans Roger
Mellish Chris
Paiva Daniel
Reape Mike
Scott Donia
Publication venue: Cambridge University Press (CUP)
Publication date: 01/01/2006

We present the RAGS (Reference Architecture for Generation Systems) framework: a specification of an abstract Natural Language Generation (NLG) system architecture to support sharing, re-use, comparison and evaluation of NLG technologies. We argue that the evidence from a survey of actual NLG systems calls for a different emphasis in a reference proposal from that seen in similar initiatives in information extraction and multimedia interfaces. We introduce the framework itself, in particular the two-level data model that allows us to support the complex data requirements of NLG systems in a flexible and coherent fashion, and describe our efforts to validate the framework through a range of implementations.

CiteSeerX

Crossref

University of Brighton Research Portal

Open Research Online (The Open University)

Comprehension Driven Document Planning in Natural Language Generation Systems

Author: Reiter Ehud
Sripada Somayajulu
Thomson Craig
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2018

This work is funded by the Engineering and Physical Sciences Research Council (EPSRC), under a National Productivity Investment Fund Doctoral Studentship (EP/R512412/1). Publisher PDF.

Aberdeen University Research

Crossref

The E2E Dataset: New Challenges For End-to-End Generation

Author: Dušek Ondřej
Novikova Jekaterina
Rieser Verena
Publication date: 01/01/2017

This paper describes the E2E data, a new dataset for training end-to-end, data-driven natural language generation systems in the restaurant domain, which is ten times bigger than existing, frequently used datasets in this area. The E2E dataset poses new challenges: (1) its human reference texts show more lexical richness and syntactic variation, including discourse phenomena; (2) generating from this set requires content selection. As such, learning from this dataset promises more natural, varied and less template-like system utterances. We also establish a baseline on this dataset, which illustrates some of the difficulties associated with this data. Comment: Accepted as a short paper for SIGDIAL 2017 (final submission including supplementary material)

arXiv.org e-Print Archive

Crossref

Heriot Watt Pure

Visualising Discourse Coherence in Non-Linear Documents

Author: Buckingham Shum Simon
Mancini Clara
Scott Donia
Publication date: 01/01/2006

To produce coherent linear documents, Natural Language Generation systems have traditionally exploited the structuring role of textual discourse markers such as relational and referential phrases. These coherence markers of the traditional notion of text, however, do not work in non-linear documents: a new set of graphical devices is needed together with formation rules to govern their usage, supported by sound theoretical frameworks. If in linear documents graphical devices such as layout and formatting complement textual devices in the expression of discourse coherence, in non-linear documents they play a more important role. In this paper, we present our theoretical and empirical work in progress, which explores new possibilities for expressing coherence in the generation of hypertext documents.

Directory of Open Access Journals

Open Research Online (The Open University)

Mixing representation levels: The hybrid approach to automatic text generation

Author: Pianta Emanuele
Tovena Lucia M.
Publication date: 01/01/1999

Natural language generation (NLG) systems map non-linguistic representations into strings of words through a number of steps using intermediate representations of various levels of abstraction. Template-based systems, by contrast, tend to use only one representation level, i.e. fixed strings, which are combined, possibly in a sophisticated way, to generate the final text. In some circumstances, it may be profitable to combine NLG and template-based techniques. The issue of combining generation techniques can be seen in more abstract terms as the issue of mixing levels of representation of different degrees of linguistic abstraction. This paper aims at defining a reference architecture for systems using mixed representations. We argue that mixed representations can be used without abandoning a linguistically grounded approach to language generation. Comment: 6 pages

arXiv.org e-Print Archive

CiteSeerX

Archivio della ricerca - Fondazione Bruno Kessler

Dynamic Human Evaluation for Relative Model Comparisons

Author: Hollenstein Nora
Renggli Cedric
Thorleiksdóttir Thórhildur
Zhang Ce
Publication date: 01/01/2022

Collecting human judgements is currently the most reliable evaluation method for natural language generation systems. Automatic metrics have reported flaws when applied to measure quality aspects of generated text and have been shown to correlate poorly with human judgements. However, human evaluation is time- and cost-intensive, and we lack consensus on designing and conducting human evaluation experiments. Thus there is a need for streamlined approaches for efficient collection of human judgements when evaluating natural language generation systems. Therefore, we present a dynamic approach to measure the required number of human annotations when evaluating generated outputs in relative comparison settings. We propose an agent-based framework of human evaluation to assess multiple labelling strategies and methods to decide the better model in a simulation and a crowdsourcing case study. The main results indicate that a decision about the superior model can be made with high probability across different labelling strategies, where assigning a single random worker per task requires the least overall labelling effort and thus the least cost. Comment: accepted at LREC 2022

arXiv.org e-Print Archive

Repository for Publications and Research Data

Underreporting of errors in NLG output, and what to do about it

Author: Clinciu Miruna
Dušek Ondřej
Gkatzia Dimitra
Inglis Stephanie
Leppänen Leo
Mahamood Saad
Manning Emma
Schoch Stephanie
Thomson Craig
van Miltenburg Emiel
Wen Luou
Publication venue: The Association for Computational Linguistics
Publication date: 01/08/2021

We observe a severe under-reporting of the different kinds of errors that Natural Language Generation systems make. This is a problem, because mistakes are an important indicator of where systems should still be improved. If authors only report overall performance metrics, the research community is left in the dark about the specific weaknesses that are exhibited by 'state-of-the-art' research. Next to quantifying the extent of error under-reporting, this position paper provides recommendations for error identification, analysis and reporting. Peer reviewed

Aberdeen University Research

Helsingin yliopiston digitaalinen arkisto

Tilburg University Repository