Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
This paper surveys the current state of the art in Natural Language
Generation (NLG), defined as the task of generating text or speech from
non-linguistic input. A survey of NLG is timely in view of the changes that the
field has undergone over the past decade or so, especially in relation to new
(usually data-driven) methods, as well as new applications of NLG technology.
This survey therefore aims to (a) give an up-to-date synthesis of research on
the core tasks in NLG and the architectures within which such tasks are
organised; (b) highlight a number of relatively recent research topics that
have arisen partly as a result of growing synergies between NLG and other areas
of artificial intelligence; (c) draw attention to the challenges in NLG
evaluation, relating them to similar challenges faced in other areas of Natural
Language Processing, with an emphasis on different evaluation methods and the
relationships between them.
Comment: Published in Journal of AI Research (JAIR), volume 61, pp 75-170. 118 pages, 8 figures, 1 table.
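As a rough illustration of the task breakdown this survey covers, the toy pipeline below walks one weather record through content determination, sentence planning, and surface realisation. All function names, rules, and data are invented for illustration and are not from the paper.

```python
# Toy illustration of the classic three-stage NLG pipeline the survey
# describes: content determination -> sentence planning -> surface
# realisation. All rules and data here are invented for illustration.

def determine_content(record):
    """Select which facts from the non-linguistic input to express."""
    return [("temperature", record["temp"]), ("sky", record["sky"])]

def plan_sentence(facts):
    """Decide an overall structure: one clause per selected fact."""
    return [{"subject": key, "value": value} for key, value in facts]

def realize(plan):
    """Map the abstract plan onto words and flatten it to a string."""
    templates = {
        "temperature": "the temperature is {value} degrees",
        "sky": "the sky is {value}",
    }
    clauses = [templates[c["subject"]].format(value=c["value"]) for c in plan]
    return " and ".join(clauses).capitalize() + "."

weather = {"temp": 21, "sky": "overcast"}
print(realize(plan_sentence(determine_content(weather))))
# -> "The temperature is 21 degrees and the sky is overcast."
```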
Three Approaches to Generating Texts in Different Styles
Natural Language Generation (NLG) systems generate texts in English and other human languages from non-linguistic input data. Usually there are a large number of possible texts that can communicate the input data, and NLG systems must choose one of these. We argue that style can be used by NLG systems to choose between possible texts, and explore how this can be done by (1) explicit stylistic parameters, (2) imitating a genre style, and (3) imitating an individual's style.
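A minimal sketch of approach (1), choosing between possible texts via explicit stylistic parameters: score each candidate realisation against target parameter values and keep the closest match. The features and targets below are invented proxies, not the authors' actual parameters.

```python
# Hypothetical sketch of selecting among candidate texts with explicit
# stylistic parameters (approach 1); features and targets are invented.

CANDIDATES = [
    "Rainfall will be heavy.",
    "Expect quite a lot of rain, I'm afraid.",
    "Precipitation levels are forecast to be significantly elevated.",
]

def style_features(text):
    """Crude stylistic proxies: average word length and word count."""
    words = text.split()
    return {
        "formality": sum(len(w) for w in words) / len(words),
        "verbosity": len(words),
    }

def pick(candidates, target):
    """Return the candidate whose features are closest to the target style."""
    def distance(text):
        feats = style_features(text)
        return sum((feats[k] - target[k]) ** 2 for k in target)
    return min(candidates, key=distance)

# A terse, plain target style favours the first candidate.
print(pick(CANDIDATES, {"formality": 5.0, "verbosity": 4}))
```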
Crowd-sourcing NLG Data: Pictures Elicit Better Data
Recent advances in corpus-based Natural Language Generation (NLG) hold the
promise of being easily portable across domains, but require costly training
data, consisting of meaning representations (MRs) paired with Natural Language
(NL) utterances. In this work, we propose a novel framework for crowdsourcing
high quality NLG training data, using automatic quality control measures and
evaluating different MRs with which to elicit data. We show that pictorial MRs
result in better NL data being collected than logic-based MRs: utterances
elicited by pictorial MRs are judged as significantly more natural, more
informative, and better phrased, with a significant increase in average quality
ratings (around 0.5 points on a 6-point scale), compared to using the logical
MRs. As the MR becomes more complex, the benefits of pictorial stimuli
increase. The collected data will be released as part of this submission.
Comment: The 9th International Natural Language Generation conference (INLG), 2016. 10 pages, 2 figures, 3 tables.
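The kind of MR/utterance training pair the paper collects, together with one crude automatic quality-control check, might look like the sketch below. The slot names and the coverage check are hypothetical stand-ins, not the paper's actual measures.

```python
# Hypothetical sketch of an MR/utterance training pair and a crude
# automatic quality-control check: every MR slot value should be
# mentioned in the crowd-sourced utterance. Not the paper's setup.

pair = {
    "mr": {"name": "The Mill", "food": "Italian", "area": "riverside"},
    "utterance": "The Mill serves Italian food by the riverside.",
}

def covers_all_slots(mr, utterance):
    """Reject utterances that omit any slot value from the MR."""
    text = utterance.lower()
    return all(value.lower() in text for value in mr.values())

print(covers_all_slots(pair["mr"], pair["utterance"]))            # True
print(covers_all_slots(pair["mr"], "The Mill is by the riverside."))  # False
```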
Investigating Linguistic Pattern Ordering in Hierarchical Natural Language Generation
Natural language generation (NLG) is a critical component of spoken dialogue
systems and can be divided into two phases: (1) sentence planning: deciding
the overall sentence structure, and (2) surface realization: determining specific
word forms and flattening the sentence structure into a string. With the rise
of deep learning, most modern NLG models are based on a sequence-to-sequence
(seq2seq) model, which basically contains an encoder-decoder structure; these
NLG models generate sentences from scratch by jointly optimizing sentence
planning and surface realization. However, such a simple encoder-decoder
architecture usually fails to generate long, complex sentences, because the
decoder has difficulty learning all the necessary grammar and diction. This
paper introduces an NLG model with a hierarchical attentional decoder, where
the hierarchy focuses on leveraging linguistic knowledge in a specific order.
The experiments show that the proposed method significantly outperforms the
traditional seq2seq model with a smaller model size, and the design of the
hierarchical attentional decoder can be applied to various NLG systems.
Furthermore, different generation strategies based on linguistic patterns are
investigated and analyzed in order to guide future NLG research.
Comment: Accepted by the 7th IEEE Workshop on Spoken Language Technology (SLT 2018). arXiv admin note: text overlap with arXiv:1808.0274
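The ordering idea can be caricatured without any neural machinery: emit content words in a first stage, then add function words in a second. The rule-based toy below only illustrates staged, linguistically ordered generation; the paper's actual model is a neural hierarchical attentional decoder, and every rule and name here is invented.

```python
# Toy, rule-based stand-in for generation in a fixed linguistic order:
# stage 1 emits content words (nouns/verbs), stage 2 inserts function
# words around them. The real model is a neural hierarchical attentional
# decoder; this only illustrates the ordering idea.

def stage_content(mr):
    """First hierarchy level: emit content words for each slot."""
    return [mr["name"], "serves", mr["food"], "food"]

def stage_function(content_words):
    """Second level: wrap content words with determiners and punctuation."""
    name, verb, cuisine, head = content_words
    return ["The", name, verb, cuisine, head + "."]

mr = {"name": "Bibimbap House", "food": "Korean"}
print(" ".join(stage_function(stage_content(mr))))
# -> "The Bibimbap House serves Korean food."
```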
Data-driven Natural Language Generation: Paving the Road to Success
We argue that there are currently two major bottlenecks to the commercial use
of statistical machine learning approaches for natural language generation
(NLG): (a) The lack of reliable automatic evaluation metrics for NLG, and (b)
The scarcity of high quality in-domain corpora. We address the first problem by
thoroughly analysing current evaluation metrics and motivating the need for a
new, more reliable metric. The second problem is addressed by presenting a
novel framework for developing and evaluating a high quality corpus for NLG
training.
Comment: WiNLP workshop at ACL 2017.
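To make bottleneck (a) concrete, the hand-rolled word-overlap score below shows why surface-matching metrics are unreliable for NLG: an adequate paraphrase with different wording scores near zero. This toy is an invented illustration, not one of the metrics the paper analyses.

```python
# Toy word-overlap metric (unigram precision against one reference),
# illustrating the unreliability of surface-matching NLG metrics: a
# valid paraphrase scores low. Not a metric from the paper.

def unigram_precision(hypothesis, reference):
    hyp, ref = hypothesis.lower().split(), reference.lower().split()
    matches = sum(1 for w in hyp if w in ref)
    return matches / len(hyp)

reference = "the restaurant serves cheap italian food"
print(unigram_precision("the restaurant serves cheap italian food", reference))
# -> 1.0 (exact match)
print(unigram_precision("this place offers inexpensive food from italy", reference))
# -> ~0.14, although the paraphrase is perfectly adequate
```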