
5 research outputs found

The Natural Language Generation Pipeline, Neural Text Generation and Explainability

Author: Faille Juliette
Gardent Claire
Gatt Albert
Publication venue: HAL CCSD
Publication date: 15/12/2020

End-to-end encoder-decoder approaches to data-to-text generation are often black boxes whose predictions are difficult to explain. Breaking up the end-to-end model into submodules is a natural way to address this problem. The traditional pre-neural Natural Language Generation (NLG) pipeline provides a framework for breaking up the end-to-end encoder-decoder. We survey recent papers that integrate traditional NLG sub-modules in neural approaches and analyse their explainability. Our survey is a first step towards building explainable neural NLG models.

INRIA a CCSD electronic archive server

Scalable Micro-planned Generation of Discourse from Structured Data

Author: Abhijit Mishra
Ahn Sungjin
Anirban Laha
Banko Michele
Bao Junwei
Dale Robert
Fevry Thibault
Heafield Kenneth
Karthik Sankaranarayanan
Klein Guillaume
Konstas Ioannis
Liu Tianyu
Parag Jain
Schmitz Michael
Vinyals Oriol
Publication venue: 'MIT Press - Journals'

Crossref

Data-to-text generation with neural planning

Author: Puduppully Ratish Surendran
Publication venue: The University of Edinburgh
Publication date: 11/04/2022

In this thesis, we consider the task of data-to-text generation, which takes non-linguistic structures as input and produces textual output. The inputs can take the form of database tables, spreadsheets, charts, and so on. The main application of data-to-text generation is to present information in a textual format which makes it accessible to a layperson who may otherwise find it problematic to understand numerical figures. The task can also automate routine document generation jobs, thus improving human efficiency. We focus on generating long-form text, i.e., documents with multiple paragraphs. Recent approaches to data-to-text generation have adopted the very successful encoder-decoder architecture or its variants. These models generate fluent (but often imprecise) text and perform quite poorly at selecting appropriate content and ordering it coherently. This thesis focuses on overcoming these issues by integrating content planning with neural models. We hypothesize data-to-text generation will benefit from explicit planning, which manifests itself in (a) micro planning, (b) latent entity planning, and (c) macro planning. Throughout this thesis, we assume the input to our generator are tables (with records) in the sports domain. And the output are summaries describing what happened in the game (e.g., who won/lost, ..., scored, etc.). We first describe our work on integrating fine-grained or micro plans with data-to-text generation. As part of this, we generate a micro plan highlighting which records should be mentioned and in which order, and then generate the document while taking the micro plan into account. We then show how data-to-text generation can benefit from higher level latent entity planning. Here, we make use of entity-specific representations which are dynamically updated. The text is generated conditioned on entity representations and the records corresponding to the entities by using hierarchical attention at each time step.
We then combine planning with the high level organization of entities, events, and their interactions. Such coarse-grained macro plans are learnt from data and given as input to the generator. Finally, we present work on making macro plans latent while incrementally generating a document paragraph by paragraph. We infer latent plans sequentially with a structured variational model while interleaving the steps of planning and generation. Text is generated by conditioning on previous variational decisions and previously generated text. Overall our results show that planning makes data-to-text generation more interpretable, improves the factuality and coherence of the generated documents and reduces redundancy in the output document.

Edinburgh Research Archive