Copy mechanism and tailored training for character-based data-to-text generation
In the last few years, many methods have focused on using deep recurrent
neural networks for natural language generation. The most widely used
sequence-to-sequence neural methods are word-based: as such, they need a
pre-processing step called delexicalization (and its inverse,
relexicalization, as post-processing) to deal with uncommon or unknown words.
These processing steps, however, produce models that depend on the vocabulary
used and are not completely neural.
In this work, we present an end-to-end sequence-to-sequence model with an
attention mechanism that reads and generates at the character level, requiring
neither delexicalization, tokenization, nor even lowercasing. Moreover, since
characters constitute the common "building blocks" of every text, this permits
a more general approach to text generation, opening the possibility of
exploiting transfer learning during training. These capabilities stem from two
major features: (i) the ability to alternate between the standard generation
mechanism and a copy mechanism, which copies input facts directly into the
output, and (ii) an original training pipeline that further improves the
quality of the generated texts.
We also introduce a new dataset called E2E+, a modified version of the
well-known E2E dataset used in the E2E Challenge, designed to highlight the
copying capabilities of character-based models. We tested our model on five
broadly accepted metrics (including the widely used BLEU), showing that it
yields competitive performance with respect to both character-based and
word-based approaches.
Comment: ECML-PKDD 2019 (camera-ready version)
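A note on the copy mechanism described in feature (i): the sketch below shows one character-level decoder step that mixes a standard generation distribution with a copy distribution over the input characters, pointer-generator style. It is a minimal illustration under assumed names (`CopyDecoderStep`, the gating layer, the simplified attention scorer), not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CopyDecoderStep(nn.Module):
    """One decoder step that softly alternates between generating a
    character and copying one from the input (illustrative sketch)."""
    def __init__(self, vocab_size, hidden_size):
        super().__init__()
        self.cell = nn.GRUCell(hidden_size, hidden_size)
        self.score = nn.Linear(2 * hidden_size, 1)     # simplified attention scorer
        self.gen = nn.Linear(hidden_size, vocab_size)  # generation head
        self.gate = nn.Linear(hidden_size, 1)          # copy-vs-generate gate

    def forward(self, y_emb, state, enc_outs, src_ids):
        # y_emb: (B, H) embedding of the previously emitted character
        # enc_outs: (B, T, H) encoder states over the T input characters
        # src_ids: (B, T) vocabulary ids of the input characters
        state = self.cell(y_emb, state)                          # (B, H)
        expanded = state.unsqueeze(1).expand_as(enc_outs)        # (B, T, H)
        attn = F.softmax(
            self.score(torch.cat([enc_outs, expanded], dim=-1)).squeeze(-1),
            dim=-1)                                              # (B, T)
        gen_dist = F.softmax(self.gen(state), dim=-1)            # (B, V)
        p_gen = torch.sigmoid(self.gate(state))                  # (B, 1)
        # Copy distribution: scatter the attention mass onto the ids
        # of the input characters, so attended inputs can be copied.
        copy_dist = torch.zeros_like(gen_dist).scatter_add_(1, src_ids, attn)
        # Soft alternation between generating and copying.
        return p_gen * gen_dist + (1 - p_gen) * copy_dist, state
```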
Controlling Hallucinations at Word Level in Data-to-Text Generation
Data-to-Text Generation (DTG) is a subfield of Natural Language Generation
that aims to transcribe structured data into natural-language descriptions.
The field has recently been boosted by neural generators, which on the one
hand exhibit great syntactic skill without the need for hand-crafted
pipelines; on the other hand, the quality of the generated text reflects the
quality of the training data, which in realistic settings offers only
imperfectly aligned structure-text pairs. Consequently, state-of-the-art
neural models include misleading statements, usually called hallucinations,
in their outputs. Controlling this phenomenon is today a major challenge for
DTG, and it is the problem addressed in this paper.
Previous work deals with this issue at the instance level, using an alignment
score for each table-reference pair. In contrast, we propose a finer-grained
approach, arguing that hallucinations should rather be treated at the word
level. Specifically, we propose a Multi-Branch Decoder which is able to
leverage word-level labels to learn the relevant parts of each training
instance. These labels are obtained following a simple and efficient scoring
procedure based on co-occurrence analysis and dependency parsing. Extensive
evaluations, via automated metrics and human judgment on the standard WikiBio
benchmark, show the accuracy of our alignment labels and the effectiveness of
the proposed Multi-Branch Decoder. Our model is able to reduce and control
hallucinations while preserving fluency and coherence in the generated texts.
Further experiments on a degraded version of ToTTo show that our model can be
used successfully in very noisy settings.
Comment: 20 pages, 6 figures, 5 tables (excluding Appendix). Source code:
https://github.com/KaijuML/dtt-multi-branc
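A note on the word-level labels: the sketch below illustrates the co-occurrence half of such a scoring procedure, flagging reference words that no table token frequently co-occurs with. The paper's procedure also uses dependency parsing to propagate labels, which this sketch omits; all names and the threshold are illustrative assumptions, not the released code.

```python
from collections import Counter

def cooccurrence_stats(corpus):
    """corpus: iterable of (table_tokens, reference_tokens) pairs."""
    joint, table_freq = Counter(), Counter()
    for table, ref in corpus:
        for t in set(table):
            table_freq[t] += 1
            for w in set(ref):
                joint[t, w] += 1
    return joint, table_freq

def label_reference(table, ref, joint, table_freq, threshold=0.1):
    """Label each reference word: 0 = supported by the table, 1 = hallucinated."""
    labels = []
    for w in ref:
        # Support of w = its strongest co-occurrence rate with any table token.
        support = max(
            (joint[t, w] / table_freq[t] for t in table if table_freq[t]),
            default=0.0)
        labels.append(0 if support >= threshold else 1)
    return labels
```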
Evaluating the State-of-the-Art of End-to-End Natural Language Generation: The E2E NLG Challenge
This paper provides a comprehensive analysis of the first shared task on
End-to-End Natural Language Generation (NLG) and identifies avenues for future
research based on the results. This shared task aimed to assess whether recent
end-to-end NLG systems can generate more complex output by learning from
datasets containing higher lexical richness, syntactic complexity and diverse
discourse phenomena. Introducing novel automatic and human metrics, we compare
62 systems submitted by 17 institutions, covering a wide range of approaches,
including machine learning architectures, with the majority implementing
sequence-to-sequence (seq2seq) models, as well as systems based on
grammatical rules and templates. Seq2seq-based systems demonstrated great
potential for NLG in the challenge. We find that seq2seq systems generally
score highly on word-overlap metrics and human evaluations of naturalness,
with the winning SLUG system (Juraska et al., 2018) being seq2seq-based.
However, vanilla seq2seq models often fail to correctly express
a given meaning representation if they lack a strong semantic control mechanism
applied during decoding. Moreover, seq2seq models can be outperformed by
hand-engineered systems in terms of overall quality, as well as complexity,
length and diversity of outputs. This research has influenced, inspired and
motivated a number of recent studies beyond the original competition, which we
also summarise as part of this paper.
Comment: Computer Speech and Language, final accepted manuscript (in press)
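A note on the "semantic control mechanism" mentioned in the abstract: in many challenge systems it amounts to reranking beam candidates by how many slots of the meaning representation (MR) they actually realise. The sketch below assumes a dictionary-style MR and verbatim value matching, both simplifications; it is not the SLUG system's actual reranker.

```python
def slot_coverage(mr, candidate):
    """Fraction of MR slot values realised verbatim in the candidate."""
    values = [v.lower() for v in mr.values()]
    text = candidate.lower()
    return sum(v in text for v in values) / len(values)

def rerank(mr, candidates):
    """Prefer the candidate expressing the most slots (ties keep beam order)."""
    return max(candidates, key=lambda c: slot_coverage(mr, c))

mr = {"name": "The Vaults", "food": "Italian", "priceRange": "cheap"}
beam = ["The Vaults serves Italian food.",
        "The Vaults is a cheap Italian restaurant."]
print(rerank(mr, beam))  # -> "The Vaults is a cheap Italian restaurant."
```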