DialogBERT: Discourse-Aware Response Generation via Learning to Recover and Rank Utterances
Recent advances in pre-trained language models have significantly improved
neural response generation. However, existing methods usually view the dialogue
context as a linear sequence of tokens and learn to generate the next word
through token-level self-attention. Such token-level encoding hinders the
exploration of discourse-level coherence among utterances. This paper presents
DialogBERT, a novel conversational response generation model that enhances
previous PLM-based dialogue models. DialogBERT employs a hierarchical
Transformer architecture. To efficiently capture the discourse-level coherence
among utterances, we propose two training objectives: masked utterance
regression and distributed utterance order ranking, in analogy to the
original BERT training. Experiments on three multi-turn conversation datasets
show that our approach remarkably outperforms the baselines, such as BART and
DialoGPT, in terms of quantitative evaluation. The human evaluation suggests
that DialogBERT generates more coherent, informative, and human-like responses
than the baselines by significant margins. Comment: Published as a conference paper at AAAI 202
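The utterance order ranking objective described above can be illustrated with a learning-to-rank loss. The sketch below is a minimal, stdlib-only ListNet-style top-one cross-entropy between predicted and reference utterance scores; the function name and toy scores are illustrative assumptions, not the paper's actual implementation.

```python
import math

def listnet_loss(pred_scores, true_scores):
    """ListNet-style top-one loss: cross-entropy between the softmax
    distributions induced by predicted and reference utterance scores.
    Lower is better; it is minimized when the two rankings agree."""
    def softmax(xs):
        m = max(xs)
        es = [math.exp(x - m) for x in xs]
        z = sum(es)
        return [e / z for e in es]
    p_true = softmax(true_scores)
    p_pred = softmax(pred_scores)
    return -sum(t * math.log(p) for t, p in zip(p_true, p_pred))
```

For example, scoring shuffled utterances in the reverse of their reference order yields a strictly higher loss than scoring them in agreement with it.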
Better Conversations by Modeling, Filtering, and Optimizing for Coherence and Diversity
We present three enhancements to existing encoder-decoder models for
open-domain conversational agents, aimed at effectively modeling coherence and
promoting output diversity: (1) We introduce a measure of coherence as the
GloVe embedding similarity between the dialogue context and the generated
response, (2) we filter our training corpora based on the measure of coherence
to obtain topically coherent and lexically diverse context-response pairs, (3)
we then train a response generator using a conditional variational autoencoder
model that incorporates the measure of coherence as a latent variable and uses
a context gate to guarantee topical consistency with the context and promote
lexical diversity. Experiments on the OpenSubtitles corpus show a substantial
improvement over competitive neural models in terms of BLEU score as well as
metrics of coherence and diversity.
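The coherence measure in item (1) above, embedding similarity between dialogue context and response, can be sketched with stdlib Python. The toy 2-d embedding table below stands in for pretrained GloVe vectors, which the paper actually uses; all names here are illustrative.

```python
import math

def avg_embedding(tokens, emb):
    """Average the vectors of tokens present in the embedding table."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def coherence(context_tokens, response_tokens, emb):
    """Cosine similarity between averaged context and response embeddings."""
    c = avg_embedding(context_tokens, emb)
    r = avg_embedding(response_tokens, emb)
    if c is None or r is None:
        return 0.0
    dot = sum(a * b for a, b in zip(c, r))
    nc = math.sqrt(sum(a * a for a in c))
    nr = math.sqrt(sum(b * b for b in r))
    return dot / (nc * nr) if nc and nr else 0.0

# Toy 2-d table standing in for pretrained GloVe vectors.
toy_glove = {
    "coffee": [0.9, 0.1], "tea": [0.8, 0.2],
    "train": [0.1, 0.9], "station": [0.2, 0.8],
}
```

With this table, a response about "tea" scores as more coherent with a "coffee" context than a response about "train", which is the signal the authors use to filter context-response pairs.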
DeepStory: Video Story QA by Deep Embedded Memory Networks
Question-answering (QA) on video contents is a significant challenge for
achieving human-level intelligence as it involves both vision and language in
real-world settings. Here we demonstrate the possibility of an AI agent
performing video story QA by learning from a large amount of cartoon videos. We
develop a video-story learning model, i.e. Deep Embedded Memory Networks
(DEMN), to reconstruct stories from a joint scene-dialogue video stream using a
latent embedding space of observed data. The video stories are stored in a
long-term memory component. For a given question, an LSTM-based attention model
uses the long-term memory to recall the best question-story-answer triplet by
focusing on specific words containing key information. We trained the DEMN on a
novel QA dataset of children's cartoon video series, Pororo. The dataset
contains 16,066 scene-dialogue pairs of 20.5-hour videos, 27,328 fine-grained
sentences for scene description, and 8,913 story-related QA pairs. Our
experimental results show that the DEMN outperforms other QA models. This is
mainly due to 1) the reconstruction of video stories in a scene-dialogue
combined form that utilizes the latent embedding and 2) the attention
mechanism. DEMN also achieved state-of-the-art results on the MovieQA benchmark. Comment: 7 pages, accepted for IJCAI 201
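The recall step described above, selecting the best-matching triplet from long-term memory for a given question, can be reduced to a nearest-neighbor sketch. The actual model uses an LSTM-based attention model; this stdlib version with hypothetical vectors only illustrates the memory lookup.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def recall_best(question_vec, memory):
    """Return the (story_vec, answer) pair from long-term memory whose
    story embedding is most similar to the question embedding."""
    return max(memory, key=lambda item: cosine(question_vec, item[0]))
```

A question embedded near a stored story vector recalls that story's answer; the paper's attention model additionally weights the specific words carrying key information.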
A Hierarchical Neural Autoencoder for Paragraphs and Documents
Natural language generation of coherent long texts like paragraphs or longer
documents is a challenging problem for recurrent network models. In this
paper, we explore an important step toward this generation task: training an
LSTM (Long Short-Term Memory) autoencoder to preserve and reconstruct
multi-sentence paragraphs. We introduce an LSTM model that hierarchically
builds an embedding for a paragraph from embeddings for sentences and words,
then decodes this embedding to reconstruct the original paragraph. We evaluate
the reconstructed paragraph using standard metrics like ROUGE and Entity Grid,
showing that neural models are able to encode texts in a way that preserves
syntactic, semantic, and discourse coherence. While only a first step toward
generating coherent text units from neural models, our work has the potential
to significantly impact natural language generation and summarization. (Code
for the three models described in this paper can be found at
www.stanford.edu/~jiweil/.)
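The hierarchical embedding idea above, building a paragraph vector from sentence vectors, which are in turn built from word vectors, can be sketched with mean pooling in place of the paper's LSTM encoders. All names here are illustrative assumptions.

```python
def mean_pool(vectors):
    """Element-wise average of a non-empty list of equal-length vectors.
    Stands in for the paper's LSTM encoder at each level of the hierarchy."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def paragraph_embedding(paragraph, word_emb):
    """Hierarchically pool words -> sentences -> paragraph.

    `paragraph` is a list of sentences; each sentence is a list of tokens.
    Assumes every sentence contains at least one in-vocabulary token."""
    sent_vecs = [mean_pool([word_emb[w] for w in sent if w in word_emb])
                 for sent in paragraph]
    return mean_pool(sent_vecs)
```

The paper's decoder would then unroll this paragraph embedding back into sentence and word sequences; the sketch covers only the encoding direction.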