12 research outputs found
Pretraining Methods for Dialog Context Representation Learning
This paper examines various unsupervised pretraining objectives for learning
dialog context representations. Two novel methods of pretraining dialog context
encoders are proposed, and a total of four methods are examined. Each
pretraining objective is fine-tuned and evaluated on a set of downstream dialog
tasks using the MultiWoz dataset and strong performance improvement is
observed. Further evaluation shows that our pretraining objectives yield
not only better performance but also faster convergence, models that are
less data hungry, and better domain generalizability.
Comment: Accepted to ACL 2019
Robust Layout-aware IE for Visually Rich Documents with Pre-trained Language Models
Many business documents processed in modern NLP and IR pipelines are visually
rich: in addition to text, their semantics can also be captured by visual
traits such as layout, format, and fonts. We study the problem of information
extraction from visually rich documents (VRDs) and present a model that
combines the power of large pre-trained language models and graph neural
networks to efficiently encode both textual and visual information in business
documents. We further introduce new fine-tuning objectives for in-domain
unsupervised fine-tuning that better exploit large amounts of unlabeled
in-domain data. We experiment on real-world invoice and resume data sets and show that
the proposed method outperforms strong text-based RoBERTa baselines by 6.3%
absolute F1 on invoices and 4.7% absolute F1 on resumes. When evaluated in a
few-shot setting, our method requires up to 30x less annotation data than the
baseline to achieve the same level of performance at ~90% F1.
Comment: 10 pages, to appear in SIGIR 2020 Industry Track
USR: An Unsupervised and Reference Free Evaluation Metric for Dialog Generation
The lack of meaningful automatic evaluation metrics for dialog has impeded
open-domain dialog research. Standard language generation metrics have been
shown to be ineffective for evaluating dialog models. To this end, this paper
presents USR, an UnSupervised and Reference-free evaluation metric for dialog.
USR trains unsupervised models to measure several desirable qualities of
dialog. USR is shown to strongly correlate with
human judgment on both Topical-Chat (turn-level: 0.42, system-level: 1.0) and
PersonaChat (turn-level: 0.48 and system-level: 1.0). USR additionally produces
interpretable measures for several desirable properties of dialog.
Comment: Accepted to ACL 2020 as a long paper
ConveRT: Efficient and Accurate Conversational Representations from Transformers
General-purpose pretrained sentence encoders such as BERT are not ideal for
real-world conversational AI applications; they are computationally heavy,
slow, and expensive to train. We propose ConveRT (Conversational
Representations from Transformers), a pretraining framework for conversational
tasks satisfying all the following requirements: it is effective, affordable,
and quick to train. We pretrain using a retrieval-based response selection
task, effectively leveraging quantization and subword-level parameterization in
the dual encoder to build a lightweight memory- and energy-efficient model. We
show that ConveRT achieves state-of-the-art performance across widely
established response selection tasks. We also demonstrate that the use of
extended dialog history as context yields further performance gains. Finally,
we show that pretrained representations from the proposed encoder can be
transferred to the intent classification task, yielding strong results across
three diverse data sets. ConveRT trains substantially faster than standard
sentence encoders or previous state-of-the-art dual encoders. With its reduced
size and superior performance, we believe this model promises wider portability
and scalability for Conversational AI applications.
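The retrieval-based response selection task that ConveRT pretrains on reduces, at inference time, to scoring candidate responses against an encoded context with a dual encoder. A minimal sketch of that scoring step is below; the bag-of-words "encoder" is a hypothetical stand-in for the paper's quantized, subword-level transformer encoders.

```python
# Sketch of dual-encoder response selection. The toy bag-of-words encoder
# stands in for ConveRT's actual transformer encoders; only the scoring
# mechanism (dot product between context and candidate encodings) is the
# point being illustrated.

def encode(text, vocab):
    """Toy encoder: bag-of-words count vector over a fixed vocabulary."""
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def select_response(context, candidates, vocab):
    """Score each candidate by dot product with the context encoding and
    return the index of the best-scoring one."""
    c = encode(context, vocab)
    scores = [dot(c, encode(r, vocab)) for r in candidates]
    return max(range(len(candidates)), key=lambda i: scores[i])

vocab = ["book", "table", "time", "weather", "sunny"]
context = "can you book a table"
candidates = ["what time would you like the table", "it is sunny today"]
print(select_response(context, candidates, vocab))  # -> 0
```

Because each side is encoded independently, candidate encodings can be precomputed and cached, which is what makes dual encoders cheap at retrieval time relative to full cross-attention models.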
Multi-Granularity Representations of Dialog
Neural models of dialog rely on generalized latent representations of
language. This paper introduces a novel training procedure which explicitly
learns multiple representations of language at several levels of granularity.
The multi-granularity training algorithm modifies the mechanism by which
negative candidate responses are sampled in order to control the granularity of
learned latent representations. Strong performance gains are observed on the
next utterance retrieval task using both the MultiWOZ dataset and the Ubuntu
dialog corpus. Analysis demonstrates that multiple granularities
of representation are being learned, and that multi-granularity training
facilitates better transfer to downstream tasks.
Comment: Accepted as a long paper at EMNLP 2019
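The abstract's key mechanism, controlling representation granularity through how negative candidate responses are sampled, can be sketched as follows. The two pools used here (whole-corpus vs. same-dialog) are a plausible reading of "granularity", not the paper's exact sampling scheme.

```python
# Hypothetical sketch of granularity-controlled negative sampling for next
# utterance retrieval: coarse negatives come from other dialogs (easy to
# distinguish), fine-grained negatives come from within the same dialog
# (share topic and style, forcing finer-grained representations).

import random

def sample_negatives(dialogs, dialog_id, turn_id, granularity, k=2, seed=0):
    """Sample k negative responses for position (dialog_id, turn_id)."""
    rng = random.Random(seed)
    if granularity == "coarse":
        pool = [u for d, turns in dialogs.items() if d != dialog_id
                for u in turns]
    else:  # "fine": other utterances from the same dialog
        pool = [u for t, u in enumerate(dialogs[dialog_id]) if t != turn_id]
    return rng.sample(pool, min(k, len(pool)))

dialogs = {
    "d1": ["hi, I need a hotel", "sure, which area?", "the city centre"],
    "d2": ["what's the weather?", "sunny all day"],
}
print(sample_negatives(dialogs, "d1", 1, "fine", k=2))
```

Training the retrieval model against the coarse pool only requires topic-level features, while the fine pool forces it to discriminate between utterances from the very same conversation, which is one way multiple granularities of representation can be induced.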
Efficient Intent Detection with Dual Sentence Encoders
Building conversational systems in new domains and with added functionality
requires resource-efficient models that work under low-data regimes (i.e., in
few-shot setups). Motivated by these requirements, we introduce intent
detection methods backed by pretrained dual sentence encoders such as USE and
ConveRT. We demonstrate the usefulness and wide applicability of the proposed
intent detectors, showing that: 1) they outperform intent detectors based on
fine-tuning the full BERT-Large model or using BERT as a fixed black-box
encoder on three diverse intent detection data sets; 2) the gains are
especially pronounced in few-shot setups (i.e., with only 10 or 30 annotated
examples per intent); 3) our intent detectors can be trained in a matter of
minutes on a single CPU; and 4) they are stable across different hyperparameter
settings. In the hope of facilitating and democratizing research focused on
intent detection, we release our code, as well as a new, challenging
single-domain intent detection dataset comprising 13,083 annotated examples
over 77 intents.
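The recipe the abstract describes, a frozen pretrained sentence encoder plus a lightweight classifier trained in minutes on a CPU, can be sketched as follows. The hashing "encoder" and the nearest-centroid classifier are hypothetical stand-ins for USE/ConveRT and the paper's actual classifier head.

```python
# Sketch of few-shot intent detection on top of a frozen sentence encoder.
# toy_encode is a deterministic stand-in for a pretrained encoder such as
# USE or ConveRT; the nearest-centroid classifier is one simple choice of
# lightweight head, not necessarily the paper's.

def toy_encode(text, dim=16):
    """Frozen stand-in encoder: hash each word into a fixed-size vector."""
    vec = [0.0] * dim
    for w in text.lower().split():
        vec[sum(ord(c) for c in w) % dim] += 1.0
    return vec

def fit_centroids(examples):
    """examples: list of (text, intent). Average embeddings per intent."""
    sums, counts = {}, {}
    for text, intent in examples:
        v = toy_encode(text)
        sums[intent] = [a + b for a, b in zip(sums.get(intent, [0.0] * len(v)), v)]
        counts[intent] = counts.get(intent, 0) + 1
    return {i: [x / counts[i] for x in s] for i, s in sums.items()}

def predict(text, centroids):
    """Assign the intent whose centroid is nearest in squared distance."""
    v = toy_encode(text)
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(v, c))
    return min(centroids, key=lambda i: dist(centroids[i]))

train = [
    ("book a table for two", "restaurant"),
    ("reserve a table tonight", "restaurant"),
    ("will it rain tomorrow", "weather"),
    ("what's the forecast", "weather"),
]
centroids = fit_centroids(train)
print(predict("book me a table", centroids))  # -> restaurant
```

Because only the centroids (or, in the paper, a small classifier) are fit while the encoder stays fixed, training cost is negligible, which is what makes the few-shot, CPU-only setup in the abstract feasible.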
DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue
A long-standing goal of task-oriented dialogue research is the ability to
flexibly adapt dialogue models to new domains. To progress research in this
direction, we introduce DialoGLUE (Dialogue Language Understanding Evaluation),
a public benchmark consisting of 7 task-oriented dialogue datasets covering 4
distinct natural language understanding tasks, designed to encourage dialogue
research in representation-based transfer, domain adaptation, and
sample-efficient task learning. We release several strong baseline models,
demonstrating performance improvements over a vanilla BERT architecture and
state-of-the-art results on 5 out of 7 tasks, by pre-training on a large
open-domain dialogue corpus and task-adaptive self-supervised training. Through
the DialoGLUE benchmark, the baseline methods, and our evaluation scripts, we
hope to facilitate progress towards the goal of developing more general
task-oriented dialogue models.
Comment: Benchmark hosted on:
https://evalai.cloudcv.org/web/challenges/challenge-page/708
Learning a Simple and Effective Model for Multi-turn Response Generation with Auxiliary Tasks
We study multi-turn response generation for open-domain dialogues. The
existing state-of-the-art addresses the problem with deep neural architectures.
While these models have improved response quality, their complexity also
hinders their application in real systems. In this work, we pursue a model that
has a simple structure yet can effectively leverage conversation contexts for
response generation. To this end, we propose four auxiliary tasks including
word order recovery, utterance order recovery, masked word recovery, and masked
utterance recovery, and optimize the objectives of these tasks together with
maximizing the likelihood of generation. By this means, the auxiliary tasks
that relate to context understanding can guide the learning of the generation
model to achieve a better local optimum. Empirical studies with three
benchmarks indicate that our model can significantly outperform
state-of-the-art generation models in terms of response quality on both
automatic evaluation and human judgment, and at the same time enjoys a much
faster decoding process.
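Of the four auxiliary tasks the abstract names, the utterance-level recovery tasks are the easiest to make concrete: each training example is built by corrupting the conversation context and asking the model to restore it. Below is a sketch of example construction for masked utterance recovery; the masking scheme is illustrative, not necessarily the paper's.

```python
# Illustrative construction of a masked-utterance-recovery example: one
# utterance in a multi-turn context is replaced by a [MASK] token, and the
# auxiliary objective is to recover it. The exact corruption scheme in the
# paper may differ.

import random

def mask_utterance(context, seed=0):
    """Return (masked_context, target): the context with one utterance
    replaced by "[MASK]", and the utterance that was masked out."""
    rng = random.Random(seed)
    i = rng.randrange(len(context))
    masked = list(context)
    target = masked[i]
    masked[i] = "[MASK]"
    return masked, target

context = ["hi there", "hi, how can I help?", "I need a taxi"]
masked, target = mask_utterance(context)
print(masked, "->", target)
```

The loss on such examples is optimized jointly with the generation likelihood, so the encoder is pushed to retain enough context information to reconstruct missing turns, the "context understanding" signal the abstract credits for the better local optimum.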
Is this Dialogue Coherent? Learning from Dialogue Acts and Entities
In this work, we investigate the human perception of coherence in open-domain
dialogues. In particular, we address the problem of annotating and modeling the
coherence of next-turn candidates while considering the entire history of the
dialogue. First, we create the Switchboard Coherence (SWBD-Coh) corpus, a
dataset of human-human spoken dialogues annotated with turn coherence ratings,
where ratings of next-turn candidate utterances are provided considering the
full dialogue context. Our statistical analysis of the corpus indicates how
the perception of turn coherence is affected by the distribution of
previously introduced entities and by the Dialogue Acts used. Second, we experiment with
different architectures to model entities, Dialogue Acts and their combination
and evaluate their performance in predicting human coherence ratings on
SWBD-Coh. We find that models combining both DA and entity information yield
the best performance on both response selection and turn coherence rating.
Comment: Accepted at SIGDIAL 2020
Analyzing the Forgetting Problem in the Pretrain-Finetuning of Dialogue Response Models
In this work, we study how the finetuning stage in the pretrain-finetune
framework changes the behavior of a pretrained neural language generator. We
focus on the transformer encoder-decoder model for the open-domain dialogue
response generation task. Our major finding is that after standard finetuning,
the model forgets some of the important language generation skills acquired
during large-scale pretraining. We demonstrate the forgetting phenomenon
through a set of detailed behavior analyses from the perspectives of knowledge
transfer, context sensitivity, and function space projection. As a preliminary
attempt to alleviate the forgetting problem, we propose an intuitive finetuning
strategy named "mix-review". We find that mix-review effectively regularizes
the finetuning process, and the forgetting problem is alleviated to some
extent. Finally, we discuss interesting behavior of the resulting dialogue
model and its implications.
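The "mix-review" strategy, as the abstract describes it, mixes pretraining data back into finetuning so the model keeps reviewing what it would otherwise forget. One plausible form of such a schedule is sketched below; the decaying mix ratio is an assumption for illustration, not the paper's exact rule.

```python
# Hypothetical mix-review schedule: each finetuning epoch trains on all
# finetuning examples plus a decaying random subset of the pretraining
# data, regularizing against forgetting. The decay rule is illustrative.

import random

def mix_review_data(finetune, pretrain, epoch, mix_ratio=0.5, decay=0.5,
                    seed=0):
    """Return the epoch's training set: every finetuning example plus a
    reviewed subset of pretraining examples whose size decays per epoch."""
    rng = random.Random(seed + epoch)
    frac = mix_ratio * (decay ** epoch)
    k = int(frac * len(pretrain))
    return finetune + rng.sample(pretrain, k)

finetune = [f"dialog_{i}" for i in range(4)]
pretrain = [f"web_text_{i}" for i in range(100)]
for epoch in range(3):
    data = mix_review_data(finetune, pretrain, epoch)
    print(epoch, len(data))  # 4 finetune examples + 50, 25, 12 reviewed
```

Keeping some pretraining examples in every batch means the generator is never optimized purely on the narrow finetuning distribution, which is the sense in which the abstract says mix-review "regularizes the finetuning process".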