Contextual Language Model Adaptation for Conversational Agents
Statistical language models (LM) play a key role in Automatic Speech
Recognition (ASR) systems used by conversational agents. These ASR systems
should provide high accuracy under a variety of speaking styles, domains,
vocabulary and argots. In this paper, we present a DNN-based method to adapt
the LM to each user-agent interaction based on generalized contextual
information, by predicting an optimal, context-dependent set of LM
interpolation weights. We show that this framework for contextual adaptation
provides accuracy improvements under different possible mixture LM partitions
that are relevant for both (1) goal-oriented conversational agents, where it is
natural to partition the data by the requested application, and (2)
non-goal-oriented conversational agents, where the data can be partitioned using
topic labels predicted by a topic classifier. We obtain a relative
WER improvement of 3% with a 1-pass decoding strategy and 6% in a 2-pass
decoding framework, over an unadapted model. We also show up to a 15% relative
improvement in recognizing named entities, which is of significant value for
conversational ASR systems.
Comment: Interspeech 2018 (accepted)
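The idea of context-dependent interpolation weights in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the context scores, the three component LMs, and the word probabilities are hypothetical values chosen for illustration, and a softmax stands in for whatever output layer the DNN actually uses.

```python
import math

def softmax(scores):
    """Turn arbitrary real-valued scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def interpolate(component_probs, weights):
    """Mixture LM probability: weighted sum of the component LM probabilities."""
    return sum(w * p for w, p in zip(weights, component_probs))

# Hypothetical scores that a context model might emit for three component LMs
# (for example, one LM per application or topic). Assumed values, not real data.
context_scores = [2.0, 0.5, -1.0]
weights = softmax(context_scores)

# Hypothetical probabilities each component LM assigns to the same next word.
component_probs = [0.020, 0.005, 0.001]
p_mix = interpolate(component_probs, weights)

print("weights:", [round(w, 3) for w in weights])
print("interpolated probability:", p_mix)
```

Because the weights are non-negative and sum to 1, the interpolated probability always lies between the smallest and largest component probabilities; changing the context scores shifts it toward the component LM that best matches the current interaction.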
Multi-Sentence Knowledge Selection in Open-Domain Dialogue
Incorporating external knowledge sources effectively in conversations is a
longstanding problem in open-domain dialogue research. The existing literature
on open-domain knowledge selection is limited and makes certain brittle
assumptions about knowledge sources to simplify the overall task (Dinan et al.,
2019), such as the existence of a single relevant knowledge sentence per
context. In this work, we evaluate the existing state of open-domain
conversation knowledge selection, showing where the existing methodologies
regarding data and evaluation are flawed. We then improve on them by proposing
a new framework for collecting relevant knowledge, and create an augmented
dataset based on the Wizard of Wikipedia (WOW) corpus, which we call WOW++.
WOW++ averages 8 relevant knowledge sentences per dialogue context, embracing
the inherent ambiguity of open-domain dialogue knowledge selection. We then
benchmark various knowledge ranking algorithms on this augmented dataset with
both intrinsic evaluation and extrinsic measures of response quality, showing
that neural rerankers that use WOW++ can outperform rankers trained on standard
datasets.
Comment: Accepted at INLG 2021. 11 pages, 5 tables, 8 figures
Think Before You Speak: Explicitly Generating Implicit Commonsense Knowledge for Response Generation
Implicit knowledge, such as common sense, is key to fluid human
conversations. Current neural response generation (RG) models are trained to
generate responses directly, omitting unstated implicit knowledge. In this
paper, we present Think-Before-Speaking (TBS), a generative approach to first
externalize implicit commonsense knowledge (think) and use this knowledge to
generate responses (speak). We expect that externalizing implicit knowledge
allows more efficient learning, produces more informative responses, and
enables more explainable models. We analyze different choices to collect
knowledge-aligned dialogues, represent implicit knowledge, and transition
between knowledge and dialogues. Empirical results show TBS models outperform
end-to-end and knowledge-augmented RG baselines on most automatic metrics and
generate more informative, specific, and commonsense-following responses, as
evaluated by human annotators. TBS also generates knowledge that makes sense
and is relevant to the dialogue around 85% of the time.
Comment: Accepted at ACL 2022 main conference. 16 pages, 9 figures, 9 tables