User Intent Prediction in Information-seeking Conversations
Conversational assistants are being progressively adopted by the general
population. However, they are not capable of handling complicated
information-seeking tasks that involve multiple turns of information exchange.
Due to the limited communication bandwidth in conversational search, it is
important for conversational assistants to accurately detect and predict user
intent in information-seeking conversations. In this paper, we investigate two
aspects of user intent prediction in an information-seeking setting. First, we
extract features based on the content, structural, and sentiment
characteristics of a given utterance, and use classic machine learning methods
to perform user intent prediction. We then conduct an in-depth feature
importance analysis to identify key features in this prediction task. We find
that structural features contribute most to the prediction performance. Given
this finding, we construct neural classifiers to incorporate context
information and achieve better performance without feature engineering. Our
findings can provide insights into the important factors and effective methods
of user intent prediction in information-seeking conversations.
Comment: Accepted to CHIIR 201
Incorporating Loose-Structured Knowledge into Conversation Modeling via Recall-Gate LSTM
Modeling human conversations is essential to building satisfying chat-bots
with multi-turn dialog ability. Conversation modeling will notably benefit from
domain knowledge since the relationships between sentences can be clarified due
to semantic hints introduced by knowledge. In this paper, a deep neural network
is proposed to incorporate background knowledge for conversation modeling.
Through a specially designed Recall gate, domain knowledge can be transformed
into the extra global memory of Long Short-Term Memory (LSTM), so as to enhance
LSTM by cooperating with its local memory to capture the implicit semantic
relevance between sentences within conversations. In addition, this paper
introduces a loose-structured domain knowledge base, which can be built with
a small amount of manual work and easily adopted by the Recall gate. Our model
is evaluated on the context-oriented response selection task, and experimental
results on two datasets show that our approach is promising for
modeling human conversations and building key components of automatic chatting
systems.
Comment: under review of IJCNN 2017; 10 pages, 5 figures
Sentence Pair Scoring: Towards Unified Framework for Text Comprehension
We review the task of Sentence Pair Scoring, popular in the literature in
various forms - viewed as Answer Sentence Selection, Semantic Text Scoring,
Next Utterance Ranking, Recognizing Textual Entailment, Paraphrasing, or as a
component of Memory Networks.
We argue that all such tasks are similar from the model perspective and
propose new baselines by comparing the performance of common IR metrics and
popular convolutional, recurrent and attention-based neural models across many
Sentence Pair Scoring tasks and datasets. We discuss the problem of evaluating
randomized models, propose a statistically grounded methodology, and attempt to
improve comparisons by releasing new datasets that are much harder than some of
the currently used well explored benchmarks. We introduce a unified open source
software framework with easily pluggable models and tasks, which enables us to
experiment with multi-task reusability of trained sentence models. We set a new
state of the art in performance on the Ubuntu Dialogue dataset.
Comment: submitted as paper to CoNLL 201
Do Response Selection Models Really Know What's Next? Utterance Manipulation Strategies for Multi-turn Response Selection
In this paper, we study the task of selecting the optimal response given a
user and system utterance history in retrieval-based multi-turn dialog systems.
Recently, pre-trained language models (e.g., BERT, RoBERTa, and ELECTRA) showed
significant improvements in various natural language processing tasks. This and
similar response selection tasks can also be solved using such language models
by formulating the tasks as dialog--response binary classification tasks.
Although existing works using this approach successfully obtained
state-of-the-art results, we observe that language models trained in this
manner tend to make predictions based on the relatedness of history and
candidates, ignoring the sequential nature of multi-turn dialog systems. This
suggests that the response selection task alone is insufficient for learning
temporal dependencies between utterances. To this end, we propose utterance
manipulation strategies (UMS) to address this problem. Specifically, UMS
consist of several strategies (i.e., insertion, deletion, and search), which
aid the response selection model towards maintaining dialog coherence. Further,
UMS are self-supervised methods that do not require additional annotation and
thus can be easily incorporated into existing approaches. Extensive evaluation
across multiple languages and models shows that UMS are highly effective in
teaching dialog consistency, which leads to models advancing the state of the art
by significant margins on multiple public benchmark datasets.
Comment: Accepted to AAAI 202
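
The three self-supervised strategies named in this abstract (insertion, deletion, and search) can be illustrated with a minimal sketch of how training examples might be built from raw dialogs without extra annotation. The exact construction below is an assumption based on the strategy names, not the authors' implementation; the function names, the dictionary layout, and the choice of a single distractor utterance are all hypothetical.

```python
import random


def make_insertion_example(dialog, rng):
    """Remove one utterance; the label is the slot where it belongs.

    Assumed setup: the model must find where `utterance` fits in `context`.
    """
    idx = rng.randrange(len(dialog))
    target = dialog[idx]
    remaining = dialog[:idx] + dialog[idx + 1:]
    return {"context": remaining, "utterance": target, "label": idx}


def make_deletion_example(dialog, distractor, rng):
    """Insert an utterance from another dialog; the label is its position.

    Assumed setup: the model must find the utterance that should be deleted.
    """
    idx = rng.randrange(len(dialog) + 1)
    corrupted = dialog[:idx] + [distractor] + dialog[idx:]
    return {"context": corrupted, "label": idx}


def make_search_example(dialog, rng):
    """Shuffle all but the last utterance; the label is the shuffled position
    of the utterance that originally preceded the last one.

    Assumes the dialog has at least two utterances.
    """
    *history, last = dialog
    order = list(range(len(history)))
    rng.shuffle(order)
    shuffled = [history[i] for i in order]
    label = order.index(len(history) - 1)
    return {"context": shuffled, "query": last, "label": label}
```

Because every label is derived from the original utterance order, examples of this kind could be generated from any existing dialog corpus and mixed into the response selection training data.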
Diversifying Topic-Coherent Response Generation for Natural Multi-turn Conversations
Although response generation (RG) diversification for single-turn dialogs has
been well developed, it is less investigated for natural multi-turn
conversations. Moreover, past work focused on diversifying responses without
considering topic coherence with the context, producing uninformative replies. In
this paper, we propose the Topic-coherent Hierarchical Recurrent
Encoder-Decoder model (THRED) to diversify the generated responses without
deviating from the contextual topics of multi-turn conversations. Overall, we
build a sequence-to-sequence net (Seq2Seq) to model multi-turn conversations.
We then resort to the latent Variable Hierarchical Recurrent
Encoder-Decoder model (VHRED) to learn the global contextual distribution of
dialogs. In addition, we construct a dense topic matrix which implies word-level
correlations of the conversation corpora. The topic matrix is used to learn
the local topic distribution of the contextual utterances. By incorporating both
the global contextual distribution and the local topic distribution, THRED
produces both diversified and topic-coherent replies. In addition, we propose
an explicit metric (TopicDiv) to measure the topic divergence between
the post and generated response, and we also propose an overall metric
combining the diversification metric (Distinct) and TopicDiv. We
evaluate our model against three baselines (Seq2Seq, HRED and VHRED) on
two real-world corpora and demonstrate its outstanding
performance in both diversification and topic coherence.
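
As a point of reference for the Distinct metric mentioned above, here is a minimal sketch of the commonly used Distinct-n definition: the number of unique n-grams divided by the total number of n-grams across all generated responses. It assumes responses are already tokenized, and the function name is illustrative. TopicDiv is not sketched because the abstract does not define it.

```python
def distinct_n(responses, n):
    """Ratio of unique n-grams to total n-grams over a list of token lists.

    Returns 0.0 when no response is long enough to contain an n-gram.
    """
    unique = set()
    total = 0
    for tokens in responses:
        for i in range(len(tokens) - n + 1):
            unique.add(tuple(tokens[i:i + n]))
            total += 1
    return len(unique) / total if total else 0.0


generated = [
    ["i", "do", "not", "know"],
    ["i", "do", "not", "care"],
]
print(distinct_n(generated, 1))  # 5 unique unigrams out of 8
print(distinct_n(generated, 2))  # 4 unique bigrams out of 6
```

Higher values indicate less repetition across responses, which is why the score is often reported alongside a coherence measure.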
Modeling Multi-turn Conversation with Deep Utterance Aggregation
Multi-turn conversation understanding is a major challenge for building
intelligent dialogue systems. This work focuses on retrieval-based response
matching for multi-turn conversation, where related work simply concatenates the
conversation utterances, ignoring the interactions among previous utterances
for context modeling. In this paper, we formulate previous utterances into
context using a proposed deep utterance aggregation model to form a
fine-grained context representation. In detail, a self-matching attention is
first introduced to route the vital information in each utterance. Then the
model matches a response with each refined utterance and the final matching
score is obtained after attentive turns aggregation. Experimental results show
our model outperforms the state-of-the-art methods on three multi-turn
conversation benchmarks, including a newly introduced e-commerce dialogue
corpus.
Comment: Proceedings of the 27th International Conference on Computational
Linguistics (COLING 2018)
Improving Response Selection in Multi-Turn Dialogue Systems by Incorporating Domain Knowledge
Building systems that can communicate with humans is a core problem in
Artificial Intelligence. This work proposes a novel neural network architecture
for response selection in an end-to-end multi-turn conversational dialogue
setting. The architecture applies context level attention and incorporates
additional external knowledge provided by descriptions of domain-specific
words. It uses a bi-directional Gated Recurrent Unit (GRU) for encoding context
and responses and learns to attend over the context words given the latent
response representation and vice versa. In addition, it incorporates external
domain specific information using another GRU for encoding the domain keyword
descriptions. This allows better representation of domain-specific keywords in
responses and hence improves the overall performance. Experimental results show
that our model outperforms all other state-of-the-art methods for response
selection in multi-turn conversations.
Comment: Published as conference paper at CoNLL 201
Strategy of the Negative Sampling for Training Retrieval-Based Dialogue Systems
The article describes a new approach to improving the quality of automated
dialogue systems for customer support services. The analysis presented in the
paper demonstrates that the quality of a retrieval-based dialogue
system depends on the choice of negative responses. The proposed approach
chooses the negative samples according to the distribution of
responses in the training set. In this implementation the negative samples are
randomly chosen from the original response distribution and from an
"artificial" distribution of negative responses, such as the uniform distribution
or a distribution obtained by transforming the original one. The results
obtained for the implemented systems and reported in this paper confirm a
significant improvement in the quality of automated dialogue systems when
the negative responses are drawn from the transformed distribution.
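
The sampling schemes described in this abstract can be sketched as follows. The abstract does not say which transformation of the original distribution is used, so the power transform below (raising each response frequency to an exponent `alpha`) is purely an assumption for illustration; the function names and the `mode` parameter are also hypothetical.

```python
import random
from collections import Counter


def response_weights(train_responses, mode="original", alpha=0.5):
    """Return (responses, probabilities) for drawing negative samples.

    mode="original": empirical frequency in the training set.
    mode="uniform":  every distinct response is equally likely.
    mode="power":    frequency ** alpha (an assumed example of a transform).
    """
    counts = Counter(train_responses)
    responses = sorted(counts)
    if mode == "original":
        weights = [float(counts[r]) for r in responses]
    elif mode == "uniform":
        weights = [1.0 for _ in responses]
    elif mode == "power":
        weights = [counts[r] ** alpha for r in responses]
    else:
        raise ValueError(f"unknown mode: {mode}")
    total = sum(weights)
    return responses, [w / total for w in weights]


def sample_negatives(train_responses, positive, k, mode="original",
                     alpha=0.5, rng=None):
    """Draw k negative responses (with replacement), never the positive one."""
    rng = rng or random.Random()
    responses, weights = response_weights(train_responses, mode, alpha)
    pairs = [(r, w) for r, w in zip(responses, weights) if r != positive]
    if not pairs:
        return []
    candidates, cand_weights = zip(*pairs)
    return rng.choices(candidates, weights=cand_weights, k=k)
```

With `alpha` below 1 the power transform flattens the distribution, so frequent boilerplate replies are still sampled more often than rare ones but dominate the negatives less than under the original distribution.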
Sequential Attention-based Network for Noetic End-to-End Response Selection
The noetic end-to-end response selection challenge as one track in Dialog
System Technology Challenges 7 (DSTC7) aims to push the state of the art of
utterance classification for real world goal-oriented dialog systems, for which
participants need to select the correct next utterances from a set of
candidates for the multi-turn context. This paper describes our systems, which
ranked top on both datasets in this challenge, one focused and small
(Advising) and the other more diverse and large (Ubuntu). Previous
state-of-the-art models use hierarchy-based (utterance-level and token-level)
neural networks to explicitly model the interactions among different turns'
utterances for context modeling. In this paper, we investigate a sequential
matching model based only on chain sequence for multi-turn response selection.
Our results demonstrate that the potential of sequential matching approaches
has not yet been fully exploited for multi-turn response
selection. In addition to ranking top in the challenge, the proposed model
outperforms all previous models, including state-of-the-art hierarchy-based
models, and achieves new state-of-the-art performance on two large-scale
public multi-turn response selection benchmark datasets.
Comment: Ranked first in DSTC7 Track 1. Accepted for an oral presentation at
the DSTC7 workshop at AAAI 2019. The source code is available no
Mix-and-Match: Scalable Dialog Response Retrieval using Gaussian Mixture Embeddings
Embedding-based approaches for dialog response retrieval embed the
context-response pairs as points in the embedding space. These approaches are
scalable, but fail to account for the complex, many-to-many relationships that
exist between context-response pairs. On the other end of the spectrum, there
are approaches that feed the context-response pairs jointly through multiple
layers of neural networks. These approaches can model the complex relationships
between context-response pairs, but fail to scale when the set of responses is
moderately large (>100). In this paper, we combine the best of both worlds by
proposing a scalable model that can learn complex relationships between
context-response pairs. Specifically, the model maps the contexts as well as
responses to probability distributions over the embedding space. We train the
models by optimizing the Kullback-Leibler divergence between the distributions
induced by context-response pairs in the training data. We show that the
resultant model achieves better performance as compared to other
embedding-based approaches on publicly available conversation data.
Comment: 10 pages, 2 figures
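
To make the Kullback-Leibler objective in this abstract concrete, the sketch below computes the closed-form KL divergence between two single Gaussians with diagonal covariance. This is a deliberate simplification: the paper uses Gaussian mixtures, for which the KL divergence has no closed form in general, and the abstract does not say how it is approximated. The function name and argument layout are assumptions.

```python
import math


def kl_diag_gaussians(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for Gaussians with diagonal covariance.

    Each argument is a sequence of per-dimension means or variances.
    Per dimension: 0.5 * (log(var_q / var_p)
                          + (var_p + (mu_p - mu_q) ** 2) / var_q - 1).
    """
    kl = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        kl += 0.5 * (math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0)
    return kl


# A context and a response embedded as distributions rather than points.
context_mu, context_var = [0.0, 0.0], [1.0, 1.0]
response_mu, response_var = [1.0, 0.0], [1.0, 1.0]
print(kl_diag_gaussians(context_mu, context_var, response_mu, response_var))
```

The divergence is asymmetric, so the direction in which it is taken between context and response distributions is a modeling choice in its own right.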