917 research outputs found
User Intent Prediction in Information-seeking Conversations
Conversational assistants are being progressively adopted by the general
population. However, they are not capable of handling complicated
information-seeking tasks that involve multiple turns of information exchange.
Due to the limited communication bandwidth in conversational search, it is
important for conversational assistants to accurately detect and predict user
intent in information-seeking conversations. In this paper, we investigate two
aspects of user intent prediction in an information-seeking setting. First, we
extract features based on the content, structural, and sentiment
characteristics of a given utterance, and use classic machine learning methods
to perform user intent prediction. We then conduct an in-depth feature
importance analysis to identify key features in this prediction task. We find
that structural features contribute most to the prediction performance. Given
this finding, we construct neural classifiers to incorporate context
information and achieve better performance without feature engineering. Our
findings can provide insights into the important factors and effective methods
of user intent prediction in information-seeking conversations.Comment: Accepted to CHIIR 201
Parallel Attention: A Unified Framework for Visual Object Discovery through Dialogs and Queries
Recognising objects according to a pre-defined fixed set of class labels has
been well studied in the Computer Vision. There are a great many practical
applications where the subjects that may be of interest are not known
beforehand, or so easily delineated, however. In many of these cases natural
language dialog is a natural way to specify the subject of interest, and the
task achieving this capability (a.k.a, Referring Expression Comprehension) has
recently attracted attention. To this end we propose a unified framework, the
ParalleL AttentioN (PLAN) network, to discover the object in an image that is
being referred to in variable length natural expression descriptions, from
short phrases query to long multi-round dialogs. The PLAN network has two
attention mechanisms that relate parts of the expressions to both the global
visual content and also directly to object candidates. Furthermore, the
attention mechanisms are recurrent, making the referring process visualizable
and explainable. The attended information from these dual sources are combined
to reason about the referred object. These two attention mechanisms can be
trained in parallel and we find the combined system outperforms the
state-of-art on several benchmarked datasets with different length language
input, such as RefCOCO, RefCOCO+ and GuessWhat?!.Comment: 11 page
Automatic recognition of the general-purpose communicative functions defined by the ISO 24617-2 standard for dialog act annotation
From the perspective of a dialog system, it is important to identify the intention behind the segments in a dialog, since it provides an important cue regarding the information that is present in the segments and how they should be interpreted. ISO 24617-2, the standard for dialog act annotation, defines a hierarchically organized set of general-purpose communicative functions which correspond to different intentions that are relevant in the context of a dialog. We explore the automatic recognition of these communicative functions in the DialogBank, which is a reference set of dialogs annotated according to this standard. To do so, we propose adaptations of existing approaches to flat dialog act recognition that allow them to deal with the hierarchical classification problem. More specifically, we propose the use of an end-to-end hierarchical network with cascading outputs and maximum a posteriori path estimation to predict the communicative function at each level of the hierarchy, preserve the dependencies between the functions in the path, and decide at which level to stop. Furthermore, since the amount of dialogs in the DialogBank is small, we rely on transfer learning processes to reduce overfitting and improve performance. The results of our experiments show that our approach outperforms both a flat one and hierarchical approaches based on multiple classifiers and that each of its components plays an important role towards the recognition of general-purpose communicative functions.info:eu-repo/semantics/publishedVersio
Incremental LSTM-based Dialog State Tracker
A dialog state tracker is an important component in modern spoken dialog
systems. We present an incremental dialog state tracker, based on LSTM
networks. It directly uses automatic speech recognition hypotheses to track the
state. We also present the key non-standard aspects of the model that bring its
performance close to the state-of-the-art and experimentally analyze their
contribution: including the ASR confidence scores, abstracting scarcely
represented values, including transcriptions in the training data, and model
averaging
- …