17 research outputs found
Dialogue Act Recognition via CRF-Attentive Structured Network
Dialogue Act Recognition (DAR) is a challenging problem in dialogue
interpretation, which aims to attach semantic labels to utterances and
characterize the speaker's intention. Existing approaches formulate the DAR
problem as anything from multi-class classification to structured prediction,
but they either rely on handcrafted features or fail to capture attentive
contextual structural dependencies. In this paper, we consider the problem of
DAR from the viewpoint of extending richer Conditional Random Field (CRF)
structural dependencies without abandoning end-to-end training. We incorporate
hierarchical semantic inference with a memory mechanism into utterance
modeling. We then extend the structured attention network to a linear-chain
conditional random field layer which takes into account both contextual
utterances and corresponding dialogue acts. Extensive experiments on two
major benchmark datasets, Switchboard Dialogue Act (SWDA) and Meeting Recorder
Dialogue Act (MRDA), show that our method achieves better performance than
other state-of-the-art solutions to the problem. Remarkably, our method comes
within 2% of human annotator performance on SWDA.

Comment: 10 pages, 4 figures
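The linear-chain CRF layer described above ultimately has to decode the best joint sequence of dialogue acts for a conversation. As a rough illustration only (not the paper's implementation: the scores below are made up, whereas the paper derives emission scores from its hierarchical attentive encoder), Viterbi decoding over a linear-chain CRF can be sketched in plain Python:

```python
def viterbi(emissions, transitions):
    """Decode the highest-scoring act sequence under a linear-chain CRF.

    emissions: T x K list, per-utterance score for each of K acts.
    transitions: K x K list, transitions[i][j] = score of act i -> act j.
    """
    T, K = len(emissions), len(emissions[0])
    score = list(emissions[0])   # best score ending in each act so far
    back = []                    # backpointers per step
    for t in range(1, T):
        new_score, ptr = [], []
        for j in range(K):
            best_i = max(range(K), key=lambda i: score[i] + transitions[i][j])
            ptr.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j]
                             + emissions[t][j])
        back.append(ptr)
        score = new_score
    best = max(range(K), key=lambda j: score[j])
    path = [best]
    for ptr in reversed(back):   # follow backpointers to recover the path
        best = ptr[best]
        path.append(best)
    return path[::-1]

# Example: two acts (0 = question, 1 = answer); transitions reward alternation.
# viterbi([[1, 0], [0, 0], [0, 0]], [[0, 1], [1, 0]]) -> [0, 1, 0]
```

With transition scores that reward question-answer alternation, the decoder prefers an alternating act sequence even where the per-utterance scores alone are ambiguous, which is exactly the kind of contextual dependency a per-utterance classifier cannot express.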
Short Utterance Dialogue Act Classification Using a Transformer Ensemble
An influx of digital assistant adoption and reliance is demonstrating the significance of reliable and robust dialogue act classification techniques. Purely lexical dialogue act classification methods are over-represented in the literature; a weakness of this approach is the lack of context when classifying short utterances. We improve upon a purely lexical approach by incorporating a state-of-the-art acoustic model in a lexical-acoustic transformer ensemble, with improved results when classifying dialogue acts in the MRDA corpus. Additionally, we investigate performance on an utterance word-count basis, showing that classification accuracy increases with utterance word count. Furthermore, the lexical model's performance increases with utterance word count while the acoustic model's performance decreases with it, showing that the models complement each other for different utterance lengths.
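The complementary behavior of the two models suggests a length-aware late fusion. The sketch below is an illustrative assumption, not the paper's actual ensemble: the function name, the weighting scheme, and the `pivot` parameter are all invented to show the idea of leaning on the lexical model for longer utterances.

```python
def ensemble_probs(lexical_probs, acoustic_probs, word_count, pivot=8):
    """Blend per-class probabilities from a lexical and an acoustic model.

    The lexical weight grows with utterance word count (mirroring the
    reported trend that lexical accuracy rises and acoustic accuracy
    falls as utterances get longer); `pivot` is the word count at which
    both models are weighted equally.
    """
    w = word_count / (word_count + pivot)  # in (0, 1), grows with length
    return [w * l + (1 - w) * a
            for l, a in zip(lexical_probs, acoustic_probs)]

# At the pivot length the two models contribute equally:
# ensemble_probs([0.9, 0.1], [0.1, 0.9], word_count=8) -> [0.5, 0.5]
```

For a long utterance (e.g. `word_count=24`) the blend sits much closer to the lexical distribution, while one-word utterances are dominated by the acoustic model.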
Modeling Long-Range Context for Concurrent Dialogue Acts Recognition
In dialogues, an utterance is a chain of consecutive sentences produced by
one speaker, ranging from a short sentence to a thousand-word post. When
studying dialogues at the utterance level, it is not uncommon that an utterance
would serve multiple functions. For instance, "Thank you. It works great."
expresses both gratitude and positive feedback in the same utterance. Multiple
dialogue acts (DA) for one utterance breed complex dependencies across
dialogue turns. DA recognition therefore challenges a model's predictive power
over long utterances and complex DA context. We term this problem Concurrent
Dialogue Acts (CDA) recognition. Previous work on DA recognition either assumes
one DA per utterance or fails to realize the sequential nature of dialogues. In
this paper, we present an adapted Convolutional Recurrent Neural Network (CRNN)
which models the interactions between utterances of long-range context. Our
model significantly outperforms existing work on CDA recognition on a tech
forum dataset.

Comment: Accepted to CIKM '1
DCR-Net: A Deep Co-Interactive Relation Network for Joint Dialog Act Recognition and Sentiment Classification
In dialog systems, dialog act recognition and sentiment classification are two
correlated tasks for capturing speakers' intentions, where dialog act and
sentiment indicate the explicit and the implicit intention, respectively.
Most of the existing systems either treat them as separate tasks or just
jointly model the two tasks by sharing parameters in an implicit way without
explicitly modeling mutual interaction and relation. To address this problem,
we propose a Deep Co-Interactive Relation Network (DCR-Net) to explicitly
consider the cross-impact and model the interaction between the two tasks by
introducing a co-interactive relation layer. In addition, the proposed relation
layer can be stacked to gradually capture mutual knowledge with multiple steps
of interaction. In particular, we thoroughly study different relation layers and
their effects. Experimental results on two public datasets (Mastodon and
DailyDialog) show that our model outperforms the state-of-the-art joint model
by 4.3% and 3.4% F1 score on dialog act recognition, and by 5.7% and
12.4% on sentiment classification, respectively. Comprehensive analysis
empirically verifies the effectiveness of explicitly modeling the relation
between the two tasks and of the multi-step interaction mechanism. Finally, we
employ Bidirectional Encoder Representations from Transformers (BERT) in our
framework, which further boosts performance on both tasks.

Comment: Accepted by AAAI 2020 (Oral)
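One co-interactive step can be caricatured as two cross-attention passes, one per task, over the other task's utterance representations. This is a heavily simplified, hypothetical sketch, with plain dot-product attention standing in for DCR-Net's learned relation layer; all names are illustrative.

```python
import math

def attend(query, keys, values):
    """Dot-product attention of one query vector over key/value vectors."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]      # stable softmax
    z = sum(exps)
    weights = [e / z for e in exps]
    return [sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))]

def co_interactive_step(act_reps, sent_reps):
    """One relation-layer step: each task's per-utterance representation
    is updated (residually) with information attended from the other
    task's representations."""
    new_act = [[a + b for a, b in zip(h, attend(h, sent_reps, sent_reps))]
               for h in act_reps]
    new_sent = [[a + b for a, b in zip(h, attend(h, act_reps, act_reps))]
                for h in sent_reps]
    return new_act, new_sent
```

Stacking, as in the paper's multi-step interaction, corresponds to calling `co_interactive_step` repeatedly on its own outputs, so mutual knowledge accumulates over several rounds.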
Guiding attention in Sequence-to-sequence models for Dialogue Act prediction
The task of predicting dialog acts (DA) based on conversational dialog is a
key component in the development of conversational agents. Accurately
predicting DAs requires a precise modeling of both the conversation and the
global tag dependencies. We leverage seq2seq approaches widely adopted in
Neural Machine Translation (NMT) to improve the modeling of tag sequentiality.
Seq2seq models are known to learn complex global dependencies while currently
proposed approaches using linear conditional random fields (CRF) only model
local tag dependencies. In this work, we introduce a seq2seq model tailored for
DA classification using: a hierarchical encoder, a novel guided attention
mechanism, and beam search applied to both training and inference. Unlike
the state of the art, our model does not require handcrafted features and is
trained end-to-end. Furthermore, the proposed approach achieves an unmatched
accuracy score of 85% on SwDA and a state-of-the-art accuracy score of 91.6% on
MRDA.
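Beam search over tag sequences, which the abstract applies at both training and inference time, can be sketched as follows. The scoring function here is an illustrative stand-in: a real seq2seq decoder would score each tag conditioned on the full decoded prefix, whereas this sketch uses fixed per-step scores plus a pairwise transition bonus.

```python
def beam_search(step_scores, transitions, beam_size=2):
    """Keep the `beam_size` best partial tag sequences at each step.

    step_scores: T x K list of per-step tag scores.
    transitions: K x K list, transitions[i][j] = bonus for tag i -> tag j.
    Returns the best complete tag sequence found.
    """
    beams = sorted(((([k]), s) for k, s in enumerate(step_scores[0])),
                   key=lambda b: -b[1])[:beam_size]
    for scores in step_scores[1:]:
        cand = [(seq + [k], total + transitions[seq[-1]][k] + s)
                for seq, total in beams
                for k, s in enumerate(scores)]
        cand.sort(key=lambda b: -b[1])   # prune to the best expansions
        beams = cand[:beam_size]
    return beams[0][0]

# With K=2 tags and beam_size=2 this search happens to be exact:
# beam_search([[1, 0], [0, 0], [0, 0]], [[0, 1], [1, 0]]) -> [0, 1, 0]
```

The contrast with a linear-chain CRF is that the beam scores entire prefixes, so it can in principle capture global tag dependencies rather than only local pairwise ones, at the cost of approximate search when the beam is narrow.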
Filling Conversation Ellipsis for Better Social Dialog Understanding
The phenomenon of ellipsis is prevalent in social conversations. Ellipsis
increases the difficulty of a series of downstream language understanding
tasks, such as dialog act prediction and semantic role labeling. We propose to
resolve ellipsis through automatic sentence completion to improve language
understanding. However, automatic ellipsis completion can result in output
which does not accurately reflect user intent. To address this issue, we
propose a method which considers both the original utterance that has ellipsis
and the automatically completed utterance in dialog act and semantic role
labeling tasks. Specifically, we first complete user utterances to resolve
ellipsis using an end-to-end pointer network model. We then train a prediction
model using both utterances containing ellipsis and our automatically completed
utterances. Finally, we combine the prediction results from these two
utterances using a selection model that is guided by expert knowledge. Our
approach improves dialog act prediction and semantic role labeling by 1.3% and
2.5% in F1 score respectively in social conversations. We also present an
open-domain human-machine conversation dataset with manually completed user
utterances and annotated semantic role labels after manual completion.

Comment: Accepted to AAAI 202
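The final combination step can be caricatured as a confidence-gated choice between the two predictions. The rule, names, and threshold below are invented stand-ins for the paper's expert-knowledge-guided selection model, shown only to make the two-prediction pipeline concrete.

```python
def select_prediction(orig_pred, completed_pred, completion_confidence,
                      threshold=0.7):
    """Choose between the prediction made on the original (elliptical)
    utterance and the one made on the automatically completed utterance.

    Prefer the completed-utterance prediction only when the completion
    model is confident enough; otherwise fall back to the original,
    since a bad completion can distort the user's intent.
    """
    if completion_confidence >= threshold:
        return completed_pred
    return orig_pred
```

A real selection model would replace this single threshold with learned rules over features of both utterances, but the fallback structure is the point: completion helps only when it is trustworthy.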
Co-GAT: A Co-Interactive Graph Attention Network for Joint Dialog Act Recognition and Sentiment Classification
In a dialog system, dialog act recognition and sentiment classification are
two correlated tasks for capturing speakers' intentions, where dialog act and
sentiment indicate the explicit and the implicit intention, respectively. The
dialog context information (contextual information) and the mutual interaction
information are two key factors that contribute to the two related tasks.
Unfortunately, none of the existing approaches consider the two important
sources of information simultaneously. In this paper, we propose a
Co-Interactive Graph Attention Network (Co-GAT) to jointly perform the two
tasks. The core module is a proposed co-interactive graph interaction layer
where a cross-utterance connection and a cross-task connection are
constructed and iteratively updated with each other, so that the two types of
information are considered simultaneously. Experimental results on two public
datasets show that our model successfully captures the two sources of
information and achieves state-of-the-art performance.
In addition, we find that the contributions from the contextual and mutual
interaction information do not fully overlap with contextualized word
representations (BERT, RoBERTa, XLNet).

Comment: Accepted by AAAI 2021 (Long Paper). arXiv admin note: text overlap
with arXiv:2008.0691
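The two edge types can be made concrete with a toy graph: for each utterance there is one dialog-act node and one sentiment node; act nodes connect to the other act nodes (cross-utterance) and to their own sentiment node (cross-task), and vice versa. This is a minimal sketch under stated assumptions, with unparameterized dot-product attention standing in for Co-GAT's learned graph attention; all names are illustrative.

```python
import math

def co_gat_neighbors(num_utts):
    """Nodes 0..T-1 are dialog-act nodes, T..2T-1 are sentiment nodes.
    Each node gets cross-utterance edges (same task, other utterances)
    and one cross-task edge (other task, same utterance)."""
    nbrs = {}
    for i in range(num_utts):
        nbrs[i] = [j for j in range(num_utts) if j != i] + [num_utts + i]
        nbrs[num_utts + i] = ([num_utts + j for j in range(num_utts) if j != i]
                              + [i])
    return nbrs

def gat_update(nodes, neighbors):
    """One simplified attention step: each node residually aggregates its
    neighbors, weighted by softmaxed dot-product scores."""
    out = []
    for i, h in enumerate(nodes):
        nbrs = neighbors[i]
        raw = [sum(a * b for a, b in zip(h, nodes[j])) for j in nbrs]
        m = max(raw)
        exps = [math.exp(r - m) for r in raw]
        z = sum(exps)
        weights = [e / z for e in exps]
        agg = [sum(w * nodes[j][d] for w, j in zip(weights, nbrs))
               for d in range(len(h))]
        out.append([a + b for a, b in zip(h, agg)])
    return out
```

Iterating `gat_update` on the same graph mirrors the paper's idea of letting contextual (cross-utterance) and mutual-interaction (cross-task) information update each other over several rounds.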