Yeah, Right, Uh-Huh: A Deep Learning Backchannel Predictor
Using supporting backchannel (BC) cues can make human-computer interaction
more social. BCs provide feedback from the listener to the speaker, indicating
that the speaker is still being listened to. BCs can be expressed in different
ways, depending on the modality of the interaction, for example as gestures or
acoustic cues. In this work, we considered only acoustic cues. We propose
an approach to detecting BC opportunities based on acoustic input features
like power and pitch. While other works in the field rely on the use of a
hand-written rule set or specialized features, we made use of artificial neural
networks. They are capable of deriving higher order features from input
features themselves. In our setup, we first used a fully connected feed-forward
network to establish an updated baseline in comparison to our previously
proposed setup. We also extended this setup with Long Short-Term
Memory (LSTM) networks, which have been shown to outperform feed-forward setups
on various tasks. Our best system achieved an F1-Score of 0.37 using power and
pitch features. Adding linguistic information using word2vec, the score
increased to 0.39.
Tagging Of Speech Acts And Dialogue Games In Spanish Call Home
The Clarity project is devoted to automatic detection and classification of discourse structures in casual, non-task-oriented conversation using shallow, corpus-based methods of analysis. For the Clarity project, we have tagged speech acts and dialogue games in the Call Home Spanish corpus. We have done preliminary cross-level experiments on the relationship of word and speech act n-grams to dialogue games. Our results show that the label of a game cannot be predicted from n-grams of words it contains. We get better than baseline results for predicting the label of a game from the sequence of speech acts it contains, but only when the speech acts are hand tagged, and not when they are automatically detected. Our future research will focus on finding linguistic cues that are more predictive of game labels. The automatic classification of speech acts and games is carried out in a multi-level architecture that integrates classification at multiple discourse levels instead of performing them sequentially
Speaker-change Aware CRF for Dialogue Act Classification
Recent work in Dialogue Act (DA) classification approaches the task as a
sequence labeling problem, using neural network models coupled with a
Conditional Random Field (CRF) as the last layer. CRF models the conditional
probability of the target DA label sequence given the input utterance sequence.
However, the task involves another important input sequence, that of speakers,
which is ignored by previous work. To address this limitation, this paper
proposes a simple modification of the CRF layer that takes speaker-change into
account. Experiments on the SwDA corpus show that our modified CRF layer
outperforms the original one, with very wide margins for some DA labels.
Further, visualizations demonstrate that our CRF layer can learn meaningful,
sophisticated transition patterns between DA label pairs conditioned on
speaker-change in an end-to-end way. Code is publicly available.
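The abstract does not spell out the modification, so the following is only a minimal sketch of one plausible reading: the CRF keeps two label-transition matrices and picks one at each step depending on whether the speaker changed. The names `speaker_change_flags`, `sequence_score`, `trans_same`, and `trans_change` are hypothetical and not taken from the paper or its released code.

```python
import numpy as np


def speaker_change_flags(speakers):
    """Return 1 where the speaker differs from the previous utterance, else 0.

    The first utterance has no predecessor, so its flag is 0.
    """
    return [0] + [int(a != b) for a, b in zip(speakers[:-1], speakers[1:])]


def sequence_score(emissions, labels, speakers, trans_same, trans_change):
    """Unnormalized CRF score of one label sequence.

    emissions:    array of shape (T, K), per-utterance label scores
    labels:       list of T label indices
    speakers:     list of T speaker identifiers
    trans_same:   (K, K) transition scores used when the speaker is unchanged
    trans_change: (K, K) transition scores used when the speaker changes
    (Assumption: two separate matrices; the paper may parameterize this differently.)
    """
    changes = speaker_change_flags(speakers)
    score = emissions[0, labels[0]]
    for t in range(1, len(labels)):
        trans = trans_change if changes[t] else trans_same
        score += trans[labels[t - 1], labels[t]] + emissions[t, labels[t]]
    return float(score)


# Toy example: 3 utterances, 2 labels, speaker changes once (A -> B).
emissions = np.zeros((3, 2))
trans_same = np.array([[1.0, 2.0], [3.0, 4.0]])
trans_change = np.array([[10.0, 20.0], [30.0, 40.0]])
toy_score = sequence_score(emissions, [0, 1, 1], ["A", "B", "B"],
                           trans_same, trans_change)
```

In the toy example the first transition uses `trans_change` (speaker A to B) and the second uses `trans_same`, so the same label pair can receive a different score depending on who is speaking.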
Dialogue Act Recognition Approaches
This paper deals with automatic dialogue act (DA) recognition. Dialogue acts are sentence-level units that represent states of a dialogue, such as questions, statements, hesitations, etc. Knowledge of how dialogue acts are realized in a discourse or dialogue is part of the speech understanding and dialogue analysis process. It is of great importance for many applications: dialogue systems, speech recognition, automatic machine translation, etc. The main goal of this paper is to survey existing work on DA recognition and to discuss the respective advantages and drawbacks of each approach. A major concern in the DA recognition domain is that, although a few DA annotation schemes now seem to be emerging as standards, these DA tag-sets usually have to be adapted to the specifics of a given application, which prevents the deployment of standardized DA databases and evaluation procedures. This review focuses on the various kinds of information that can be used to recognize DAs, such as prosodic and lexical cues, and on the types of models proposed so far to capture this information. Combining these information sources now appears to be a prerequisite for recognizing DAs.
Hierarchical Multi-Label Dialog Act Recognition on Spanish Data
Dialog acts reveal the intention behind the uttered words. Thus, their
automatic recognition is important for a dialog system trying to understand its
conversational partner. The study presented in this article approaches that
task on the DIHANA corpus, whose three-level dialog act annotation scheme poses
problems which have not been explored in recent studies. In addition to the
hierarchical problem, the two lower levels pose multi-label classification
problems. Furthermore, each level in the hierarchy refers to a different aspect
concerning the intention of the speaker both in terms of the structure of the
dialog and the task. Also, since its dialogs are in Spanish, it allows us to
assess whether the state-of-the-art approaches on English data generalize to a
different language. More specifically, we compare the performance of different
segment representation approaches focusing on both sequences and patterns of
words and assess the importance of the dialog history and the relations between
the multiple levels of the hierarchy. Concerning the single-label
classification problem posed by the top level, we show that the conclusions
drawn on English data also hold on Spanish data. Furthermore, we show that the
approaches can be adapted to multi-label scenarios. Finally, by hierarchically
combining the best classifiers for each level, we achieve the best results
reported for this corpus. Comment: 21 pages, 4 figures, 17 tables, translated version of the article
published in Linguamática 11(1
Automatic Annotation of Context and Speech Acts for Dialogue Corpora
Richly annotated dialogue corpora are essential for new research directions in statistical learning approaches to dialogue management, context-sensitive interpretation, and context-sensitive speech recognition. In particular, large dialogue corpora annotated with contextual information and speech acts are urgently required. We explore how existing dialogue corpora (usually consisting of utterance transcriptions) can be automatically processed to yield new corpora where dialogue context and speech acts are accurately represented. We present a conceptual and computational framework for generating such corpora. As an example, we present and evaluate an automatic annotation system which builds ‘Information State Update’ (ISU) representations of dialogue context for the Communicator (2000 and 2001) corpora of human–machine dialogues (2,331 dialogues). The purposes of this annotation are to generate corpora for reinforcement learning of dialogue policies, for building user simulations, for evaluating different dialogue strategies against a baseline, and for training models for context-dependent interpretation and speech recognition. The automatic annotation system parses system and user utterances into speech acts and builds up sequences of dialogue context representations using an ISU dialogue manager. We present the architecture of the automatic annotation system and a detailed example to illustrate how the system components interact to produce the annotations. We also evaluate the annotations, with respect to the task completion metrics of the original corpus and in comparison to hand-annotated data and annotations produced by a baseline automatic system. The automatic annotations perform well and largely outperform the baseline automatic annotations in all measures. The resulting annotated corpus has been used to train high-quality user simulations and to learn successful dialogue strategies. The final corpus will be made publicly available
Is Every Tweet Created Equal? A Framework to Identify Relevant Tweets for Business Research
It is a life-or-death matter for a firm to observe its environment and identify new threats or opportunities quickly. Information technology has increased firms' speed and agility in responding to environmental changes. Social media offers a vast and timely source of environmental information that firms can readily use to gauge public sentiment. Twitter is a high-speed service that allows anyone to “tweet” a message to any interested parties. Firms can access near-instantaneous changes in the public mood about any topic by using sentiment analysis. Applications range from predicting equity prices to predicting election outcomes. A gap exists in the literature because researchers discard tweets without any theoretically sound reason for doing so. We propose a framework that provides a theory-based justification for discarding data. We then explore the framework's results using high-frequency equity market prices. Across three case studies encompassing 57,600 OLS regressions and 1,887,408 tweets, our results indicate that the framework yields higher-quality results, as measured by better R² fits.