A study of turn-yielding cues in human-computer dialogue
Previous research has made significant advances in understanding how humans manage to engage in smooth, well-coordinated conversation, and has unveiled the existence of several turn-yielding cues -- lexico-syntactic, prosodic and acoustic events that may serve as predictors of conversational turn finality. These results have subsequently aided the refinement of the turn-taking proficiency of spoken dialogue systems. In this study, we find empirical evidence in a corpus of human-computer dialogues that human users produce the same kinds of turn-yielding cues that have been observed in human-human interactions. We also show that a linear relation holds between the number of individual cues conjointly displayed and the likelihood of a turn switch. Sociedad Argentina de Informática e Investigación Operativa (SADIO)
Towards responsive Sensitive Artificial Listeners
This paper describes work in the recently started project SEMAINE, which aims to build a set of Sensitive Artificial Listeners – conversational agents designed to sustain an interaction with a human user despite limited verbal skills, through robust recognition and generation of non-verbal behaviour in real time, both when the agent is speaking and when it is listening. We report on data collection and on the design of a system architecture in view of real-time responsiveness.
Turn-Taking and Affirmative Cue Words in Task-Oriented Dialogue
As interactive voice response systems spread at a rapid pace, providing increasingly complex functionality, it is becoming clear that the challenges of such systems are not solely associated with their synthesis and recognition capabilities. Rather, issues such as the coordination of turn exchanges between system and user, or the correct generation and understanding of words that may convey multiple meanings, appear to play an important role in system usability. This thesis explores those two issues in the Columbia Games Corpus, a collection of spontaneous task-oriented dialogues in Standard American English. We provide evidence of the existence of seven turn-yielding cues -- prosodic, acoustic and syntactic events strongly associated with conversational turn endings -- and show that the likelihood of a turn-taking attempt from the interlocutor increases linearly with the number of cues conjointly displayed by the speaker. We present similar results related to six backchannel-inviting cues -- events that invite the interlocutor to produce a short utterance conveying continued attention. Additionally, we describe a series of studies of affirmative cue words -- a family of cue words such as 'okay' or 'alright' that speakers use frequently in conversation for several purposes: for acknowledging what the interlocutor has said, or for cueing the start of a new topic, among others. We find differences in the acoustic/prosodic realization of such functions, but observe that contextual information figures prominently in human disambiguation of these words. We also conduct machine learning experiments to explore the automatic classification of affirmative cue words. Finally, we examine a novel measure of speaker entrainment related to the usage of these words, showing its association with task success and dialogue coordination.
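The linear relation described in the abstract above (more cues displayed together, higher likelihood of a turn-taking attempt) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names `switch_rate_by_cue_count` and `fit_line` are hypothetical, and the observations are synthetic toy values, not data from the Columbia Games Corpus or from the thesis.

```python
from collections import defaultdict


def switch_rate_by_cue_count(observations):
    """Map each cue count to the fraction of boundaries followed by a turn switch.

    `observations` is an iterable of (cue_count, switched) pairs, where
    `switched` is True when the interlocutor attempted to take the turn.
    """
    totals = defaultdict(int)
    switches = defaultdict(int)
    for cue_count, switched in observations:
        totals[cue_count] += 1
        if switched:
            switches[cue_count] += 1
    return {k: switches[k] / totals[k] for k in sorted(totals)}


def fit_line(points):
    """Ordinary least-squares fit y = slope * x + intercept over (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic toy observations (NOT corpus data): (cue_count, switch_followed)
toy = [(0, False), (0, False), (1, False), (1, True), (2, True), (2, True)]
rates = switch_rate_by_cue_count(toy)
slope, intercept = fit_line(list(rates.items()))
```

A positive `slope` on real data would correspond to the reported trend; how good the linear fit is would have to be checked separately (for example with a correlation coefficient), which this sketch does not do.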
TurnGPT: a Transformer-based Language Model for Predicting Turn-taking in Spoken Dialog
Syntactic and pragmatic completeness is known to be important for turn-taking
prediction, but so far machine learning models of turn-taking have used such
linguistic information in a limited way. In this paper, we introduce TurnGPT, a
transformer-based language model for predicting turn-shifts in spoken dialog.
The model has been trained and evaluated on a variety of written and spoken
dialog datasets. We show that the model outperforms two baselines used in prior
work. We also report on an ablation study, as well as attention and gradient
analyses, which show that the model is able to utilize the dialog context and
pragmatic completeness for turn-taking prediction. Finally, we explore the
model's potential in not only detecting, but also projecting, turn-completions.
Comment: Accepted to Findings of ACL: EMNLP 202
A cross-linguistic analysis of the temporal dynamics of turn-taking cues using machine learning as a descriptive tool
In dialogue, speakers produce and perceive acoustic/prosodic turn-taking cues, which are fundamental for negotiating turn exchanges with their interlocutors. However, little is known about the temporal dynamics and cross-linguistic validity of these cues. In this work, we explore a set of acoustic/prosodic cues preceding three turn-transition types (hold, switch and backchannel) in three different languages (Slovak, American English and Argentine Spanish). For this, we use and refine a set of machine learning techniques that enable a finer-grained temporal analysis of such cues, as well as a comparison of their relative explanatory power. Our results suggest that the three languages, despite belonging to distinct linguistic families, share the general usage of a handful of acoustic/prosodic features to signal turn transitions. We conclude that exploiting features such as speech rate, final-word lengthening, the pitch track over the final 200 ms, the intensity track over the final 1000 ms, and noise-to-harmonics ratio (a voice-quality feature) might prove useful for further improving the accuracy of the turn-taking modules found in modern spoken dialogue systems.
Affiliation: Brusco, Pablo. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación en Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación en Ciencias de la Computación; Argentina
Affiliation: Vidal, Jazmín. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación en Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación en Ciencias de la Computación; Argentina
Affiliation: Beňuš, Štefan. University in Nitra; Slovakia. Slovak Academy of Sciences; Slovakia
Affiliation: Gravano, Agustin. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Ciudad Universitaria. Instituto de Investigación en Ciencias de la Computación. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Instituto de Investigación en Ciencias de la Computación; Argentina. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina
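Two of the features named in the abstract above, the pitch track over the final 200 ms and the intensity track over the final 1000 ms, can be sketched as simple end-of-utterance window summaries. This is only an illustration under stated assumptions: the helper names are hypothetical, the frame values are synthetic, and summarising the pitch window by its slope and the intensity window by its mean is a choice made here, not the feature definition used in the paper.

```python
def final_window(times_ms, values, window_ms):
    """Return (time, value) frames from the last `window_ms` of the track.

    Frames whose value is None (e.g. unvoiced frames with no pitch) are skipped.
    """
    end = times_ms[-1]
    return [
        (t, v)
        for t, v in zip(times_ms, values)
        if t >= end - window_ms and v is not None
    ]


def slope(points):
    """Least-squares slope of value against time for (time, value) pairs."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_v = sum(v for _, v in points) / n
    stt = sum((t - mean_t) ** 2 for t, _ in points)
    stv = sum((t - mean_t) * (v - mean_v) for t, v in points)
    return stv / stt


def mean_value(points):
    """Arithmetic mean of the values in (time, value) pairs."""
    return sum(v for _, v in points) / len(points)


# Synthetic frames (NOT real speech): time in ms, pitch in Hz, intensity in dB
times_ms = [700, 800, 900, 1000]
pitch_hz = [200.0, 190.0, 180.0, 170.0]
intensity_db = [60.0, 62.0, 61.0, 59.0]

pitch_slope = slope(final_window(times_ms, pitch_hz, 200))         # Hz per ms
mean_intensity = mean_value(final_window(times_ms, intensity_db, 1000))
```

In practice the pitch and intensity tracks would come from a speech-analysis tool; integer millisecond timestamps are used here only to keep the window comparison free of floating-point rounding.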