Turn-Taking and Affirmative Cue Words in Task-Oriented Dialogue
As interactive voice response systems spread rapidly and provide increasingly complex functionality, it is becoming clear that the challenges facing such systems are not solely associated with their synthesis and recognition capabilities. Rather, issues such as the coordination of turn exchanges between system and user, or the correct generation and understanding of words that may convey multiple meanings, appear to play an important role in system usability. This thesis explores those two issues in the Columbia Games Corpus, a collection of spontaneous task-oriented dialogues in Standard American English. We provide evidence of the existence of seven turn-yielding cues (prosodic, acoustic and syntactic events strongly associated with conversational turn endings) and show that the likelihood of a turn-taking attempt from the interlocutor increases linearly with the number of cues jointly displayed by the speaker. We present similar results for six backchannel-inviting cues: events that invite the interlocutor to produce a short utterance conveying continued attention. Additionally, we describe a series of studies of affirmative cue words, a family of cue words such as 'okay' or 'alright' that speakers use frequently in conversation for several purposes, including acknowledging what the interlocutor has said and cueing the start of a new topic. We find differences in the acoustic/prosodic realization of these functions, but observe that contextual information figures prominently in human disambiguation of these words. We also conduct machine learning experiments to explore the automatic classification of affirmative cue words. Finally, we examine a novel measure of speaker entrainment related to the usage of these words, showing its association with task success and dialogue coordination.
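The claim that turn-taking likelihood grows linearly with the number of displayed cues can be checked with a simple tally. The Python sketch below assumes each phrase has already been annotated with boolean cue indicators; the cue names and toy data are hypothetical, and fitting an unweighted least-squares line to the per-count rates is only one way to assess linearity, not necessarily the procedure used in the thesis.

```python
from collections import defaultdict


def turn_taking_rate_by_cue_count(phrases):
    """Map each cue count to the fraction of phrases followed by a turn-taking attempt.

    `phrases` is an iterable of (cues, took_turn) pairs, where `cues` maps a
    (hypothetical) cue name to True/False and `took_turn` says whether the
    interlocutor attempted to take the turn after that phrase.
    """
    totals = defaultdict(int)
    attempts = defaultdict(int)
    for cues, took_turn in phrases:
        n_cues = sum(1 for present in cues.values() if present)
        totals[n_cues] += 1
        if took_turn:
            attempts[n_cues] += 1
    return {n: attempts[n] / totals[n] for n in sorted(totals)}


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy annotations with two hypothetical cues; real data would use all seven.
phrases = [
    ({"final_fall": False, "low_intensity": False}, False),
    ({"final_fall": False, "low_intensity": False}, False),
    ({"final_fall": True, "low_intensity": False}, True),
    ({"final_fall": False, "low_intensity": True}, False),
    ({"final_fall": True, "low_intensity": True}, True),
    ({"final_fall": True, "low_intensity": True}, True),
]
rates = turn_taking_rate_by_cue_count(phrases)
slope, intercept = least_squares_line(list(rates), list(rates.values()))
print(rates)             # {0: 0.0, 1: 0.5, 2: 1.0}
print(slope, intercept)  # 0.5 0.0
```

Because bins containing few phrases yield noisier rates, a weighted fit or a logistic model over individual phrases may be preferable on real data.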
Rapid Language Model Development Using External Resources for New Spoken Dialog Domains
This paper addresses a critical problem in deploying a spoken dialog system (SDS). One of the main bottlenecks of SDS deployment for a new domain is data sparseness when building a statistical language model. Our goal is to devise a method for efficiently building a reliable language model for a new SDS. We consider the worst-case yet quite common scenario in which only a small amount (∼1.7K utterances) of domain-specific data is available for the target domain. We present a new method that exploits external static text resources collected for other speech recognition tasks, as well as dynamic text resources acquired from the World Wide Web (WWW). We show that language models built from external resources can be used jointly with the limited in-domain (baseline) language model to obtain significant improvements in speech recognition accuracy. Combining language models built from external resources with the in-domain language model yields a WER reduction of over 20% relative to the baseline in-domain language model. Equivalently, we achieve almost the same level of performance as would be obtained with ten times as much in-domain data (17K utterances).
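The abstract does not say how the external-resource and in-domain models are combined. A common choice, assumed here, is linear interpolation, P(w | h) = Σ_i λ_i P_i(w | h), with the weights λ_i estimated by EM on held-out in-domain data. The Python sketch below works on precomputed per-token component probabilities; the function names and toy numbers are hypothetical.

```python
import math


def em_interpolation_weights(component_probs, iterations=50):
    """Estimate linear-interpolation weights by EM on held-out tokens.

    component_probs[t][i] is P_i(w_t | h_t), the probability that component
    model i assigns to held-out token t given its history. Components are
    assumed to be smoothed, so no token gets probability 0 from every model.
    """
    k = len(component_probs[0])
    weights = [1.0 / k] * k
    for _ in range(iterations):
        expected = [0.0] * k
        for probs in component_probs:
            mixture = sum(w * p for w, p in zip(weights, probs))
            # E-step: posterior responsibility of each component for this token.
            for i in range(k):
                expected[i] += weights[i] * probs[i] / mixture
        # M-step: the new weights are the average responsibilities.
        weights = [e / len(component_probs) for e in expected]
    return weights


def perplexity(component_probs, weights):
    """Perplexity of the interpolated model on the given held-out tokens."""
    log_sum = sum(
        math.log(sum(w * p for w, p in zip(weights, probs)))
        for probs in component_probs
    )
    return math.exp(-log_sum / len(component_probs))


# Toy held-out probabilities per token: [in-domain LM, external-resource LM].
held_out = [[0.02, 0.05], [0.10, 0.01], [0.03, 0.04], [0.08, 0.02]]
lambdas = em_interpolation_weights(held_out)
print(lambdas, perplexity(held_out, lambdas))
```

EM never decreases held-out likelihood, so more iterations can only lower (or keep) the interpolated perplexity on the tuning data.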
Restoring Punctuation and Capitalization in Transcribed Speech
Adding punctuation and capitalization greatly improves the readability of automatic speech transcripts. We discuss an approach for performing both tasks in a single pass using a purely text-based n-gram language model. We study the effect on performance of varying the n-gram order (from n = 3 to n = 6) and the amount of training data (from 58 million to 55 billion tokens). Our results show that using larger training data sets consistently improves performance, while increasing the n-gram order does not help nearly as much.
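A single-pass decoder of the kind described can be sketched as a Viterbi search in which each word may be emitted lowercase or capitalized and may be followed by a punctuation mark, with every candidate sequence scored by a text-only n-gram model. This Python sketch assumes a bigram model (the paper studied orders 3 to 6) supplied as a hypothetical `logprob(prev, cur)` function backed by a toy table; it illustrates the search, not the paper's actual system.

```python
def _keep_best(beams, state, score, path):
    """Viterbi recombination: keep only the best path ending in each state."""
    if state not in beams or score > beams[state][0]:
        beams[state] = (score, path)


def restore(words, logprob, punctuation=(",", ".", "?")):
    """Choose capitalization and punctuation for lowercase `words` in one pass.

    `logprob(prev, cur)` is a bigram log-probability. Because a bigram score
    depends only on the previous token, the last emitted token is a
    sufficient Viterbi state.
    """
    beams = {"<s>": (0.0, [])}
    for word in words:
        new_beams = {}
        for prev, (score, path) in beams.items():
            # Candidate surface forms for this word (deduplicated, ordered).
            for form in dict.fromkeys((word.lower(), word.capitalize())):
                s = score + logprob(prev, form)
                _keep_best(new_beams, form, s, path + [form])
                # Optionally emit one punctuation mark after the word.
                for mark in punctuation:
                    s_mark = s + logprob(form, mark)
                    _keep_best(new_beams, mark, s_mark, path + [form, mark])
        beams = new_beams
    best_state = max(beams, key=lambda st: beams[st][0] + logprob(st, "</s>"))
    return beams[best_state][1]


def detokenize(tokens, punctuation=(",", ".", "?")):
    """Join tokens with spaces, attaching punctuation to the preceding word."""
    text = ""
    for tok in tokens:
        text += tok if (tok in punctuation or not text) else " " + tok
    return text


# Hypothetical toy bigram table; unseen bigrams get a flat penalty.
TOY_LM = {
    ("<s>", "Hello"): -1.0, ("Hello", "world"): -1.0, ("world", "."): -1.0,
    (".", "How"): -1.0, ("How", "are"): -1.0, ("are", "you"): -1.0,
    ("you", "?"): -1.0, ("?", "</s>"): -1.0,
}


def toy_logprob(prev, cur):
    return TOY_LM.get((prev, cur), -10.0)


tokens = restore("hello world how are you".split(), toy_logprob)
print(detokenize(tokens))  # Hello world. How are you?
```

With a higher-order model the Viterbi state would become the last n-1 tokens, which is where the search cost grows with the n-gram order.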
High Frequency Word Entrainment in Spoken Dialogue
Cognitive theories of dialogue hold that entrainment, the automatic alignment between dialogue partners at many levels of linguistic representation, is key to facilitating both production and comprehension in dialogue. In this paper we examine novel types of entrainment in two corpora: Switchboard and the Columbia Games Corpus. We examine entrainment in the use of high-frequency words (the most common words in the corpus) and its association with dialogue naturalness and flow, as well as with task success. Our results show that such entrainment is predictive of the perceived naturalness of dialogues and is significantly correlated with task success; in terms of overall interaction flow, higher degrees of entrainment are associated with more overlaps and fewer interruptions.
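The abstract does not give the entrainment formula. The Python sketch below assumes one simple formulation: for each of the corpus's most frequent words, take the absolute difference between the two speakers' relative frequencies, sum over those words, and negate, so that scores closer to zero indicate more similar usage. Function names and example tokens are hypothetical.

```python
from collections import Counter


def most_frequent_words(corpus_tokens, k):
    """The k most frequent words in the corpus (ties keep first-seen order)."""
    return [w for w, _ in Counter(corpus_tokens).most_common(k)]


def hf_entrainment(tokens_a, tokens_b, vocabulary):
    """Negated sum of per-word absolute differences in relative frequency.

    0.0 means both speakers use each vocabulary word at the same rate;
    more negative values mean less similar usage (less entrainment).
    """
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    n_a, n_b = len(tokens_a), len(tokens_b)
    return -sum(abs(counts_a[w] / n_a - counts_b[w] / n_b) for w in vocabulary)


speaker_a = "okay the okay right".split()
speaker_b = "okay the the yeah".split()
vocab = most_frequent_words(speaker_a + speaker_b, 2)  # ['okay', 'the']
print(hf_entrainment(speaker_a, speaker_b, vocab))     # -0.5
```

Using relative rather than raw frequencies keeps the score comparable when one speaker talks much more than the other.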
The Prosody of Backchannels in American English
We examine prosodic and contextual factors characterizing the backchannel function of single affirmative words. Data are drawn from collaborative task-oriented dialogues between speakers of Standard American English. Despite high lexical variability, backchannels are prosodically well defined: they have higher pitch and intensity and greater pitch slope than affirmative words expressing other pragmatic functions. Additionally, we identify phrase-final rising pitch as a salient trigger for backchanneling.
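Pitch slope and phrase-final rise can both be estimated from an F0 track. The Python sketch below assumes F0 has already been extracted (for example with a pitch tracker such as Praat) into parallel lists of frame times in seconds and F0 values in Hz, with 0 marking unvoiced frames. The slope is an ordinary least-squares fit, and the final-window length is a hypothetical parameter rather than the paper's setting.

```python
def pitch_slope(times, f0):
    """Least-squares slope of F0 over time in Hz/s, skipping unvoiced frames (F0 <= 0)."""
    points = [(t, f) for t, f in zip(times, f0) if f > 0]
    if len(points) < 2:
        return None
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_f = sum(f for _, f in points) / n
    stt = sum((t - mean_t) ** 2 for t, _ in points)
    if stt == 0:
        return None
    stf = sum((t - mean_t) * (f - mean_f) for t, f in points)
    return stf / stt


def has_final_rise(times, f0, window=0.2):
    """True if F0 rises over the last `window` seconds of the phrase.

    Assumes `times` is sorted; `window` is a hypothetical parameter.
    """
    start = times[-1] - window
    tail = [(t, f) for t, f in zip(times, f0) if t >= start]
    slope = pitch_slope([t for t, _ in tail], [f for _, f in tail])
    return slope is not None and slope > 0


times = [0.0, 0.1, 0.2, 0.3, 0.4]
rising = [200, 190, 0, 195, 215]    # unvoiced frame at 0.2 s is skipped
falling = [200, 210, 220, 200, 180]
print(has_final_rise(times, rising, window=0.25))   # True
print(has_final_rise(times, falling, window=0.25))  # False
```

Hz/s slopes are easy to read but depend on the speaker's pitch range; converting F0 to semitones first would make slopes comparable across speakers.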
On the Role of Context and Prosody in the Interpretation of ‘Okay’
We examine the effect of contextual and acoustic cues on the disambiguation of three discourse-pragmatic functions of the word 'okay'. Results of a perception study show that contextual cues are stronger predictors of discourse function than acoustic cues. However, acoustic features capturing the pitch excursion at the right edge of 'okay' feature prominently in disambiguation, whether or not other contextual cues are present.
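One way to operationalize the pitch excursion at the right edge of a word is the signed interval, in semitones, across its final voiced frames; semitones make rises comparable across speakers with different pitch ranges. This Python sketch, its function names, and the `tail_frames` parameter are assumptions for illustration, not the study's exact feature definition.

```python
import math


def semitones(f_hz, ref_hz):
    """Interval from ref_hz to f_hz in semitones (12 per octave)."""
    return 12.0 * math.log2(f_hz / ref_hz)


def right_edge_excursion(f0, tail_frames=5):
    """Signed pitch change in semitones across the last `tail_frames` voiced frames.

    Positive values indicate a final rise, negative values a final fall.
    Unvoiced frames (F0 <= 0) are dropped before taking the tail.
    """
    voiced = [f for f in f0 if f > 0]
    tail = voiced[-tail_frames:]
    if len(tail) < 2:
        return None
    return semitones(tail[-1], tail[0])


print(right_edge_excursion([180, 175, 0, 170, 200, 240], tail_frames=3))
```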
Association between individual and contextual factors and cognitive performance in preschoolers with unsatisfied basic needs
In the context of an experimental intervention aimed at optimizing cognitive development through play activities for mothers and their children, this study presents the results of an association analysis between (a) individual factors (i.e., cortisol, electroencephalographic activity, language, and health conditions) and (b) contextual factors (i.e., home characteristics, maternal health, and maternal language) and efficiency in solving cognitively demanding tasks, in a sample of 46 five-year-old children with no history of developmental disorder, from homes with unsatisfied basic needs (UBN). After applying non-parametric trend analyses between groups, the results identified the following as the factors most strongly associated with cognitive performance: (a) neural connectivity and power; and (b) maternal language.
The implemented approach contributes to a better understanding of the associations between individual and contextual factors of cognitive performance, by considering the different levels of organization involved in its development.
Affiliation: Prats, Lucía María. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. CEMIC-CONICET. Centro de Educaciones Médicas e Investigaciones Clínicas; Argentina
Affiliation: Segretin, María Soledad. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. CEMIC-CONICET. Centro de Educaciones Médicas e Investigaciones Clínicas; Argentina
Affiliation: Fracchia, Carolina Soledad. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. CEMIC-CONICET. Centro de Educaciones Médicas e Investigaciones Clínicas; Argentina
Affiliation: Kamienkowski, Juan Esteban. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación. Laboratorio de Inteligencia Artificial Aplicada; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Pietto, Marcos Luis. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación. Laboratorio de Inteligencia Artificial Aplicada; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. CEMIC-CONICET. Centro de Educaciones Médicas e Investigaciones Clínicas; Argentina
Affiliation: Hermida, Maria Julia. Universidad Torcuato di Tella; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. CEMIC-CONICET. Centro de Educaciones Médicas e Investigaciones Clínicas; Argentina
Affiliation: Giovannetti, Federico. Centro de Educación Médica e Investigaciones Clínicas “Norberto Quirno”; Argentina
Affiliation: Mancini, Natalia. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. CEMIC-CONICET. Centro de Educaciones Médicas e Investigaciones Clínicas; Argentina
Affiliation: Gravano, Agustin. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación. Laboratorio de Inteligencia Artificial Aplicada; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Sheese, Brad. Illinois Wesleyan University; United States
Affiliation: Lipina, Sebastián Javier. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. CEMIC-CONICET. Centro de Educaciones Médicas e Investigaciones Clínicas; Argentina
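The cognitive-performance abstract mentions non-parametric trend analyses between groups without naming the test. The Jonckheere-Terpstra test is one standard non-parametric test for a monotonic trend across ordered groups; the Python sketch below is an assumed illustration rather than the study's analysis, and computes the J statistic with a normal approximation.

```python
import math


def jonckheere_terpstra(groups):
    """Jonckheere-Terpstra test for an increasing trend across ordered groups.

    `groups` lists the samples in the hypothesized order (lowest to highest
    level of the grouping factor). Returns (J, z, one_sided_p). The variance
    uses the no-ties formula, so z is only approximate when values tie.
    """
    j = 0.0
    for a in range(len(groups)):
        for b in range(a + 1, len(groups)):
            for x in groups[a]:
                for y in groups[b]:
                    if x < y:
                        j += 1.0
                    elif x == y:
                        j += 0.5  # ties count as half an ordered pair
    sizes = [len(g) for g in groups]
    n = sum(sizes)
    mean = (n * n - sum(s * s for s in sizes)) / 4.0
    var = (n * n * (2 * n + 3) - sum(s * s * (2 * s + 3) for s in sizes)) / 72.0
    z = (j - mean) / math.sqrt(var)
    p = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z >= z) under the normal approximation
    return j, z, p


# Hypothetical scores for three ordered groups.
print(jonckheere_terpstra([[1, 2], [3, 4], [5, 6]]))
```

For small samples like this toy one, an exact or permutation null distribution would be more reliable than the normal approximation.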
Turn-taking cues in task-oriented dialogue
As interactive voice response systems become more prevalent and provide increasingly complex functionality, it becomes clear that the challenges facing such systems are not solely in their synthesis and recognition capabilities. Issues such as the coordination of turn exchanges between system and user also play an important role in system usability. In particular, both systems and users have difficulty determining when the other is taking or relinquishing the turn. In this paper, we seek to identify automatically computable turn-taking cues correlated with human-human turn exchanges. We compare the presence of potential prosodic, acoustic, and lexico-syntactic turn-yielding cues in prosodic phrases preceding turn changes (smooth switches), turn retentions (holds), and backchannels in the Columbia Games Corpus, a large corpus of task-oriented dialogues, to determine which features reliably distinguish among these three. We identify seven turn-yielding cues, all of which can be extracted automatically, for future use in turn generation and recognition in interactive voice response (IVR) systems. Testing Duncan's (1972) hypothesis that these turn-yielding cues are linearly correlated with the occurrence of turn-taking attempts, we further demonstrate that the greater the number of turn-yielding cues present, the greater the likelihood that a turn change will occur. We also identify six cues that precede backchannels, which will also be useful for IVR backchannel generation and recognition; these cues correlate with backchannel occurrence in a quadratic manner. We find similar results for overlapping and non-overlapping speech.
Affiliation: Gravano, Agustin. Universidad de Buenos Aires. Facultad de Ciencias Exactas y Naturales. Departamento de Computación; Argentina. Universidad de Buenos Aires. Facultad de Medicina. Hospital de Clínicas General San Martín; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
Affiliation: Hirschberg, Julia. Columbia University; United States
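To compare a linear against a quadratic relationship between the number of backchannel-inviting cues and backchannel rate, one can fit polynomials of degree 1 and 2 by least squares and compare their residual error. The stdlib-only Python sketch below solves the normal equations directly; the function names are hypothetical and the data are synthetic, generated from a known quadratic, not results from the paper.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients c, where c[i] multiplies x**i."""
    m = degree + 1
    # Normal equations A c = b with A[i][j] = sum x^(i+j) and b[i] = sum y * x^i.
    A = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution on the upper-triangular system.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        coeffs[i] = (b[i] - sum(A[i][j] * coeffs[j] for j in range(i + 1, m))) / A[i][i]
    return coeffs


def sse(xs, ys, coeffs):
    """Sum of squared residuals of the polynomial fit."""
    return sum(
        (y - sum(c * x ** i for i, c in enumerate(coeffs))) ** 2
        for x, y in zip(xs, ys)
    )


xs = list(range(7))                      # number of cues present (0..6)
ys = [0.01 + 0.005 * x * x for x in xs]  # synthetic rates from a known quadratic
for degree in (1, 2):
    print(degree, sse(xs, ys, polyfit(xs, ys, degree)))
```

Since a higher-degree fit never has a larger residual error on the same data, a fair comparison on real data would add a penalty or significance test (for example, an F-test or AIC).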