515 research outputs found

    A cross-linguistic study on turn-taking and temporal alignment in verbal interaction

    Kousidis S, Schlangen D, Skopeteas S. A cross-linguistic study on turn-taking and temporal alignment in verbal interaction. In: Proceedings of Interspeech 2013. 2013

    A Research on the Use of Pause and Lengthening for Turn Organization in Chinese EFL Students’ Conversations

    Pause and lengthening are used frequently for turn organization in English interactions, but Chinese EFL learners do not use these two prosodic mechanisms efficiently. This study analyzed the use of pause and lengthening for turn organization in Chinese EFL learners’ English conversations. The results show that Chinese learners depend excessively on pauses to signal turn-yielding intentions, and that they probably cannot distinguish final lengthening within turns from lengthening before turn changes.

    Dimensions of communication


    A Study of Accomodation of Prosodic and Temporal Features in Spoken Dialogues in View of Speech Technology Applications

    Inter-speaker accommodation is a well-known property of human speech and human interaction in general. Broadly it refers to the behavioural patterns of two (or more) interactants and the effect of the (verbal and non-verbal) behaviour of each to that of the other(s). Implementation of this behavior in spoken dialogue systems is desirable as an improvement on the naturalness of human-machine interaction. However, traditional qualitative descriptions of accommodation phenomena do not provide sufficient information for such an implementation. Therefore, a quantitative description of inter-speaker accommodation is required. This thesis proposes a methodology of monitoring accommodation during a human or human-computer dialogue, which utilizes a moving average filter over sequential frames for each speaker. These frames are time-aligned across the speakers, hence the name Time Aligned Moving Average (TAMA). Analysis of spontaneous human dialogue recordings by means of the TAMA methodology reveals ubiquitous accommodation of prosodic features (pitch, intensity and speech rate) across interlocutors, and allows for statistical (time series) modeling of the behaviour, in a way which is meaningful for implementation in spoken dialogue system (SDS) environments. In addition, a novel dialogue representation is proposed that provides an additional point of view to that of TAMA in monitoring accommodation of temporal features (inter-speaker pause length and overlap frequency). This representation is a percentage turn distribution of individual speaker contributions in a dialogue frame which circumvents strict attribution of speaker-turns, by considering both interlocutors as synchronously active. Both TAMA and turn distribution metrics indicate that correlation of average pause length and overlap frequency between speakers can be attributed to accommodation (a debated issue), and point to possible improvements in SDS “turn-taking” behaviour.
Although the findings of the prosodic and temporal analyses can directly inform SDS implementations, further work is required in order to describe inter-speaker accommodation sufficiently, as well as to develop an adequate testing platform for evaluating the magnitude of perceived improvement in human-machine interaction. Therefore, this thesis constitutes a first step towards a convincingly useful implementation of accommodation in spoken dialogue systems.
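
The time-aligned moving average described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `tama_frames`, the frame length, the step size, and the sample values are all hypothetical, and the thesis may weight or window the frames differently. The sketch only shows the core idea that both speakers are averaged over the same overlapping time frames, so their two series can be compared frame by frame.

```python
from statistics import mean

def tama_frames(samples, frame_len, step, end_time):
    """Average (time, value) samples in overlapping frames.

    Frames start at 0, step, 2*step, ... and each spans frame_len seconds.
    Using the same frame grid for every speaker keeps the series time-aligned.
    A frame with no samples yields None.
    """
    frames = []
    start = 0.0
    while start < end_time:
        values = [v for t, v in samples if start <= t < start + frame_len]
        frames.append(mean(values) if values else None)
        start += step
    return frames

# Hypothetical pitch samples as (time in seconds, value in Hz) per speaker.
speaker_a = [(0.5, 200.0), (1.5, 210.0), (2.5, 220.0), (3.5, 230.0)]
speaker_b = [(1.0, 120.0), (3.0, 140.0)]

series_a = tama_frames(speaker_a, frame_len=2.0, step=1.0, end_time=4.0)
series_b = tama_frames(speaker_b, frame_len=2.0, step=1.0, end_time=4.0)
print(series_a)
print(series_b)
```

Because both series share one frame grid, a correlation or time-series model can then be applied to the paired values, which is the kind of analysis the abstract refers to.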

    Beyond ‘Interaction’: How to Understand Social Effects on Social Cognition

    In recent years, a number of philosophers and cognitive scientists have advocated for an ‘interactive turn’ in the methodology of social-cognition research: to become more ecologically valid, we must design experiments that are interactive, rather than merely observational. While the practical aim of improving ecological validity in the study of social cognition is laudable, we think that the notion of ‘interaction’ is not suitable for this task: as it is currently deployed in the social cognition literature, this notion leads to serious conceptual and methodological confusion. In this paper, we tackle this confusion on three fronts: 1) we revise the ‘interactionist’ definition of interaction; 2) we demonstrate a number of potential methodological confounds that arise in interactive experimental designs; and 3) we show that ersatz interactivity works just as well as the real thing. We conclude that the notion of ‘interaction’, as it is currently being deployed in this literature, obscures an accurate understanding of human social cognition

    An exploration of the rhythm of Malay

    In recent years there has been a surge of interest in speech rhythm. However, we still lack a clear understanding of the nature of rhythm and rhythmic differences across languages. Various metrics have been proposed as means for measuring rhythm on the phonetic level and making typological comparisons between languages (Ramus et al., 1999; Grabe & Low, 2002; Dellwo, 2006) but the debate is ongoing on the extent to which these metrics capture the rhythmic basis of speech (Arvaniti, 2009; Fletcher, in press). Furthermore, cross-linguistic studies of rhythm have covered a relatively small number of languages and research on previously unclassified languages is necessary to fully develop the typology of rhythm. This study examines the rhythmic features of Malay, for which, to date, relatively little work has been carried out on aspects of rhythm and timing. The material for the analysis comprised 10 sentences produced by 20 speakers of standard Malay (10 males and 10 females). The recordings were first analysed using rhythm metrics proposed by Ramus et al. (1999) and Grabe & Low (2002). These metrics (∆C, %V, rPVI, nPVI) are based on durational measurements of vocalic and consonantal intervals. The results indicated that Malay clustered with other so-called syllable-timed languages like French and Spanish on the basis of all metrics. However, underlying the overall findings for these metrics there was a large degree of variability in values across speakers and sentences, with some speakers having values in the range typical of stress-timed languages like English. 
Further analysis has been carried out in light of Fletcher’s (in press) argument that measurements based on duration do not wholly reflect speech rhythm as there are many other factors that can influence values of consonantal and vocalic intervals, and Arvaniti’s (2009) suggestion that other features of speech should also be considered in description of rhythm to discover what contributes to listeners’ perception of regularity. Spectrographic analysis of the Malay recordings brought to light two parameters that displayed consistency and regularity for all speakers and sentences: the duration of individual vowels and the duration of intervals between intensity minima. This poster presents the results of these investigations and points to connections between the features which seem to be consistently regulated in the timing of Malay connected speech and aspects of Malay phonology. The results are discussed in light of current debate on the descriptions of rhythm
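
The four duration-based metrics named in this abstract can be sketched as follows, using their commonly cited definitions: %V is the share of total duration that is vocalic, ∆C is the standard deviation of consonantal interval durations, rPVI is the mean absolute difference between successive intervals, and nPVI normalises each difference by the mean of the pair and scales by 100. The function names and the interval durations below are hypothetical, and the choice of population standard deviation for ∆C is an assumption; the study's own measurement procedure is not given in the abstract.

```python
from statistics import mean, pstdev

def percent_v(vocalic, consonantal):
    """%V: percentage of total duration taken up by vocalic intervals."""
    return 100.0 * sum(vocalic) / (sum(vocalic) + sum(consonantal))

def delta_c(consonantal):
    """Delta-C: standard deviation of consonantal interval durations
    (population form assumed here)."""
    return pstdev(consonantal)

def rpvi(durations):
    """Raw PVI: mean absolute difference between successive intervals."""
    pairs = zip(durations, durations[1:])
    return mean([abs(a - b) for a, b in pairs])

def npvi(durations):
    """Normalised PVI: each difference is divided by the pair's mean duration."""
    pairs = zip(durations, durations[1:])
    return 100.0 * mean([abs(a - b) / ((a + b) / 2) for a, b in pairs])

# Hypothetical interval durations in milliseconds for one utterance.
vocalic = [80.0, 120.0, 100.0]
consonantal = [50.0, 150.0, 100.0]

print(percent_v(vocalic, consonantal))  # share of vocalic material
print(delta_c(consonantal))             # consonantal variability
print(rpvi(consonantal))                # raw index, typically on consonantal intervals
print(npvi(vocalic))                    # normalised index, typically on vocalic intervals
```

Because all four values depend only on interval durations, they are sensitive to anything else that changes those durations, which is the limitation the abstract attributes to Fletcher and Arvaniti.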

    Turn-taking patterns in human discourse and their impact on group communication service design

    Recent studies demonstrated the benefit of integrating speaker prediction features into the design of group-communication services supporting multiparty online discourse. This paper aims at delivering a more elaborate analysis of speaker prediction by analyzing a larger volume of data. Moreover, it tests the existence of speakers dominating speaking time. Towards this end, we analyze tens of hours of recorded meeting and lecture sessions. Our principal results for meeting-like interaction manifest that the next speaker is one of the last four speakers with over 90% probability. This is seen consistently across our data with little variance (standard deviation of 8.71%) independent of the total number of potential speakers. Furthermore, lecture time is in most cases significantly dominated by the tutor. In meetings, although a single dominating speaker is always evident, domination exhibited high variability. Generally, our findings strengthen and further motivate the act of incorporating user-behavior awareness into group communication service design.
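
One way to measure the "next speaker is one of the last four speakers" statistic on a turn sequence is sketched below. The abstract does not define how turns are segmented or whether the current speaker counts, so the details here are assumptions: `recent_speaker_hit_rate` is a hypothetical name, every turn after the first is counted, and "last k speakers" means the k most recently active distinct speakers before that turn.

```python
def recent_speaker_hit_rate(turns, k=4):
    """Fraction of turns whose speaker is among the k most recently
    active distinct speakers seen before that turn."""
    recent = []  # distinct speakers, least recent first
    hits = 0
    total = 0
    for speaker in turns:
        if recent:  # the first turn has no history, so it is not counted
            total += 1
            if speaker in recent[-k:]:
                hits += 1
        # Move the speaker to the most-recent position.
        if speaker in recent:
            recent.remove(speaker)
        recent.append(speaker)
    return hits / total if total else 0.0

# Hypothetical turn sequence with six participants.
turns = ["A", "B", "A", "C", "D", "E", "A", "F", "B"]
rate = recent_speaker_hit_rate(turns, k=4)
print(rate)
```

Under this definition a participant's first turn can never be a hit, so short sessions with many participants score low; long recordings such as those analysed in the paper are less affected by that start-up effect.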

    Analysis of the sensitivity of the End-Of-Turn Detection task to errors generated by the Automatic Speech Recognition process.

    An End-Of-Turn Detection Module (EOTD-M) is an essential component of automatic Spoken Dialogue Systems. The capability of correctly detecting whether a user’s utterance has ended or not improves the accuracy in interpreting the meaning of the message and decreases the latency in the answer. Usually, in dialogue systems, an EOTD-M is coupled with an Automatic Speech Recognition Module (ASR-M) to transmit complete utterances to the Natural Language Understanding unit. Mistakes in the ASR-M transcription can have a strong effect on the performance of the EOTD-M. The actual extent of this effect depends on the particular combination of ASR-M transcription errors and the sentence featurization techniques implemented as part of the EOTD-M. In this paper we investigate this important relationship for an EOTD-M based on semantic information and particular characteristics of the speakers (speech profiles). We introduce an Automatic Speech Recognition Simulator (ASR-SIM) that models different types of semantic mistakes in the ASR-M transcription as well as different speech profiles. We use the simulator to evaluate the sensitivity to ASR-M mistakes of a Long Short-Term Memory network classifier trained in EOTD with different featurization techniques. Our experiments reveal the different ways in which the performance of the model is influenced by the ASR-M errors. We corroborate that not only is the ASR-SIM useful to estimate the performance of an EOTD-M in customized noisy scenarios, but it can also be used to generate training datasets with the expected error rates of real working conditions, which leads to better performance. EMPATHIC IT1244-19 TIN2016-78365-R PID2019-104966GB-I00

    A Study of Prosodic Entrainment and Social Factors in Mandarin Conversations

    In conversations, interlocutors usually adapt their prosody to that of their partner, and their prosodic production becomes more similar, which supports successful communication. This phenomenon of prosodic entrainment is related to complex factors. This study aims to explore the relationship between prosodic entrainment and social factors. Two analyses are carried out: the analysis of prosodic entrainment and gender, and the analysis of prosodic entrainment and role. In terms of prosodic entrainment and gender, it is found that the largest number of prosodic features are entrained in female-male conversations, and the smallest number in male-male conversations. In terms of prosodic entrainment and role, it is found that role influences the degree of entrainment, and that information givers entrain more to followers in conversation.

    Temporal entrainment in overlapping speech

    Wlodarczak M. Temporal entrainment in overlapping speech. Bielefeld: Bielefeld University; 2014