
    Comparing non-verbal vocalisations in conversational speech corpora

    Conversations consist not only of spoken words but also of non-verbal vocalisations. Since there is no standard for defining and classifying (possible) non-speech sounds, the annotations for these vocalisations differ considerably across corpora of conversational speech. The six inspected corpora do seem to agree that hesitation sounds and feedback vocalisations are treated as words (without a standard orthography). The most frequent non-verbal vocalisations are laughter on the one hand and, if considered a vocal sound, breathing noises on the other.
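
    As a toy illustration of how such annotation conventions can be compared, the sketch below tallies vocalisation labels per corpus; the corpora and labels are invented, not taken from the six corpora studied above.

```python
# Toy sketch: tally non-verbal vocalisation labels per corpus to compare
# annotation conventions. Corpora and labels are invented for illustration.
from collections import Counter

annotations = {
    "corpus_a": ["laughter", "breath", "uh", "laughter", "mm-hmm"],
    "corpus_b": ["<laugh>", "uh", "uh", "<breath>", "<laugh>"],
}

for corpus, labels in annotations.items():
    print(corpus, Counter(labels).most_common())
```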

    On the acoustics of overlapping laughter in conversational speech

    The social nature of laughter invites people to laugh together. This joint vocal action often results in overlapping laughter. In this paper, we show that the acoustics of overlapping laughs differ from those of non-overlapping laughs. We found that overlapping laughs are more strongly marked prosodically than non-overlapping ones, in terms of higher values for duration, mean F0, mean and maximum intensity, and the amount of voicing. This effect is intensified by the number of people joining in the laughter event, which suggests that entrainment is at work. We also found that group size affects the number of overlapping laughs, which illustrates the contagious nature of laughter. Finally, people appear to join in laughter at a delay of approximately 500 ms; a delay that must be considered when developing spoken dialogue systems that are able to respond to users’ laughs.
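
    For a spoken dialogue system, the reported ~500 ms onset delay suggests scheduling the system's joint laugh slightly after the user's laugh is detected. A minimal sketch, assuming a hypothetical laugh-detector callback and synthesiser call (neither is from the paper):

```python
import threading
import time

JOIN_DELAY_S = 0.5  # approximate human joint-laughter onset delay (see above)

def play_laugh():
    # Placeholder for triggering a synthesised laugh vocalisation.
    print("system: (laughs)")

def on_laugh_detected():
    # Schedule the system's laugh ~500 ms after the detected laugh onset,
    # mirroring the delay observed in human-human conversation.
    threading.Timer(JOIN_DELAY_S, play_laugh).start()

if __name__ == "__main__":
    on_laugh_detected()  # simulate a detected user laugh
    time.sleep(1.0)      # keep the process alive until the timer fires
```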

    Acoustic, Morphological, and Functional Aspects of `yeah/ja' in Dutch, English and German

    We explore different forms and functions of one of the most common feedback expressions in Dutch, English, and German, namely `yeah/ja', which is known for its multi-functionality and ambiguous usage in dialog. For example, it can be used as a yes-answer, as a pure continuer, or as a way to show agreement. In addition, `yeah/ja' can be used on its own, but it can also be combined with other particles to form multi-word expressions, especially in Dutch and German. We found substantial differences at the morpho-lexical level between the three related languages, which adds to the ambiguous character of `yeah/ja'. An explorative analysis of the prosodic features of `yeah/ja' showed that mainly a higher intensity is used to signal speaker incipiency across the inspected languages.
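
    A minimal sketch of how mean intensity for a single `yeah/ja' token could be measured, here using librosa rather than the authors' (unspecified) tooling; the file path and token boundaries are placeholders:

```python
# Mean RMS energy (in dB) for a hypothetical `yeah/ja' token at 12.30-12.55 s.
import numpy as np
import librosa

y, sr = librosa.load("dialog.wav", sr=16000, offset=12.30, duration=0.25)
rms = librosa.feature.rms(y=y)               # frame-wise RMS energy
mean_db = 20 * np.log10(rms.mean() + 1e-10)  # convert to a dB-like scale
print(f"mean intensity: {mean_db:.1f} dB")
```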

    Classification of cooperative and competitive overlaps in speech using cues from the context, overlapper, and overlappee

    One of the major properties of overlapping speech is that it can be perceived as competitive or cooperative. For the development of real-time spoken dialog systems and the analysis of affective and social human behavior in conversations, it is important to (automatically) distinguish between these two types of overlap. We investigate acoustic characteristics of cooperative and competitive overlaps with the aim of developing automatic classifiers for overlaps. In addition to acoustic features, we also use information from gaze and head movement annotations. Contexts preceding and during the overlap are taken into account, as well as the behaviors of both the overlapper and the overlappee. We compare various feature sets in classification experiments performed on the AMI corpus. The best performances obtained lie around 27%–30% EER.
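
    The equal error rate (EER) reported above is the operating point where the false-positive and false-negative rates coincide. A minimal sketch of computing it from classifier scores; the labels and scores below are made up:

```python
import numpy as np
from sklearn.metrics import roc_curve

def equal_error_rate(labels, scores):
    """EER: the point where false-positive rate == false-negative rate."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))
    return (fpr[idx] + fnr[idx]) / 2.0

# Toy scores for overlap classification (1 = competitive, 0 = cooperative):
labels = np.array([0, 0, 1, 1, 1, 0, 1, 0])
scores = np.array([0.10, 0.40, 0.35, 0.80, 0.70, 0.20, 0.60, 0.50])
print(f"EER: {equal_error_rate(labels, scores):.1%}")
```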

    Backchannels: Quantity, Type and Timing Matters

    In a perception experiment, we systematically varied the quantity, type and timing of backchannels. Participants viewed stimuli of a real speaker side by side with an animated listener and rated how human-like they perceived the latter's backchannel behavior to be. In addition, we obtained measures of appropriateness and optionality for each backchannel from key strokes. This approach allowed us to analyze the influence of each of the factors on entire fragments and on individual backchannels. The originally performed type and timing of a backchannel appeared more human-like than a switched type or random timing. In addition, we found that nods are more often appropriate than vocalizations. For quantity, too few or too many backchannels per minute appeared to reduce the quality of the behavior. These findings are important for the design of algorithms for the automatic generation of backchannel behavior for artificial listeners.
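
    One way such findings might feed into a generation algorithm is a simple rate guard that keeps generated backchannels within a plausible per-minute band. A hedged sketch, where the spacing threshold is an illustrative assumption rather than a value from the paper:

```python
# Hypothetical rate guard: drop backchannel candidates that would exceed a
# target rate. The ~20 BCs/minute cap below is an assumption for illustration.
MIN_GAP_S = 60.0 / 20.0

def throttle(candidate_times, min_gap_s=MIN_GAP_S):
    """Keep only backchannel candidates spaced at least min_gap_s apart."""
    kept, last = [], float("-inf")
    for t in sorted(candidate_times):
        if t - last >= min_gap_s:
            kept.append(t)
            last = t
    return kept

print(throttle([0.5, 1.0, 4.2, 4.5, 9.0, 9.1, 15.0]))  # -> [0.5, 4.2, 9.0, 15.0]
```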

    Effects of perceived gender on the perceived social function of laughter


    Towards Speech Emotion Recognition "in the wild" using Aggregated Corpora and Deep Multi-Task Learning

    One of the challenges in Speech Emotion Recognition (SER) "in the wild" is the large mismatch between training and test data (e.g. speakers and tasks). In order to improve the generalisation capabilities of the emotion models, we propose to use Multi-Task Learning (MTL), with gender and naturalness as auxiliary tasks in deep neural networks. This method was evaluated in within-corpus and various cross-corpus classification experiments that simulate conditions "in the wild". In comparison to state-of-the-art Single-Task Learning (STL) methods, our proposed MTL method improved performance significantly. In particular, models using both gender and naturalness achieved larger gains than those using either gender or naturalness separately. This benefit was also found in the high-level representations of the feature space obtained with our proposed method, where discriminative emotional clusters could be observed. (Published in the proceedings of INTERSPEECH, Stockholm, September 2017.)
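
    An illustrative sketch (not the authors' exact architecture) of a multi-task network with a shared trunk and three output heads, where emotion is the main task and gender and naturalness are auxiliary tasks; the feature dimensionality, layer sizes, and loss weights are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

N_FEATURES = 88  # e.g. an eGeMAPS-style acoustic feature vector (assumption)
N_EMOTIONS = 4

inputs = tf.keras.Input(shape=(N_FEATURES,))
x = layers.Dense(256, activation="relu")(inputs)  # shared trunk
x = layers.Dense(256, activation="relu")(x)

emotion = layers.Dense(N_EMOTIONS, activation="softmax", name="emotion")(x)
gender = layers.Dense(1, activation="sigmoid", name="gender")(x)
naturalness = layers.Dense(1, activation="sigmoid", name="naturalness")(x)

model = Model(inputs, [emotion, gender, naturalness])
model.compile(
    optimizer="adam",
    loss={
        "emotion": "sparse_categorical_crossentropy",
        "gender": "binary_crossentropy",
        "naturalness": "binary_crossentropy",
    },
    # Down-weight the auxiliary tasks relative to the main emotion task.
    loss_weights={"emotion": 1.0, "gender": 0.3, "naturalness": 0.3},
)
model.summary()
```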

    Convergence of laughter in conversational speech: effects of quantity, temporal alignment and imitation

    A crucial feature of spoken interaction is joint activity at various linguistic and phonetic levels that requires fine-tuned coordination. This study gives a brief overview of how laughing in conversational speech can be phonetically analysed as partner-specific adaptation and joint vocal action. Laughter as a feature of social bonding leads to the assumption that when laughter appears in dialogues it is performed by both interlocutors. One possible type of convergence is that the conversational partners adapt their amount of laughter during their interaction. This partner-specific adaptation for laughter has been shown by Campbell (2007a): persons who were initially unknown to each other, and held no negative attitude towards the unknown partner, talked in ten consecutive 30-minute conversations at one-week intervals. With each conversation the level of familiarity increased, which was also reflected in the increasing number of their laughs. Smoski & Bachorowski (2003) likewise showed that familiarity plays a large role in the number of laughs: friends laugh together more often than strangers do. But there is also evidence that the level of social distance plays a role in phonetic convergence/divergence in speech, in terms of extended voice onset time in stop consonants (Abrego-Collier et al. 2011). Figure 1 illustrates the convergence effect in terms of the number of laughs for two speech corpora of task-based dyadic conversations (Anderson et al. 1991 for a map task; Baker & Hazan 2011 for a spot-the-difference game), with rather high correlation values. However, the familiarity effect…
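
    A minimal sketch of the kind of correlation behind Figure 1, quantifying convergence as the correlation between the two partners' laugh counts across dyads; the counts here are invented, not taken from the corpora cited above:

```python
import numpy as np
from scipy.stats import pearsonr

# One laugh count per conversation partner per dyad (made-up values).
laughs_a = np.array([12, 3, 25, 8, 17, 5])
laughs_b = np.array([10, 4, 22, 6, 15, 7])

r, p = pearsonr(laughs_a, laughs_b)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")
```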

    Exploring sequences of speech and laughter activity using visualisations of conversations

    In this study, we analysed laughter in dyadic conversational interaction. We attempted to categorise patterns of speaking and laughing activity in conversation in order to gain more insight into how speaking and laughing are timed and related to each other. Special attention was paid to a particular sequencing of speech and laughter activity that is intended to invite an interlocutor to laugh (the ‘invitation-acceptance’ scheme): the speaker invites the listener to laugh by producing a laugh after his/her own utterance, indicating that it is appropriate to laugh. We explored these kinds of sequences through visualisations of speech and laughter activity, generated from manual transcriptions of the HCRC Map Task corpus. Using these visualisations, we found that people indeed show a tendency to adhere to the ‘invitation-acceptance’ scheme, and that people tend to ‘wait’ to be invited to a shared laughter event rather than to ‘anticipate’ it. These speech-and-laughter-activity plots have proven helpful in analysing the interplay between laughing and speaking in conversation and can be used as a tool to sharpen the researcher’s intuition in under-researched areas.
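
    A minimal sketch of such a speech-and-laughter activity plot for one dyad using matplotlib; the interval data are invented, and the layout is only a rough approximation of the visualisations described above:

```python
import matplotlib.pyplot as plt

# (start, duration) intervals in seconds, invented for illustration.
speech = {"A": [(0.0, 3.0), (5.0, 2.5), (10.0, 4.0)],
          "B": [(3.2, 1.5), (8.0, 1.8)]}
laughs = {"A": [(8.2, 0.8)],
          "B": [(7.6, 1.2), (14.2, 0.9)]}

fig, ax = plt.subplots(figsize=(8, 2))
for row, spk in enumerate(["A", "B"]):
    ax.broken_barh(speech[spk], (row - 0.3, 0.6), color="steelblue",
                   label="speech" if row == 0 else None)
    ax.broken_barh(laughs[spk], (row - 0.3, 0.6), color="orange",
                   label="laughter" if row == 0 else None)
ax.set_yticks([0, 1])
ax.set_yticklabels(["A", "B"])
ax.set_xlabel("time (s)")
ax.legend(loc="upper right")
plt.tight_layout()
plt.show()
```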

    A Multimodal Analysis of Vocal and Visual Backchannels in Spontaneous Dialogs

    Backchannels (BCs) are short vocal and visual listener responses that signal attention, interest, and understanding to the speaker. Previous studies have investigated BC prediction from prosodic cues in telephone-style dialogs. In contrast, we consider spontaneous face-to-face dialogs. The additional visual modality allows speaker and listener to monitor each other's attention continuously, and we hypothesize that this affects the BC-inviting cues. In this study, we investigate how gaze, in addition to prosody, can cue BCs. Moreover, we focus on the type of BC performed, with the aim of finding out whether vocal and visual BCs are invited by similar cues. In contrast to telephone-style dialogs, we do not find rising/falling pitch to be a BC-inviting cue. However, in a face-to-face setting, gaze appears to cue BCs. In addition, we find that mutual gaze occurs significantly more often during visual BCs. Moreover, vocal BCs are more likely to be timed during pauses in the speaker's speech.
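
    Given the finding that vocal BCs tend to fall in pauses, a generation component might first locate pauses in the speaker's talk-spurt annotations. A minimal sketch, where the interval data and the minimum pause length are assumptions:

```python
MIN_PAUSE_S = 0.3  # assumed minimum gap to count as a pause

def pauses(talk_spurts, min_pause_s=MIN_PAUSE_S):
    """talk_spurts: sorted (start, end) intervals of the speaker's speech.
    Returns the gaps between spurts long enough to host a vocal BC."""
    return [(e1, s2)
            for (s1, e1), (s2, e2) in zip(talk_spurts, talk_spurts[1:])
            if s2 - e1 >= min_pause_s]

print(pauses([(0.0, 2.1), (2.2, 5.0), (5.8, 9.3)]))  # -> [(5.0, 5.8)]
```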