145 research outputs found

    Assessing objective characterizations of phonetic convergence

    This paper studies the convergence of speech segment characteristics, i.e., the spectral characteristics of speech sounds, during live interactions between speaking dyads. The interaction data were collected using an original verbal game called 'verbal dominoes', which provides a dense sampling of the interlocutors' acoustic spaces. Two methods for characterizing phonetic convergence are compared. The first is based on a fine-grained analysis of the spectra of the central frames of vowels (LDA), while the second uses a more global speaker recognition technique (LLR). We show that the convergence rates calculated by the two techniques correlate as the number of dominoes increases, and that the LDA method is robust to reductions in training and test material. Finally, we comment on the impact of several factors on the computed convergence rates, namely the interlocutors' familiarity and sex pairing

    Speakers are more cooperative and less individual when interacting in larger group sizes

    Introduction: Cooperation, acoustically signaled through vocal convergence, is facilitated when group members are more similar. Excessive vocal convergence may, however, weaken individual recognizability. This study aimed to explore whether constraints on convergence can arise in circumstances where interlocutors need to enhance their vocal individuality. We therefore tested the effects of group size (3 and 5 interactants) on vocal convergence and individualization in a social communication scenario in which individual recognition by voice is at stake. Methods: In an interactive game, players had to recognize each other through their voices while solving a cooperative task online. Vocal similarity was quantified through similarities in speaker i-vectors obtained through probabilistic linear discriminant analysis (PLDA). Speaker recognition performance was measured through the system Equal Error Rate (EER). Results: Vocal similarity between speakers increased with the larger group size, which indicates more cooperative vocal behavior. At the same time, there was an increase in EER for the same speakers between the smaller and the larger group size, meaning a decrease in overall recognition performance. Discussion: The decrease in vocal individualization in the larger group size suggests that ingroup cooperation and social cohesion conveyed through acoustic convergence have priority over individualization in larger groups of unacquainted speakers

    The ART of Conversation: Measuring Phonetic Convergence and Deliberate Imitation in L2-Speech with a Siamese RNN

    Phonetic convergence describes the automatic and unconscious speech adaptation of two interlocutors in a conversation. This paper proposes a Siamese recurrent neural network (RNN) architecture to measure the convergence of the holistic spectral characteristics of speech sounds in an L2-L2 interaction. We extend an alternating reading task (the ART) dataset by adding 20 native Slovak L2 English speakers. We train and test the Siamese RNN model to measure phonetic convergence of L2 English speech from three different native language groups: Italian (9 dyads), French (10 dyads) and Slovak (10 dyads). Our results indicate that the Siamese RNN model effectively captures the dynamics of phonetic convergence and the speaker's imitation ability. Moreover, this text-independent model is scalable and capable of handling L1-induced speaker variability. Comment: Accepted at INTERSPEECH 202

    Dynamics of short-term cross-dialectal accommodation. A study on Grison and Zurich German

    This study investigates whether rhythmic features are subject to accommodation between Grison and Zurich German (henceforth GRG and ZHG) speakers, as was previously observed for vowel formants. Cross-dialectal rhythmic accommodation and the factors that evoke or inhibit it (e.g., acoustic distance vs. dialect markedness, new vs. previously heard words) were examined in a corpus of pre- and post-dialogue recordings produced by 18 pairs of GRG and ZHG speakers. Three rhythmic measures were designed, based on cross-dialectal timing differences related to the gemination of intervocalic sonorants, open-syllable lengthening, and the reduction of word-final vowels

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised, at least for some listeners, by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output

    Prosodic Convergence, Divergence, and Feedback: Coherence and Meaning In Conversation

    Acomodación fonética durante las interacciones conversacionales: una visión general

    During conversational interactions such as tutoring, instruction-giving tasks, verbal negotiations, or just talking with friends, interlocutors’ behaviors undergo a series of changes due to the characteristics of their counterpart and to the interaction itself. These changes are pervasive in every social interaction, and most of them occur in the sounds and rhythms of our speech, which is known as acoustic-prosodic accommodation, or simply phonetic accommodation. The consequences, linguistic and social constraints, and underlying cognitive mechanisms of phonetic accommodation have been studied for at least 50 years, due to the importance of the phenomenon to several disciplines such as linguistics, psychology, and sociology. Based on the analysis and synthesis of the existing empirical research literature, in this paper we present a structured and comprehensive review of the qualities, functions, onto- and phylogenetic development, and modalities of phonetic accommodation

    The Effect Of Rhyming Word Dominoes (Rwd) On Ability To Differentiate English Vowel Sounds (An Experimental Study At English Course Students Of Islamic Boarding School Mambaus Sholihin)

    Pronunciation is an important component of speaking competency: it allows students to quickly understand and appropriately respond to sentences or utterances directed at them. It is also one of the obstacles students face in speaking English. One problem is that English spelling does not match pronunciation, whereas Indonesian words are pronounced as they are spelled. This study investigated the effect of Rhyming Word Dominoes (RWD) on students’ ability to differentiate English vowel sounds. RWD consists of rhyming words printed on flash cards in the form of dominoes, which students play by reading aloud the rhyming words written on the cards. This technique, as part of the audio-lingual method, is proposed as an alternative to the techniques used in the grammar-translation method. The study employed a true experimental design with randomized subjects and a pretest-posttest control group. The sample comprised 68 students, 34 in each of the treatment and control groups. To assess the effect of the RWD technique, the study used a paired sample t-test to measure the difference between pretest and posttest and between the control and treatment groups. The findings revealed that both the experimental and the control groups improved to a statistically significant degree at the posttest. Although the control group also increased its ability to differentiate English vowel sounds, the significantly greater improvement of the experimental group implies that RWD had a greater effect on students’ ability to differentiate English vowel sounds than their usual method

    Multi‐speaker experimental designs: Methodological considerations

    Research on language use has become increasingly interested in the multimodal and interactional aspects of language; theoretical models of dialogue such as Communication Accommodation Theory and the Interactive Alignment Model are examples of this. In addition, researchers have started to give more consideration to the relationship between physiological processes and language use. This article aims to contribute to the advancement of studies of physiological and/or multimodal language use in naturalistic settings. It does so by providing methodological recommendations for such multi-speaker experimental designs. It covers the topics of (a) speaker preparation and logistics, (b) experimental tasks and (c) data synchronisation and post-processing. The types of data considered in further detail include audio and video, electroencephalography, respiratory data and electromagnetic articulography. This overview with recommendations is based on the answers to a questionnaire that was sent to the members of the Horizon 2020 research network ‘Conversational Brains’ and to several researchers in the field, and on interviews with three additional experts. H2020 Marie Skłodowska‐Curie Actions (http://dx.doi.org/10.13039/100010665). Peer Reviewed