104 research outputs found

    Physiological and acoustic characteristics of the female music theater voice

    Effect of being seen on the production of visible speech cues. A pilot study on Lombard speech

    Speech produced in noise (or Lombard speech) is characterized by increased vocal effort, but also by amplified lip gestures. The current study examines whether the speaker may seek this enhancement of visible speech cues, even unconsciously, in order to improve his visual intelligibility. One subject played an interactive game in a quiet situation and then in 85 dB of cocktail-party noise, under three conditions of interaction: without interaction, in face-to-face interaction, and in audio-only interaction. The audio signal was recorded simultaneously with articulatory movements, using 3D electromagnetic articulography. The results showed that acoustic modifications of speech in noise were greater when the interlocutor could not see the speaker. Furthermore, tongue movements, which are hardly visible, were not particularly amplified in noise. Lip movements, which are very visible, were not more enhanced in noise when the interlocutors could see each other; in fact, they were more enhanced in the audio-only situation. These results support the idea that this speaker did not make use of the visual channel to improve his intelligibility, and that his hyperarticulation was just an indirect correlate of increased vocal effort.

    Physiological and acoustic characteristics of the female musical theater voice in ‘belt’ and ‘legit’ qualities

    A study was conducted on six female Music Theatre singers. Audio and electroglottographic (EGG) signals were recorded simultaneously with the vocal tract impedance while the singers produced sustained pitches in two different qualities ('chesty belt', 'legit'). For each quality, two vowels (/ɛ/, /o/) were investigated, at four increasing pitches over the F#4-D5 range (~370-600 Hz). Measured values of glottal parameters (open quotient, amplitude of the EGG signal) support the idea that 'chesty belt' is produced in the first laryngeal mechanism (M1) and 'legit' in the second one (M2). The frequency of the first vocal tract resonance (R1) was found to be systematically higher in 'chesty belt', close to the second voice harmonic (2f0). These observations were consistent with greater intensities and energy above 1 kHz in 'chesty belt' compared to 'legit'.
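As a rough illustration of the pitch range quoted in this abstract, the sketch below converts note names to frequencies and computes the second harmonic 2f0 that R1 is reported to sit close to in 'chesty belt'. It assumes twelve-tone equal temperament with A4 = 440 Hz, which the abstract does not state; the helper name `note_to_hz` is hypothetical and not taken from the study.

```python
# Semitone offsets of the natural notes within an octave (C = 0).
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def note_to_hz(name: str, a4_hz: float = 440.0) -> float:
    """Convert a note name such as 'F#4' to Hz.

    Assumption: equal temperament tuned to A4 = a4_hz (not stated in the abstract).
    """
    letter, rest = name[0], name[1:]
    semitone = NOTE_OFFSETS[letter]
    if rest.startswith("#"):
        semitone += 1
        rest = rest[1:]
    elif rest.startswith("b"):
        semitone -= 1
        rest = rest[1:]
    octave = int(rest)
    midi = 12 * (octave + 1) + semitone  # MIDI note number, A4 = 69
    return a4_hz * 2 ** ((midi - 69) / 12)


if __name__ == "__main__":
    for note in ("F#4", "D5"):
        f0 = note_to_hz(note)
        # 2 * f0 is the second voice harmonic mentioned in the abstract.
        print(f"{note}: f0 = {f0:.1f} Hz, 2f0 = {2 * f0:.1f} Hz")
```

Under these assumptions the lower bound comes out at about 370 Hz and the upper bound slightly below the rounded 600 Hz figure, so the abstract's range should be read as approximate.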

    Diverse resonance tuning strategies for women singers

    Over the range 200 to 2000 Hz, the fundamental frequency f0 of women's singing voices covers the range of the first two resonances (R1 and R2) of the vocal tract. This allows diverse techniques of resonance tuning. Resonances were measured using broadband excitation at the singers' lips. A commonly noted strategy, used by sopranos and some altos, is to tune R1 close to the fundamental frequency f0 (R1:f0 tuning) once f0 approaches the value of R1 of that vowel in speech. At extremely high pitch, sopranos could no longer increase R1 sufficiently and switched from R1:f0 to R2:f0 tuning. At lower pitch, many singers of various singing styles found it advantageous to use R1:2f0 tuning. Additionally, many sopranos employed R2:2f0 tuning over some of their range, often simultaneously with R1:f0 tuning.

    Converging toward a common speech code: imitative and perceptuo-motor recalibration processes in speech production

    Auditory and somatosensory systems play a key role in speech motor control. In the act of speaking, segmental speech movements are programmed to reach phonemic sensory goals, which in turn are used to estimate actual sensory feedback in order to further control production. The adult's tendency to automatically imitate a number of acoustic-phonetic characteristics in another speaker's speech, however, suggests that speech production relies not only on the intended phonemic sensory goals and actual sensory feedback but also on the processing of external speech inputs. These online adaptive changes in speech production, or phonetic convergence effects, are thought to facilitate conversational exchange by contributing to setting a common perceptuo-motor ground between the speaker and the listener. In line with previous studies on phonetic convergence, we here demonstrate, in a non-interactive situation of communication, online unintentional and voluntary imitative changes in relevant acoustic features of acoustic vowel targets (fundamental and first formant frequencies) during speech production and imitation. In addition, perceptuo-motor recalibration processes, or after-effects, occurred not only after vowel production and imitation but also after auditory categorization of the acoustic vowel targets. Altogether, these findings demonstrate adaptive plasticity of phonemic sensory-motor goals and suggest that, apart from sensory-motor knowledge, speech production continuously draws on perceptual learning from the external speech environment.

    Plasticity of sensory-motor goals in speech production: behavioral evidence from phonetic convergence and speech imitation

    Imitation is one of the major processes by which humans develop social interactions. In speech communication, imitative processes are used from birth to adulthood, as highlighted by children's mimicking abilities and by adults' tendency to automatically "imitate" a number of acoustic-phonetic characteristics in another speaker's speech. These adaptive changes are thought to play a key role in speech development and acquisition, and to facilitate conversational exchange by contributing to setting a common perceptuo-motor link between speakers. Based on acoustic analyses of speech production in various laboratory tasks, the present study aimed to better characterize the sensory-to-motor adaptive processes involved in unintentional as well as voluntary speech imitation, and to test possible motor plastic changes due to auditory-motor recalibration mechanisms.

    Make That Sound More 'Metallic': Towards a Perceptually Relevant Control of the Timbre of Synthesizer Sounds Using a Variational Autoencoder

    In this article, we propose a new method of sound transformation based on control parameters that are intuitive and relevant for musicians. This method uses a variational autoencoder (VAE) model that is first trained in an unsupervised manner on a large dataset of synthesizer sounds. Then, a perceptual regularization term is added to the loss function to be optimized, and a supervised fine-tuning of the model is carried out using a small subset of perceptually labeled sounds. The labels were obtained from a perceptual test of Verbal Attribute Magnitude Estimation in which listeners rated this training sound dataset along eight perceptual dimensions (French equivalents of 'metallic, warm, breathy, vibrating, percussive, resonating, evolving, aggressive'). These dimensions were identified as relevant for the description of synthesizer sounds in a first Free Verbalization test. The resulting VAE model was evaluated by objective reconstruction measures and a perceptual test. Both showed that the model was able, to a certain extent, to capture the acoustic properties of most of the perceptual dimensions and to transform sound timbre along at least two of them ('aggressive' and 'vibrating') in a perceptually relevant manner. Moreover, it was able to generalize to unseen samples even though only a small set of labeled sounds was used.
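The two-stage objective described in this abstract can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the form of the perceptual regularization term, so the sketch assumes a mean squared error between the first latent means and the listener ratings. The function names and the weights `beta` and `alpha` are hypothetical.

```python
import math


def kl_diag_gaussian(mu, log_var):
    """KL divergence between N(mu, diag(exp(log_var))) and the standard normal prior."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def vae_loss(x, x_hat, mu, log_var, ratings=None, beta=1.0, alpha=1.0):
    """Standard VAE loss, optionally extended with a perceptual regularizer.

    ratings: perceptual scores for a labeled sound, or None for unlabeled sounds.
    Assumption (not from the abstract): the regularizer ties the first
    len(ratings) latent means to the ratings through an MSE penalty.
    """
    loss = mse(x, x_hat) + beta * kl_diag_gaussian(mu, log_var)
    if ratings is not None:
        loss += alpha * mse(mu[: len(ratings)], ratings)
    return loss


if __name__ == "__main__":
    x, x_hat = [0.2, 0.4], [0.2, 0.4]
    mu, log_var = [0.0, 0.0], [0.0, 0.0]
    print(vae_loss(x, x_hat, mu, log_var))                 # unsupervised stage
    print(vae_loss(x, x_hat, mu, log_var, ratings=[1.0]))  # fine-tuning stage
```

Passing `ratings=None` corresponds to the unsupervised pre-training stage; supplying ratings for the small labeled subset corresponds to the supervised fine-tuning stage.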

    The listening talker: A review of human and algorithmic context-induced modifications of speech

    Speech output technology is finding widespread application, including in scenarios where intelligibility might be compromised, at least for some listeners, by adverse conditions. Unlike most current algorithms, talkers continually adapt their speech patterns as a response to the immediate context of spoken communication, where the type of interlocutor and the environment are the dominant situational factors influencing speech production. Observations of talker behaviour can motivate the design of more robust speech output algorithms. Starting with a listener-oriented categorisation of possible goals for speech modification, this review article summarises the extensive set of behavioural findings related to human speech modification, identifies which factors appear to be beneficial, and goes on to examine previous computational attempts to improve intelligibility in noise. The review concludes by tabulating 46 speech modifications, many of which have yet to be perceptually or algorithmically evaluated. Consequently, the review provides a roadmap for future work in improving the robustness of speech output.

    Régulation et optimisation des efforts dans la communication parlée

    Some people expend, or perceive, more effort than others when they speak. This may stem from a disorder or pathology (e.g. stuttering, Parkinson's disease). It may also lead them, particularly if they use their voice intensively in their professional life, to develop chronic fatigue or even lesions of the vocal folds (dysphonia). Starting from this observation, my research addresses the notions of communication effort and efficiency from different angles: acoustics and physiology, conversational interaction, and clinical phonetics, in all cases in relation to the cognitive mechanisms of speech production, which are potentially dysfunctional in certain speech disorders. Within this framework, I will begin by synthesizing a body of work aimed at characterizing gestural coordinations that are either relatively optimal, as observed in people who are experts in certain vocal techniques (source-filter interaction, muscular synergies...), or on the contrary sub-optimal, as observed in people who stutter (during episodes of disfluency, but also the rest of the time). I will also review several studies showing how different individuals adapt their communication more or less effectively to disturbed situations (such as noise), in particular by being able or not to exploit the visual modality. I will then summarize different studies that I have conducted to explore the cognitive mechanisms underlying the coordination of speech gestures and the regulation of communication efforts, with a first focus on the mechanisms of rhythm perception and reproduction in people who stutter, which are possibly at the root of their coordination difficulties, and a second focus on the mechanisms of speech adaptation to disturbed communication situations, based on analysis of the disturbance but also of available resources, taking into account feedback from the interlocutor, as well as knowledge of the phonological system and reinforcement of specific contrasts. In a third part, I will present a number of studies aiming to apply this knowledge to the field of speech therapy, for the prevention of voice disorders in teachers (effectiveness of existing programs, improving the applicability of advice), and for the improvement of speech fluency in people who stutter, using different types of rhythmic stimulation.