64 research outputs found

Human vocal attractiveness as signaled by body size projection

Author: Birkholz P; Lee A; Liu X; Wu W-L; Xu Y
Publication date: 01/01/2013

Voice, as a secondary sexual characteristic, is known to affect the perceived attractiveness of human individuals. But the underlying mechanism of vocal attractiveness has remained unclear. Here, we presented human listeners with acoustically altered natural sentences and fully synthetic sentences with systematically manipulated pitch, formants and voice quality based on a principle of body size projection reported for animal calls and emotional human vocal expressions. The results show that male listeners preferred a female voice that signals a small body size, with relatively high pitch, wide formant dispersion and breathy voice, while female listeners preferred a male voice that signals a large body size with low pitch and narrow formant dispersion. Interestingly, however, male vocal attractiveness was also enhanced by breathiness, which presumably softened the aggressiveness associated with a large body size. These results, together with the additional finding that the same vocal dimensions also affect emotion judgment, indicate that humans still employ a vocal interaction strategy used in animal calls despite the development of complex language.

CiteSeerX

Directory of Open Access Journals

UCL Discovery

PubMed Central

Publikationsserver der RWTH Aachen University

HKU Scholars Hub

FigShare

Methods and studies of laryngeal voice quality analysis in speech production

Author: Airas Matti
Publication venue: Teknillinen korkeakoulu
Publication date: 01/01/2008

Voice quality, defined by John Laver as the characteristic auditory colouring of a speaker's voice, is a significant feature of speech, and it is used to signal various properties such as emotions, intentions, and mood of the speaker. While voice quality measurement techniques and algorithms have been developed, much work is needed to obtain a comprehensive view of the function and analysis of human voice in the production of different voice qualities. Two major research questions are presented in this thesis: First, how can the most important laryngeal voice quality features be analyzed, and second, how do the voice quality features affect different facets of vocal expression? To answer these questions, five separate studies of the analysis methodology and two studies regarding the voice quality behaviour were published. The methodology articles describe a voice source analysis software package; a comparison of multiple voice source parameters in breathy, normal, and pressed phonation; a method for evaluating inverse filtering algorithms; a comparison of two inverse filtering algorithms; and a method for analyzing intensity regulation of speech. One analysis article studies changes in the laryngeal voice quality when different emotions are expressed in speech, and another studies voice quality changes in the expression of prominence in continuous speech. The methodology studies resulted in new tools, methods, and guidelines for voice source analysis, while the analysis studies provide information on how voice quality is used in expressive speech.

Aaltodoc Publication Archive

Triangular body-cover model of the vocal folds with coordinated activation of the five intrinsic laryngeal muscles

Author: Alzamendi Gabriel Alejandro; Erath Byron D.; Hillman Robert E.; Peterson Sean D.; Zañartu Matías
Publication venue: Journal of Animal Ecology
Publication date: 24/11/2021

Poor laryngeal muscle coordination that results in abnormal glottal posturing is believed to be a primary etiologic factor in common voice disorders such as non-phonotraumatic vocal hyperfunction. Abnormal activity of antagonistic laryngeal muscles is hypothesized to play a key role in the alteration of normal vocal fold biomechanics that results in the dysphonia associated with such disorders. Current low-order models of the vocal folds are unsatisfactory for testing this hypothesis since they do not capture the co-contraction of antagonist laryngeal muscle pairs. To address this limitation, a self-sustained triangular body-cover model with full intrinsic muscle control is introduced. The proposed scheme shows good agreement with prior studies using finite element models, excised larynges, and clinical studies in sustained and time-varying vocal gestures. Simulations of vocal fold posturing obtained with distinct antagonistic muscle activation yield clear differences in kinematic, aerodynamic, and acoustic measures. The proposed tool is deemed sufficiently accurate and flexible for future comprehensive investigations of non-phonotraumatic vocal hyperfunction and other laryngeal motor control disorders.

Affiliations: Alzamendi, Gabriel Alejandro: Universidad Nacional de Entre Ríos, Instituto de Investigación y Desarrollo en Bioingeniería y Bioinformática - Consejo Nacional de Investigaciones Científicas y Técnicas, Centro Científico Tecnológico Conicet - Santa Fe, Argentina; Peterson, Sean D.: University of Waterloo, Canada; Erath, Byron D.: Clarkson University, United States; Hillman, Robert E.: Massachusetts General Hospital, United States; Zañartu, Matías: Universidad Tecnica Federico Santa Maria, Chile

arXiv.org e-Print Archive

CONICET Digital

PubMed Central

A novel framework for high-quality voice source analysis and synthesis

Author: Turajlic Emir
Publication venue: Brunel University School of Engineering and Design PhD Theses
Publication date: 01/01/2006

This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. The analysis, parameterization and modeling of voice source estimates obtained via inverse filtering of recorded speech are some of the most challenging areas of speech processing, owing to the fact that humans produce a wide range of voice source realizations and that the voice source estimates commonly contain artifacts due to the non-linear time-varying source-filter coupling. Currently, the most widely adopted representation of the voice source signal is the Liljencrants-Fant (LF) model, which was developed in late 1985. Due to its overly simplistic interpretation of voice source dynamics, the LF model cannot represent the fine temporal structure of glottal flow derivative realizations, nor can it carry sufficient spectral richness to facilitate truly natural-sounding speech synthesis. In this thesis we have introduced Characteristic Glottal Pulse Waveform Parameterization and Modeling (CGPWPM), which constitutes an entirely novel framework for voice source analysis, parameterization and reconstruction. In a comparative evaluation of CGPWPM and the LF model we have demonstrated that the proposed method is able to preserve higher levels of speaker-dependent information from the voice source estimates and realize more natural-sounding speech synthesis. In general, we have shown that CGPWPM-based speech synthesis rates highly on the scale of absolute perceptual acceptability and that speech signals are faithfully reconstructed on a consistent basis across speakers and genders. We have applied CGPWPM to voice quality profiling and to a text-independent voice quality conversion method. The proposed voice conversion method is able to achieve the desired perceptual effects, and the modified speech remained as natural-sounding and intelligible as natural speech.
In this thesis, we have also developed an optimal wavelet thresholding strategy for voice source signals which is able to suppress aspiration noise and still retain both the slow and the rapid variations in the voice source estimate.

Brunel University Research Archive

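Ihmisen äänentuoton analysointi käänteissuodatuksen, suurnopeuskuvauksen ja elektroglottografian avulla [Analysis of human voice production using inverse filtering, high-speed imaging, and electroglottography]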

Author: Pulakka Hannu
Publication venue: Teknillinen korkeakoulu
Publication date: 01/01/2005

Human voice production was studied using three methods: inverse filtering, digital high-speed imaging of the vocal folds, and electroglottography. The primary goal was to evaluate an inverse filtering method by comparing inverse filtered glottal flow estimates with information obtained by the other methods. More detailed examination of the human voice source behavior was also included in the work. Material from two experiments was analyzed in this study. The data of the first experiment consisted of simultaneous recordings of acoustic speech signal, electroglottogram, and high-speed imaging acquired during sustained vowel phonations. Inverse filtered glottal flow estimates were compared with glottal area waveforms derived from the image material by calculating pulse shape parameters from the signals. The material of the second experiment included recordings of acoustic speech signal and electroglottogram during phonations of sustained vowels. This material was utilized for the analysis of the opening phase and the closing phase of vocal fold vibration. The evaluated inverse filtering method was found to produce mostly reasonable estimates of glottal flow. However, the parameters of the system have to be set appropriately, which requires experience with inverse filtering and speech production. The flow estimates often showed a two-stage opening phase with two instants of rapid increase in the flow derivative. The instant of glottal opening detected in the electroglottogram was often found to coincide with an increase in the flow derivative. The instant of minimum flow derivative was found to occur mostly during the last quarter of the closing phase and it was shown to precede the closing peak of the differentiated electroglottogram.

Aaltodoc Publication Archive

A novel framework for high-quality voice source analysis and synthesis

Author: Turajlic Emir; Vaseghi S
Publication date: 01/01/2006

OpenGrey Repository

Vocal qualities in female singing.

Author: Evans Michelle
Publication venue: University of York
Publication date: 01/01/1995

White Rose E-theses Online

Automated measures of dysphonias and the phonatory effects of asymmetries in the posterior larynx

Author: Vieira Maurilio Nunes
Publication venue: The University of Edinburgh
Publication date: 01/01/1997

Edinburgh Research Archive

Exploring the contribution of voice quality to the perception of gender in Scottish English

Author: Pearce Jo
Publication date: 01/01/2020

This study investigates how voice quality, here phonation, affects listener perception of speaker gender, and how voice quality interacts with pitch, a major cue to speaker gender, when cueing gender perceptions. Gender differences in voice quality have been identified in both Scottish (Beck and Schaeffler 2015; Stuart-Smith 1999) and American English (Abdelli-Beruh et al. 2014; D. Klatt and L. Klatt 1990; Podesva 2013; Syrdal 1996; Wolk et al. 2012; Yuasa 2010). There is evidence from previous research that suggests gender differences in voice quality may also influence listener perception of speaker gender, with breathy voice being perceived as feminine or female-characteristic by listeners (Addington 1968; Andrews and Schmidt 1997; Bishop and Keating 2012; Holmberg et al. 2010; Porter 2012; Skuk and Schweinberger 2014; Van Borsel et al. 2009) and creaky voice being perceived as masculine-characteristic (Greer 2015; Lee 2016). However, some studies have found that voice quality has little effect (Booz and Ferguson 2016; King et al. 2012; Owen and Hancock 2010). The present study seeks to investigate the contribution of voice quality, taking into account the various methods of producing voice quality differences in stimuli, cultural differences in gendered meanings of voice quality, and different methods of quantifying ‘perceived gender’, which may contribute to the conflicting results of previous studies. To investigate the contribution of voice quality to perceptions of speaker gender, a perception experiment was carried out in which 32 Scottish listeners and 40 North American listeners heard stimuli with different voice qualities (modal, breathy, creaky) and at different pitch levels (120 Hz, 165 Hz, 210 Hz), and were asked to make judgements about the gender of the speaker. Differences in voice quality were produced by a speaker with the ability to create voice quality distinctions, as well as created through copy synthesis from the speaker’s voice.
Listeners were asked to indicate whether they thought the voice belonged to a man or a woman and rate how masculine and feminine the voice sounded. Relative to modal voice, I predicted that listeners would be more likely to categorise breathy voices as women, and would rate them as more feminine and less masculine, and that listeners would be less likely to categorise creaky voices as women, and would rate them as more masculine and less feminine. I also predicted that there might be differences in how Scottish listeners and North American listeners perceived voice quality, given that the gender differences in voice quality in these two varieties of English have been found to differ in previous research. Consistent with my predictions, I found that relative to modal voice, listeners were more likely to categorise breathy voice stimuli as women, and rated breathy voice stimuli as more feminine and less masculine. However, in contrast with my predictions, I found that relative to modal voice, listeners were more likely to categorise creaky voice stimuli as women, and rated them as less masculine, but not more feminine. Furthermore, contrary to predictions, I did not identify differences between Scottish and North American listeners in terms of voice quality perception. Differences were also found in how breathy and creaky voice influence gender perception at different pitch levels. Overall, these results show that voice quality has an important influence on listener perception of speaker gender, and that the gendered meanings of creaky voice are changing and have disassociated from its low pitch. Future research should consider whether this evaluation among Scottish listeners may reflect a wider change in the gender differences in production.

Glasgow Theses Service

Models and Analysis of Vocal Emissions for Biomedical Applications

Publication venue: Firenze University Press
Publication date: 31/05/2022

The Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) workshop came into being in 1999 from the strongly felt need to share know-how, objectives and results between areas that until then seemed quite distinct, such as bioengineering, medicine and singing. MAVEBA deals with all aspects of the study of the human voice, with applications ranging from the neonate to the adult and elderly. Over the years the initial topics have grown and spread to other areas of research such as occupational voice disorders, neurology, rehabilitation, and image and video analysis. MAVEBA takes place every two years in Firenze, Italy.

Directory of Open Access Books (DOAB)