35 research outputs found

    The role of F0 and phonation cues in Cantonese low tone perception

    An Exploratory Study on Perceptual Spaces of the Singing Voice

    Sixty participants provided dissimilarity ratings between various singing techniques. Multidimensional scaling, class averaging and clustering techniques were used to analyse timbral spaces and how they change between different singers, genders and registers. Clustering analysis showed that ground-truth similarity and silhouette scores were not significantly different between gender or register conditions, while similarity scores were positively correlated with participants’ instrumental abilities and task comprehension. Participant feedback showed how a revised study design might mitigate noise in our data, leading to more detailed statistical results. Timbre maps and class distance analysis showed us which singing techniques remained similar to one another across gender and register conditions. This research provides insight into how the timbre space of singing changes under different conditions, highlights the subjectivity of perception between participants, and provides generalised timbre maps for regularisation in machine learning

    Acoustic and videoendoscopic techniques to improve voice assessment via relative fundamental frequency

    Quantitative measures of laryngeal muscle tension are needed to improve assessment and track clinical progress. Although relative fundamental frequency (RFF) shows promise as an acoustic estimate of laryngeal muscle tension, it is not yet transferable to the clinic. The purpose of this work was to refine algorithmic estimation of RFF, as well as to enhance the knowledge surrounding the physiological underpinnings of RFF. The first study used a large database of voice samples collected from 227 speakers with voice disorders and 256 typical speakers to evaluate the effects of fundamental frequency estimation techniques and voice sample characteristics on algorithmic RFF estimation. By refining fundamental frequency estimation using the Auditory Sawtooth Waveform Inspired Pitch Estimator—Prime (Auditory-SWIPE′) algorithm and accounting for sample characteristics via the acoustic measure, pitch strength, algorithmic errors related to the accuracy and precision of RFF were reduced by 88.4% and 17.3%, respectively. The second study sought to characterize the physiological factors influencing acoustic outputs of RFF estimation. A group of 53 speakers with voice disorders and 69 typical speakers each produced the utterance, /ifi/, while simultaneous recordings were collected using a microphone and flexible nasendoscope. Acoustic features calculated via the microphone signal were examined in reference to the physiological initiation and termination of vocal fold vibration. The features that corresponded with these transitions were then implemented into the RFF algorithm, leading to significant improvements in the precision of the RFF algorithm to reflect the underlying physiological mechanisms for voicing offsets (p < .001, V = .60) and onsets (p < .001, V = .54) when compared to manual RFF estimation. 
The third study further elucidated the physiological underpinnings of RFF by examining the contribution of vocal fold abduction to RFF during intervocalic voicing offsets. Vocal fold abductory patterns were compared to RFF values in a subset of speakers from the second study, comprising young adults, older adults, and older adults with Parkinson’s disease. Abductory patterns were not significantly different among the three groups; however, vocal fold abduction was observed to play a significant role in measures of RFF at voicing offset. By improving algorithmic estimation and elucidating aspects of the underlying physiology affecting RFF, this work adds to the utility of RFF for use in conjunction with current clinical techniques to assess laryngeal muscle tension.
    2021-09-29
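The abstract above does not spell out how RFF values are computed. As a rough illustration only, the sketch below shows the common semitone normalization: each vocal cycle's F0 is expressed relative to a reference cycle. The function names, the sample F0 values, and the choice of reference cycle are assumptions made for this example; they are not taken from the thesis, and the Auditory-SWIPE′ and pitch-strength steps are not modelled.

```python
import math


def semitones(f0_hz, ref_hz):
    """Express f0_hz in semitones relative to ref_hz (12 semitones per octave)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def relative_f0(cycle_f0_hz, ref_index):
    """Normalize per-cycle F0 values to the cycle at ref_index.

    Assumption for this sketch: the caller picks the reference cycle,
    typically the one farthest from the voiceless consonant.
    """
    ref = cycle_f0_hz[ref_index]
    return [semitones(f, ref) for f in cycle_f0_hz]


# Hypothetical per-cycle F0 values (Hz) approaching a voicing offset.
offset_cycles = [200.0, 199.0, 197.5, 196.0]
rff = relative_f0(offset_cycles, ref_index=0)
print([round(v, 2) for v in rff])
```

The reference cycle always maps to 0 semitones, so values below zero indicate F0 falling relative to the steady-state portion of the vowel.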

    A Comprehensive Review of F0 and its Various Correlations

    This paper examines many acoustic characteristics of F0 and their relevance in linguistic analysis. It also highlights correlations between F0 measurements and vowel height, gender, accentedness, and phonation types. The last of these is the centerpiece of the paper. This correlation is needed for a more reliable account of pitch contrasts in accent and tone languages. Two approaches are used in establishing this correlation. The first relies on a subharmonic equation and the second makes use of critical band calculations. Both yield the same results. The data used to highlight these various correlations come from Peterson and Barney (1952), Hillenbrand et al. (1995), and a fresh set of F0 measurements obtained from 46 speakers from Central Minnesota (17 males and 29 females).

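    Stemmefysiologi og terminologi i CCM/rytmisk sang. En studie av forskning på stemmefysiologi knyttet til sangteknikker i CCM/rytmiske sjangere, og erfaringer med å undervise ungdommer i CCM/rytmisk sang. [Voice physiology and terminology in CCM/rhythmic singing: a study of research on the voice physiology associated with singing techniques in CCM/rhythmic genres, and of experiences of teaching adolescents CCM/rhythmic singing.]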

    Master's thesis in the didactics of culture and language subjects, Høgskolen i Innlandet, 2018. As a singing teacher, one may find that thorough and consistent descriptions of the laryngeal physiology underlying various CCM singing techniques are hard to come by. Books, blogs and magazines concerned with such issues seldom include exhaustive accounts of how the laryngeal structures and muscles are used on a detailed level, and the applied terminology regarding various types of voice production varies greatly. The two research questions of this study focus on the terminology and laryngeal physiology of healthy and stylistically correct CCM singing techniques, and on how to teach such CCM singing techniques to adolescents, respectively. The former question is explored through a thorough document analysis of peer-reviewed articles and books on the topic. The latter is investigated through semi-structured qualitative interviews with four singing teachers sharing their experiences of teaching CCM singing techniques to 15- to 20-year-olds. The findings include descriptions of how the laryngeal musculature and other structures engage in various types of voice production, with a particular emphasis on the issue of voice registers. Due to the inconsistent use of terminology in the field, specific guidelines regarding English terms are suggested, and two Norwegian terms are proposed (randmiks and fullmiks). Aspects specifically relevant to the teaching of 15- to 20-year-olds are presented in the interview findings, which highlight issues such as basic technique and vocal health. The findings in this thesis may be employed as a basis for further development of our understanding of CCM voice production.

    Vocal imitation for query by vocalisation

    PhD Thesis. The human voice presents a rich and powerful medium for expressing sonic ideas such as musical sounds. This capability extends beyond the sounds used in speech, evidenced for example in the art form of beatboxing, and recent studies highlighting the utility of vocal imitation for communicating sonic concepts. Meanwhile, the advance of digital audio has resulted in huge libraries of sounds at the disposal of music producers and sound designers. This presents a compelling search problem: with larger search spaces, the task of navigating sound libraries has become increasingly difficult. The versatility and expressive nature of the voice provides a seemingly ideal medium for querying sound libraries, raising the question of how well humans are able to vocally imitate musical sounds, and how we might use the voice as a tool for search. In this thesis we address these questions by investigating the ability of musicians to vocalise synthesised and percussive sounds, and evaluate the suitability of different audio features for predicting the perceptual similarity between vocal imitations and imitated sounds. In the first experiment, musicians were tasked with imitating synthesised sounds with one or two time-varying feature envelopes applied. The results show that participants were able to imitate pitch, loudness, and spectral centroid features accurately, and that imitation accuracy was generally preserved when the imitated stimuli combined two not necessarily congruent features. This demonstrates the viability of using the voice as a natural means of expressing time series of two features simultaneously. The second experiment consisted of two parts. In a vocal production task, musicians were asked to imitate drum sounds. Listeners were then asked to rate the similarity between the imitations and sounds from the same category (e.g. kick, snare etc.). The results show that drum sounds received the highest similarity ratings when rated against their imitations (as opposed to imitations of another sound), and overall more than half the imitated sounds were correctly identified with above chance accuracy from the imitations, although this varied considerably between drum categories. The findings from the vocal imitation experiments highlight the capacity of musicians to vocally imitate musical sounds, and some limitations of non-verbal vocal expression. Finally, we investigated the performance of different audio features as predictors of perceptual similarity between the imitations and imitated sounds from the second experiment. We show that features learned using convolutional auto-encoders outperform a number of popular heuristic features for this task, and that preservation of temporal information is more important than spectral resolution for differentiating between the vocal imitations and same-category drum sounds.

    Vocal imitation for query by vocalisation

    PhD. The human voice presents a rich and powerful medium for expressing sonic ideas such as musical sounds. This capability extends beyond the sounds used in speech, evidenced for example in the art form of beatboxing, and recent studies highlighting the utility of vocal imitation for communicating sonic concepts. Meanwhile, the advance of digital audio has resulted in huge libraries of sounds at the disposal of music producers and sound designers. This presents a compelling search problem: with larger search spaces, the task of navigating sound libraries has become increasingly difficult. The versatility and expressive nature of the voice provides a seemingly ideal medium for querying sound libraries, raising the question of how well humans are able to vocally imitate musical sounds, and how we might use the voice as a tool for search. In this thesis we address these questions by investigating the ability of musicians to vocalise synthesised and percussive sounds, and evaluate the suitability of different audio features for predicting the perceptual similarity between vocal imitations and imitated sounds. In the first experiment, musicians were tasked with imitating synthesised sounds with one or two time-varying feature envelopes applied. The results show that participants were able to imitate pitch, loudness, and spectral centroid features accurately, and that imitation accuracy was generally preserved when the imitated stimuli combined two not necessarily congruent features. This demonstrates the viability of using the voice as a natural means of expressing time series of two features simultaneously. The second experiment consisted of two parts. In a vocal production task, musicians were asked to imitate drum sounds. Listeners were then asked to rate the similarity between the imitations and sounds from the same category (e.g. kick, snare etc.). The results show that drum sounds received the highest similarity ratings when rated against their imitations (as opposed to imitations of another sound), and overall more than half the imitated sounds were correctly identified with above chance accuracy from the imitations, although this varied considerably between drum categories. The findings from the vocal imitation experiments highlight the capacity of musicians to vocally imitate musical sounds, and some limitations of non-verbal vocal expression. Finally, we investigated the performance of different audio features as predictors of perceptual similarity between the imitations and imitated sounds from the second experiment. We show that features learned using convolutional auto-encoders outperform a number of popular heuristic features for this task, and that preservation of temporal information is more important than spectral resolution for differentiating between the vocal imitations and same-category drum sounds.
    Engineering and Physical Sciences Research Council (EP/G03723X/1)

    Exposing the hidden vocal channel: Analysis of vocal expression

    This dissertation explored perception and modeling of human vocal expression, and began by asking what people heard in expressive speech. To address this fundamental question, clips from Shakespearian soliloquy and from the Library of Congress Veterans Oral History Collection were presented to Mechanical Turk workers (10 per clip); and the workers were asked to provide 1-3 keywords describing the vocal expression in the voice. The resulting keywords described prosody, voice quality, nonverbal quality, and emotion in the voice, along with the conversational style, and personal qualities attributed to the speaker. More than half of the keywords described emotion, and were wide-ranging and nuanced. In contrast, keywords describing prosody and voice quality reduced to a short list of frequently-repeating vocal elements. Given this description of perceived vocal expression, a 3-step process was used to model vocal qualities which listeners most frequently perceived. This process included 1) an interactive analysis across each condition to discover its distinguishing characteristics, 2) feature selection and evaluation via unequal variance sensitivity measurements and examination of means and 2-sigma variances across conditions, and 3) iterative, incremental classifier training and validation. The resulting models performed at 2-3.5 times chance. More importantly, the analysis revealed a continuum relationship across whispering, breathiness, modal speech, and resonance, and revealed multiple spectral sub-types of breathiness, modal speech, resonance, and creaky voice. Finally, latent semantic analysis (LSA) applied to the crowdsourced keyword descriptors enabled organic discovery of expressive dimensions present in each corpus, and revealed relationships among perceived voice qualities and emotions within each dimension and across the corpora. 
The resulting dimensional classifiers performed at up to 3 times chance, and a second study presented a dimensional analysis of laughter. This research produced a new way of exploring emotion in the voice, and of examining relationships among emotion, prosody, voice quality, conversation quality, personal quality, and other expressive vocal elements. For future work, this perception-grounded fusion of crowdsourcing and LSA can be applied to anything humans can describe, in any research domain.
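The "times chance" figures quoted above compare classifier accuracy with a guessing baseline. The sketch below shows one simple reading of that ratio, assuming chance means uniform guessing over equally likely classes. The dissertation may define chance differently (for example, from class priors), and the accuracy and class count used here are hypothetical, not results from the study.

```python
def times_chance(accuracy, n_classes):
    """Ratio of observed accuracy to uniform-guessing accuracy (1 / n_classes)."""
    if n_classes < 1:
        raise ValueError("n_classes must be at least 1")
    chance = 1.0 / n_classes
    return accuracy / chance


# Hypothetical numbers: a 4-class model with 50% accuracy.
print(times_chance(0.5, 4))
```

Under this reading, a ratio of 1.0 means the model does no better than guessing, and the ratio can never exceed the number of classes.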

    Sociofonetická studie substituční glotalizace u rodilých mluvčích angličtiny [A sociophonetic study of glottal replacement in native speakers of English]

    The glottal stop, previously labelled as a heavily stigmatized feature of British English pronunciation, has become widely spread across all social classes and the majority of British dialects. Young females are believed to be instrumental in leading the spread and causing the social re-evaluation of the feature. The aim of this bachelor's thesis is to analyze the occurrence of T-glottaling in the speech of British English speakers in relation to sociolinguistic factors, primarily age, gender and speaking style. The theoretical part provides a description of the linguistic and social aspects of T-glottaling. Particular attention is paid to the role of social factors in the process of language change. In addition, a brief overview of previous research is presented. The material for the empirical part of this study consists of 32 recordings of British English speakers. The analysis of the results reveals that gender, age and speaking style play a significant role in the frequency of occurrence of the glottal stop. Young females are shown to be the leaders of the spread of T-glottaling, which leads to the assumption that the language change is still in progress.
    Department of the English Language and ELT Methodology (Ústav anglického jazyka a didaktiky), Faculty of Arts (Filozofická fakulta)