Objective Gender and Age Recognition from Speech Sentences
In this work, an automatic gender and age recognizer from speech is investigated. Features relevant to gender recognition are selected from the first four formant frequencies and twelve MFCCs and fed to an SVM classifier, while features relevant to age are used with a k-NN classifier for the age recognition model, with MATLAB as the simulation tool. A special selection of robust features, based on the frequency range each feature represents, is used to improve the results of the gender and age classifiers. The gender and age classification algorithms are evaluated on 114 (clean and noisy) speech samples uttered in the Kurdish language. The two-class gender recognition model (adult males vs. adult females) reached 96% recognition accuracy, while the three-category model (adult males, adult females, and children) achieved 94%. For the age recognition model, speakers are categorized into seven age groups. After selecting the features relevant to age, the model achieved 75.3% accuracy. For further improvement, a de-noising technique is applied to the noisy speech signals, followed by selection of the features affected by the de-noising process, resulting in 81.44% recognition accuracy.
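The pipeline described above (spectral features feeding an SVM) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature values are synthetic stand-ins for the 4 formant + 12 MFCC measurements, and the class separation is artificial.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Synthetic stand-in for 16 features per sample:
# 4 formant frequencies + 12 MFCCs (the paper's feature set).
n_per_class = 100
male = rng.normal(loc=0.0, scale=1.0, size=(n_per_class, 16))
female = rng.normal(loc=3.0, scale=1.0, size=(n_per_class, 16))

X = np.vstack([male, female])
y = np.array([0] * n_per_class + [1] * n_per_class)  # 0 = male, 1 = female

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Two-class gender model, as in the paper's adult male/female setup.
clf = SVC(kernel="linear").fit(X_train, y_train)
acc = accuracy_score(y_test, clf.predict(X_test))
print(f"gender accuracy: {acc:.2f}")
```

On real speech the features would come from a front end such as an MFCC extractor rather than a random generator; the structure of the classifier stage is unchanged.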
Expression of gender in the human voice: investigating the 'gender code'
We can easily and reliably identify the gender of an unfamiliar interlocutor over
the telephone. This is because our voice is 'sexually dimorphic': men typically speak
with a lower fundamental frequency (F0 - lower pitch) and lower vocal tract resonances
(ΔF - 'deeper' timbre) than women. While the biological bases of these differences are
well understood, and mostly down to size differences between men and women, very
little is known about the extent to which we can play with these differences to
accentuate or de-emphasise our perceived gender, masculinity and femininity in a range
of social roles and contexts.
The general aim of this thesis is to investigate the behavioural basis of gender
expression in the human voice in both children and adults. More specifically, I
hypothesise that, on top of the biologically determined sexual dimorphism, humans use
a 'gender code' consisting of vocal gestures (global F0 and ΔF adjustments) aimed at
altering the gender attributes conveyed by their voice. In order to test this hypothesis, I
first explore how acoustic variation of sexually dimorphic acoustic cues (F0 and ΔF)
relates to physiological differences in pre-pubertal speakers (vocal tract length) and
adult speakers (body height and salivary testosterone levels), and show that voice
gender variation cannot be solely explained by static, biologically determined
differences in vocal apparatus and body size of speakers. Subsequently, I show that both
children and adult speakers can spontaneously modify their voice gender by lowering
(raising) F0 and ΔF to masculinise (feminise) their voice, a key ability for the
hypothesised control of voice gender. Finally, I investigate the interplay between voice
gender expression and social context in relation to cultural stereotypes. I report that
listeners spontaneously integrate stereotypical information in the auditory and visual
domain to make stereotypical judgments about children's gender and that adult actors
manipulate their gender expression in line with stereotypical gendered notions of
homosexuality. Overall, this corpus of data supports the existence of a 'gender code' in
human nonverbal vocal communication. This 'gender code' provides not only a
methodological framework with which to empirically investigate variation in voice
gender and its role in expressing gender identity, but also a unifying theoretical
structure to understand the origins of such variation from both evolutionary and social
perspectives.
Automatic classification possibilities of the voices of children with dysphonia
Dysphonia is a common complaint: almost every fourth child produces a pathological voice. A mobile-based screening system that pre-school workers could use to recognize children with dysphonic voices, so that professional help can be sought as soon as possible, would be desirable. The goal of this research is to identify acoustic parameters that can distinguish the healthy voices of children from the voices of children with dysphonia. In addition, the possibility of automatic classification is examined. Two-sample t-tests were used to test the statistical significance of differences in the mean values of the acoustic parameters between healthy and dysphonic voices. A two-class classification between the two groups was performed with a support vector machine (SVM) classifier using leave-one-out cross-validation. Formant frequencies, mel-frequency cepstral coefficients (MFCCs), harmonics-to-noise ratio (HNR), soft phonation index (SPI) and frequency-band energy ratios based on intrinsic mode functions, measured on different variations of phonemes, showed statistically significant differences between the groups. A high classification accuracy of 93% was achieved by SVM with linear and RBF kernels using only 8 acoustic parameters. Additional data are needed to build a more general model, but this research can serve as a reference point for classifying healthy children and children with dysphonia from continuous speech.
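The evaluation scheme described above (two-class SVM with leave-one-out cross-validation over both linear and RBF kernels) can be sketched as follows; the 8-dimensional feature vectors are synthetic placeholders, not the study's acoustic measurements:

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Synthetic stand-in: 8 acoustic parameters per child (the study's
# feature count), 30 healthy and 30 dysphonic voices.
healthy = rng.normal(loc=0.0, scale=1.0, size=(30, 8))
dysphonic = rng.normal(loc=2.5, scale=1.0, size=(30, 8))
X = np.vstack([healthy, dysphonic])
y = np.array([0] * 30 + [1] * 30)

# Leave-one-out: each voice is held out once and predicted by a model
# trained on all the others -- the natural choice for small clinical samples.
for kernel in ("linear", "rbf"):
    scores = cross_val_score(SVC(kernel=kernel), X, y, cv=LeaveOneOut())
    print(f"{kernel} LOOCV accuracy: {scores.mean():.2f}")
```

Leave-one-out is attractive here precisely because the dataset is small: every recording contributes to both training and evaluation without a held-out split.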
Auditory-motor adaptation is reduced in adults who stutter but not in children who stutter
Previous studies have shown that adults who stutter produce smaller corrective motor responses to compensate for unexpected auditory perturbations in comparison to adults who do not stutter, suggesting that stuttering may be associated with deficits in the integration of auditory feedback for online speech monitoring. In this study, we examined whether stuttering is also associated with deficiencies in integrating and using discrepancies between expected and received auditory feedback to adaptively update motor programs for accurate speech production. Using a sensorimotor adaptation paradigm, we measured adaptive speech responses to auditory formant frequency perturbations in adults and children who stutter and their matched nonstuttering controls. We found that the magnitude of the speech adaptive response for children who stutter did not differ from that of fluent children. However, the adaptation magnitude of adults who stutter in response to formant perturbation was significantly smaller than the adaptation magnitude of adults who do not stutter. Together these results indicate that stuttering is associated with deficits in integrating discrepancies between predicted and received auditory feedback to calibrate the speech production system in adults but not children. This auditory-motor integration deficit thus appears to be a compensatory effect that develops over years of stuttering.
Developmental refinement of cortical systems for speech and voice processing
Development typically leads to optimized and adaptive neural mechanisms for the processing of voice and speech. In this fMRI study we investigated how this adaptive processing reaches its mature efficiency by examining the effects of task, age and phonological skills on cortical responses to voice and speech in children (8-9 years), adolescents (14-15 years) and adults. Participants listened to vowels (/a/, /i/, /u/) spoken by different speakers (boy, girl, man) and performed delayed-match-to-sample tasks on vowel and speaker identity. Across age groups, similar behavioral accuracy and comparable sound-evoked auditory cortical fMRI responses were observed. Analysis of task-related modulations indicated a developmental enhancement of responses in the (right) superior temporal cortex during the processing of speaker information. This effect was most evident through an analysis based on individually determined voice-sensitive regions. Analysis of age effects indicated that the recruitment of regions in the temporal-parietal cortex and posterior cingulate/cingulate gyrus decreased with development. Beyond age-related changes, the strength of speech-evoked activity in left posterior and right middle superior temporal regions significantly scaled with individual differences in phonological skills. Together, these findings suggest a prolonged development of the cortical functional network for speech and voice processing. This development includes a progressive refinement of the neural mechanisms for the selection and analysis of auditory information relevant to the ongoing behavioral task.
Peer audience effects on children's vocal masculinity and femininity
Existing evidence suggests that children from around the age of 8 years strategically alter their public image in accordance with known values and preferences of peers, through the self-descriptive information they convey. However, an important but neglected aspect of this 'self-presentation' is the medium through which such information is communicated: the voice itself. The present study explored peer audience effects on children's vocal productions. Fifty-six children (26 females, aged 8-10 years) were presented with vignettes where a fictional child, matched to the participant's age and sex, is trying to make friends with a group of same-sex peers with stereotypically masculine or feminine interests (rugby and ballet, respectively). Participants were asked to impersonate the child in that situation and, as the child, to read out loud masculine, feminine and gender-neutral self-descriptive statements to these hypothetical audiences. They also had to decide which of those self-descriptive statements would be most helpful for making friends. In line with previous research, boys and girls preferentially selected masculine or feminine self-descriptive statements depending on the audience interests. Crucially, acoustic analyses of fundamental frequency and formant frequency spacing revealed that children also spontaneously altered their vocal productions: they feminized their voices when speaking to members of the ballet club, while they masculinized their voices when speaking to members of the rugby club. Both sexes also feminized their voices when uttering feminine sentences, compared to when uttering masculine and gender-neutral sentences. Implications for the hitherto neglected role of acoustic qualities of children's vocal behaviour in peer interactions are discussed.
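The formant-spacing measure (ΔF) used in acoustic analyses like the one above is commonly estimated by regressing measured formant frequencies against the odd-multiple pattern of a uniform tube closed at one end, F_i ≈ (2i - 1)/2 · ΔF. A minimal sketch of that regression, assuming the formant values themselves have already been measured elsewhere (e.g. with a tool such as Praat):

```python
import numpy as np

def estimate_delta_f(formants_hz):
    """Least-squares estimate of formant spacing (delta F) from measured
    formants, under the uniform-tube model F_i = (2i - 1)/2 * delta_F."""
    F = np.asarray(formants_hz, dtype=float)
    x = (2 * np.arange(1, len(F) + 1) - 1) / 2  # 0.5, 1.5, 2.5, ...
    # Regression through the origin: delta_F = sum(F*x) / sum(x*x).
    return float(np.sum(F * x) / np.sum(x * x))

# Example: formants of an ideal uniform tube with 1000 Hz spacing.
print(estimate_delta_f([500, 1500, 2500, 3500]))  # -> 1000.0
```

A smaller ΔF implies longer apparent vocal tract length, which is why lowering ΔF masculinizes the perceived voice.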
The effect of telepractice on vocal interaction between provider, deaf and hard-of-hearing pediatric patients, and caregivers.
The purpose of this thesis is to examine how telepractice affects vocal interaction between a speech-language pathologist (SLP), deaf and hard-of-hearing children who received cochlear implants (n = 7), and caregivers as they engage in speech-language interventions conducted in person and via telepractice (tele). Frequency of vocalizations, vocal turns, pause duration, fundamental frequency (F0) mean and range, utterance duration, syllable rate per utterance duration, and mean length of utterance (MLU) were examined. The SLP vocalized more during in-person sessions than tele-sessions; the opposite was found for the mother. There were more SLP-child turns during in-person sessions than tele-sessions; the opposite was found for mother-child turns. Pauses were longer in SLP-child and mother-child turns during tele-sessions than in-person sessions. The SLP increased mean F0, and both the SLP and child expanded their F0 range, in tele-sessions. The mother had longer utterance durations and higher MLU during in-person sessions than tele-sessions. Results suggest that vocal interactions between provider, patient, and caregiver are affected by the intervention service modality.
Environment- and listener-oriented speaking style adaptations across the lifespan
This dissertation examines how age affects the ability to produce intelligibility-enhancing speaking style adaptations in response to environment-related difficulties (noise-adapted speech) and in response to listeners' perceptual difficulties (clear speech). Materials consisted of conversational and clear speech sentences produced in quiet and in response to noise by children (11-13 years), young adults (18-29 years), and older adults (60-84 years). Acoustic measures of global, segmental, and voice characteristics were obtained. Young adult listeners participated in word-recognition-in-noise and perceived age tasks. The study also examined relative talker intelligibility as well as the relationship between the acoustic measurements and intelligibility results. Several age-related differences in speaking style adaptation strategies were found. Children increased mean F0 and F1 more than adults in response to noise, and exhibited greater changes to voice quality when producing clear speech (increased HNR, decreased shimmer). Older adults lengthened pause duration more in clear speech compared to younger talkers. Word-recognition-in-noise results revealed no age-related differences in the intelligibility of conversational speech. Noise-adapted and clear speech modifications increased intelligibility for all talker groups. However, the acoustic changes implemented by children when producing noise-adapted and clear speech were less efficient in enhancing intelligibility compared to the young adult talkers. Children were also less intelligible than older adults for speech produced in quiet. Results confirmed that the talkers formed 3 perceptually distinct age groups. Correlation analyses revealed that relative talker intelligibility was consistent for conversational and clear speech in quiet. However, relative talker intelligibility was found to be more variable with the inclusion of additional speaking style adaptations.
1-3 kHz energy, speaking rate, and vowel and pause durations all emerged as significant acoustic-phonetic predictors of intelligibility. This is the first study to investigate how clear speech and noise-adapted speech benefits interact with each other across multiple talker groups. The findings enhance our understanding of intelligibility variation across the lifespan and have implications for a number of applied realms, from audiologic rehabilitation to speech synthesis.
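The 1-3 kHz energy predictor mentioned above is typically computed as the proportion of spectral energy falling in that band. A minimal numpy sketch on a synthetic signal (the sampling rate and test tones are illustrative choices, not the dissertation's materials):

```python
import numpy as np

def band_energy_ratio(signal, sr, lo=1000.0, hi=3000.0):
    """Fraction of total spectral energy between lo and hi (Hz)."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    band = (freqs >= lo) & (freqs <= hi)
    return float(spectrum[band].sum() / spectrum.sum())

# Example: equal-amplitude tones at 500 Hz (outside the band) and
# 2000 Hz (inside it) put half the energy in the 1-3 kHz band.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 500 * t) + np.sin(2 * np.pi * 2000 * t)
print(round(band_energy_ratio(x, sr), 2))  # -> 0.5
```

Energy in this band is often taken as a correlate of vocal effort, which is one reason it tracks intelligibility in noise.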
Children's Perception of Conversational and Clear American-English Vowels in Noise
A handful of studies have examined children's perception of clear speech in the presence of background noise. Although accurate vowel perception is important for listeners' comprehension, no study has focused on whether vowels uttered in clear speech aid intelligibility for child listeners. In the present study, American-English (AE) speaking children repeated the AE vowels /ɛ, æ, ɑ, ʌ/ in the nonsense word /gəbVpə/ in phrases produced in conversational and clear speech by two female AE-speaking adults. The recordings of the adults' speech were presented at a signal-to-noise ratio (SNR) of -6 dB to 15 AE-speaking children (ages 5.0-8.5) to examine whether AE school-age children identify vowels in noise more accurately when utterances are produced in clear speech than in conversational speech. Effects of the particular vowel uttered and talker effects were also examined. Clear speech vowels were repeated significantly more accurately (87%) than conversational speech vowels (59%), suggesting that clear speech aids children's vowel identification. Results varied as a function of the talker and the particular vowel uttered. Child listeners repeated one talker's vowels more accurately than the other's, and front vowels more accurately than central and back vowels. The findings support the use of clear speech for enhancing adult-to-child communication in AE, particularly in noisy environments.
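Presenting stimuli at a fixed SNR, as in the -6 dB condition above, amounts to scaling the noise so that the speech-to-noise power ratio hits the target before mixing. A minimal sketch (the signals here are synthetic placeholders, not the study's stimuli):

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale noise so 10*log10(P_speech / P_noise) == snr_db, then mix."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    target_p_noise = p_speech / (10 ** (snr_db / 10))
    scaled_noise = noise * np.sqrt(target_p_noise / p_noise)
    return speech + scaled_noise, scaled_noise

rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 220 * np.arange(16000) / 16000)  # placeholder tone
noise = rng.normal(size=16000)                               # white noise

mixed, scaled = mix_at_snr(speech, noise, snr_db=-6.0)
snr = 10 * np.log10(np.mean(speech ** 2) / np.mean(scaled ** 2))
print(f"achieved SNR: {snr:.1f} dB")  # -> achieved SNR: -6.0 dB
```

At -6 dB the noise power is about four times the speech power, which is why conversational-speech intelligibility drops so sharply in such conditions.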