
    Age Estimation in Foreign-accented Speech by Native and Non-native Speakers

    Current research shows that listeners are generally accurate at estimating speakers’ age from their speech. This study investigates the effect of speaker first language and the role played by such speaker characteristics as fundamental frequency and speech rate. In this study English and Japanese first language speakers listened to English- and Japanese-accented English speech and estimated the speaker’s age. We find the highest correlation between real and estimated speaker age for English listeners listening to English speakers, followed by Japanese listeners listening to both English and Japanese speakers, with English listeners listening to Japanese speakers coming last. We find that Japanese speakers are estimated to be younger than the English speakers by English listeners, and that both groups of listeners estimate male speakers and speakers with a lower mean fundamental frequency to be older. These results suggest that listeners rely on sociolinguistic information in their speaker age estimations and language familiarity plays a role in their success

    Oral Argument in the Time of COVID: The Chief Justice Plays Calvinball

    In this Article, we empirically assess the Supreme Court’s experiment in hearing telephonic oral arguments. We compare the telephonic hearings to those heard in person by the current Court and examine whether the Justices followed norms of fairness and equality. We show that the telephonic forum changed the dynamics of oral argument in a way that gave the Chief Justice new power, and that Chief Justice Roberts, knowingly or unknowingly, used that new power to benefit his ideological allies. We also show that the Chief interrupted the female Justices disproportionately more than the male Justices and gave the male Justices more substantive opportunities to have their questions answered. This analysis transcends the significance of individual cases. The fact that the Court experimented with telephonic oral argument, the way it did so, and how the practice could be improved are all issues of profound national importance. The new format had the potential to influence the outcome of cases that have broad national significance, to shift norms of equality and transparency in the Court, and more generally to affect judicial legitimacy. If the Court favors certain parties or certain ideological camps by its choice of forum in a time of crisis, then that will undermine not only the Court’s legitimacy but also raise doubts as to whether any of our national institutions have the capacity to adapt to crises

    THE ROLE OF GLOTTAL SOURCE PARAMETERS FOR HIGH-QUALITY TRANSFORMATION OF PERCEPTUAL AGE

    The intuitive control of voice transformation (e.g., age/sex, emotions) is useful to extend the expressive repertoire of a voice. This paper explores the role of glottal source parameters for the control of voice transformation. First, the SVLN speech synthesizer (Separation of the Vocal-tract with the Liljencrants-Fant model plus Noise) is used to represent the glottal source parameters (and thus, voice quality) during speech analysis and synthesis. Then, a simple statistical method is presented to control speech parameters during voice transformation: a GMM is used to model the speech parameters of a voice, and regressions are then used to adapt the GMM's statistics (mean and variance) to a control parameter (e.g., age/sex, emotions). A subjective experiment conducted on the control of perceptual age proves the importance of the glottal source parameters for the control of voice transformation, and shows the efficiency of the statistical model in controlling voice parameters while preserving the high quality of the voice transformation.

    Speaker Age Estimation by Musicians and Non-musicians

    Speaker age estimation is one of the most commonly researched fields in the domain of social perception based on voice. Previous findings confirm a strong correlation between the estimated and calendar age of speakers; however, younger adult speakers are usually perceived to be older, while older speakers are thought to be younger than their actual age. Effects of listener factors, such as age and gender, have also been researched. The purpose of the present study is to examine whether a more sophisticated auditory mechanism, which can be attributed to music training, results in more accurate speaker age estimation. The present research found correlation coefficients between calendar ages and mean estimated ages comparable to those reported in the literature, and musicianship and listener gender were not shown to have a significant effect on age estimations. Linear mixed models, fitted to three age groups, revealed some marginal differences between musicians and non-musicians, implying that musicians’ age estimations are more accurate in some cases.

    User Perceptions and Stereotypic Responses to Gender and Age of Voice Assistants

    Technologies such as voiced automation can aid older adults aging in place by assisting with basic home and health tasks in daily routines. However, currently available voice assistants have a common design: they are overwhelmingly represented as young and female. Prior work has shown that humans apply stereotypes to human-computer interactions similarly to human-human interactions. When these stereotypes are activated, users may lose trust or confidence in the device or stop using it altogether. The purpose of this study was to investigate whether users can detect age and gender cues of voiced automation and to understand the extent to which gender, age, and reliability elicit stereotypic responses, which were assessed using history-based trust. A series of health-related voice automation scenarios presented users with voice assistants varying in gender, age, and reliability. Results showed differences in age and gender perceptions across participant age groups but no differences for overall trust. A three-way interaction showed that when voiced automation reliability was low, participants rated the young female voice assistant as significantly more trustworthy than all other voice assistants. This work contributes to our understanding of how anthropomorphic characteristics like age and gender in emerging technologies can elicit varied trust responses from younger and older adults.

    Age prediction by voice using deep learning

    One of the main topics in artificial intelligence is speech characterization. However, it has received very little study where the Catalan language is concerned. In this project, we try to perform age classification by decade, first on the Catalan CommonVoice dataset and then adding the Spanish and English datasets to have more data. To this end, deep learning techniques are used to implement the classifier. The most common backbones, such as ResNet and VGG, are used. Furthermore, we use an attention encoder to encode the Mel-spectrogram features. In contrast to statistical pooling methods like average pooling, attention pooling layers and various attention mechanisms are used in all backbones to perform pooling and reduce the dimensionality of the feature vector derived from the front-end architecture. In this study, we compare two different models: the first with an AM-Softmax in the final layer and the other with an AM-Softmax combined with ordinal regression.

    EVALUATION OF INTELLIGIBILITY AND SPEAKER SIMILARITY OF VOICE TRANSFORMATION

    Voice transformation refers to a class of techniques that modify voice characteristics either to conceal the speaker's identity or to mimic the voice characteristics of another speaker. Its applications include automatic dialogue replacement and voice generation for people with voice disorders. The diversity in applications makes evaluation of voice transformation a challenging task. The objective of this research is to propose a framework to evaluate intentional voice transformation techniques. Our proposed framework is based on two fundamental qualities: intelligibility and speaker similarity. Intelligibility refers to the clarity of the speech content after voice transformation, and speaker similarity measures how well the modified output disguises the source speaker. We measure intelligibility with word error rates and speaker similarity with the likelihood of identifying the correct speaker. The novelty of our approach is that we consider whether similarly transformed training data are available to the recognizer. We have demonstrated that this factor plays a significant role in intelligibility and speaker similarity for both human testers and automated recognizers. We thoroughly test two classes of voice transformation techniques, pitch distortion and voice conversion, using our proposed framework. We apply our results to patients with voice hypertension using video self-modeling, and preliminary results are presented.
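
The abstract above measures intelligibility with word error rates. As a rough illustration only, the sketch below computes the standard word error rate (word-level edit distance divided by the number of reference words). The function name `word_error_rate` and the whitespace tokenization are assumptions made for this example; the abstract does not say how the authors computed their scores.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; tokenization is a plain
    whitespace split, which is an assumption of this sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For instance, comparing the reference "the cat sat on the mat" with the recognized text "the cat sit on mat" involves one substitution and one deletion against six reference words, so the rate is 2/6.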

    Subjective measures of household resilience to climate variability and change: insights from a nationally representative survey of Tanzania

    Promoting household resilience to climate extremes has emerged as a key development priority. Yet tracking and evaluating resilience at this level remains a critical challenge. Most quantitative approaches rely on objective indicators and assessment frameworks, but these are not fully satisfactory. Much of the difficulty arises from a combination of conceptual ambiguities, challenges in selecting appropriate indicators, and difficulties in measuring the many intangible aspects that contribute to household resilience. More recently, subjective measures of resilience have been advocated as a way to overcome some of the limitations of traditional objective characterizations. However, few large-scale studies of quantitative subjective approaches to resilience measurement have been conducted. In this study, we address this gap by exploring perceived levels of household resilience to climate extremes in Tanzania and the utility of standardized subjective methods for its assessment. A nationally representative cross-sectional survey involving 1294 individuals was carried out by mobile phone in June 2015 among randomly selected adult respondents aged 18 and above. The factors most associated with resilience-related capacities are having had advance knowledge of a previous flood and, to a lesser extent, believing flooding to be a serious community problem. Somewhat surprisingly, though a small number of weak relationships are apparent, most socio-demographic variables do not exhibit statistically significant differences with regard to perceived resilience-related capacities. These findings may challenge traditional assumptions about what factors characterize household resilience, offering a motivation for studying both subjective and objective perspectives, and understanding better their relationship to one another. If further validated, subjective measures may offer potential as both a complement and alternative to traditional objective methods of resilience measurement, each with their own merits and limitations.

    Oesophageal speech: enrichment and evaluations

    167 p. After a laryngectomy (i.e. removal of the larynx), a patient can no longer speak in a healthy laryngeal voice. Therefore, they need to adopt alternative methods of speaking such as oesophageal speech. In this method, speech is produced using swallowed air and the vibrations of the pharyngo-oesophageal segment, which introduces several undesired artefacts and an abnormal fundamental frequency. This makes oesophageal speech more difficult to process than healthy speech, in terms of both auditory processing and signal processing. The aim of this thesis is to find solutions to make oesophageal speech signals easier to process, and to evaluate these solutions by exploring a wide range of evaluation metrics. First, some preliminary studies were performed to compare oesophageal speech and healthy speech. These revealed significantly lower intelligibility and higher listening effort for oesophageal speech compared to healthy speech. Intelligibility scores were comparable for familiar and non-familiar listeners of oesophageal speech. However, listeners familiar with oesophageal speech reported less effort compared to non-familiar listeners. In another experiment, oesophageal speech was reported to require more listening effort than healthy speech even though its intelligibility was comparable to that of healthy speech. On investigating neural correlates of listening effort (i.e. alpha power) using electroencephalography, a higher alpha power was observed for oesophageal speech compared to healthy speech, indicating higher listening effort. Additionally, participants with poorer cognitive abilities (i.e. working memory capacity) showed higher alpha power. Next, using several algorithms (preexisting as well as novel approaches), oesophageal speech was transformed with the aim of making it more intelligible and less effortful.
The novel approach consisted of a deep neural network based voice conversion system where the source was oesophageal speech and the target was synthetic speech matched in duration with the source oesophageal speech. This helped in eliminating the source-target alignment process, which is particularly prone to errors for disordered speech such as oesophageal speech. Both speaker dependent and speaker independent versions of this system were implemented. The outputs of the speaker dependent system had better short term objective intelligibility scores, automatic speech recognition performance and listener preference scores compared to unprocessed oesophageal speech. The speaker independent system showed improvement in short term objective intelligibility scores but not in automatic speech recognition performance. Some other signal transformations were also performed to enhance oesophageal speech. These included removal of undesired artefacts and methods to improve fundamental frequency. Out of these methods, only removal of undesired silences had success to some degree (a 1.44 percentage point improvement in automatic speech recognition performance), and even then only for low intelligibility oesophageal speech. Lastly, the outputs of these transformations were evaluated and compared with previous systems using an ensemble of evaluation metrics such as short term objective intelligibility, automatic speech recognition, subjective listening tests and neural measures obtained using electroencephalography. Results reveal that the proposed neural network based system outperformed previous systems in improving the objective intelligibility and automatic speech recognition performance of oesophageal speech. In the case of subjective evaluations, the results were mixed: some positive improvement in preference scores and no improvement in speech intelligibility and listening effort scores.
Overall, the results demonstrate several possibilities and new paths to enrich oesophageal speech using modern machine learning algorithms. The outcomes would be beneficial to the disordered speech community.

    Why do people who stutter attend stuttering support groups?

    Dissertation (MA (Speech-Language Pathology))--University of Pretoria, 2022. Background: Stuttering support groups (SSGs) are a known, invaluable resource for people who stutter (PWS). General support groups have been well researched; however, research specifically into SSGs is only emerging. Further insight is needed to guide speech-language therapists’ (SLTs) facilitation of SSGs. Objective: This research is aimed at determining PWS’ perspectives regarding why they attend SSGs in Gauteng, South Africa. Method: Thirteen PWS, aged 20 to 58 years, who attend SSGs were selected purposively. Their perspectives on SSGs were obtained during semi-structured telephonic interviews and analysed thematically, which yielded clinical implications. Results and Discussion: Four themes were identified: “altered perceptions”, “increased sense of community”, “support group reciprocity” and “support group environment, participants and topics”. SSGs helped PWS accept their stutter and gain confidence. Clinical implications identified included SLTs encouraging: (1) positive perceptions through education, self-empowerment, sharing success stories, and ways to elicit positive listener reactions, (2) connections between meetings to increase the sense of community, (3) reciprocity in meetings, (4) sharing personal stories to promote learning and self-management, and (5) support, praise and education to empower and encourage PWS. SLTs can encourage equal contributions from willing participants without pressuring others. Disfluency and emotional support should be equally discussed in SSGs. Conclusion: These perspectives of PWS were used to provide recommendations to SLTs on ways to better meet the needs of PWS who attend SSGs. Recommendations included focusing discussions on fluency and emotions and sharing personal stories.
Insights from PWS also helped better inform SLTs of their role within SSGs, including guiding conversations and facilitating conversations that foster deeper understanding.