1,354 research outputs found
The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for Voice Research and Affective Computing
Work on voice sciences over recent decades has led to a proliferation of acoustic parameters that are used quite selectively and are not always extracted in a similar fashion. With many independent teams working in different research areas, shared standards become an essential safeguard to ensure compliance with state-of-the-art methods allowing appropriate comparison of results across studies and potential integration and combination of extraction and recognition systems. In this paper we propose a basic standard acoustic parameter set for various areas of automatic voice analysis, such as paralinguistic or clinical speech analysis. In contrast to a large brute-force parameter set, we present a minimalistic set of voice parameters here. These were selected based on a) their potential to index affective physiological changes in voice production, b) their proven value in former studies as well as their automatic extractability, and c) their theoretical significance. The set is intended to provide a common baseline for evaluation of future research and eliminate differences caused by varying parameter sets or even different implementations of the same parameters. Our implementation is publicly available with the openSMILE toolkit. Comparative evaluations of the proposed feature set and large baseline feature sets of INTERSPEECH challenges show a high performance of the proposed set in relation to its size
The use of spectral information in the development of novel techniques for speech-based cognitive load classification
The cognitive load of a user refers to the amount of mental demand imposed on the user when performing a particular task. Estimating the cognitive load (CL) level of the users is necessary to adjust the workload imposed on them accordingly in order to improve task performance. The current speech based CL classification systems are not adequate for commercial use due to their low performance particularly in noisy environments. This thesis proposes many techniques to improve the performance of the speech based cognitive load classification system in both clean and noisy conditions.
This thesis analyses and presents the effectiveness of speech features such as spectral centroid frequency (SCF) and spectral centroid amplitude (SCA) for CL classification. Sub-systems based on SCF and SCA features were developed and fused with the traditional Mel frequency cepstral coefficients (MFCC) based system, producing an 8.9% and 31.5% relative error rate reduction respectively when compared to the MFCC-based system alone. The Stroop test corpus was used in these experiments.
The investigation into cognitive load information in the form of spectral distribution in different subbands shows that the information distributed in the low frequency subband is significantly higher than the high frequency subband. Two different methods are proposed to utilize this finding. The first method, called the multi-band approach, uses a weighting scheme to emphasize the speech features in low frequency subbands. The cognitive load classification accuracy of this approach is shown to be higher than a system based on a non-weighting scheme. The second method is to design an effective filterbank based on the spectral distribution of cognitive load information using the Kullback-Leibler distance measure. It is shown that the designed filterbank consistently provides higher classification accuracies than other existing filterbanks such as mel, Bark, and equivalent rectangular bandwidth.
A discrete cosine transform based speech enhancement technique is proposed in order to increase the robustness of the CL classification system and found to be more suitable than other methods investigated. This proposed method provides a 3.0% average relative error rate reduction for the seven types of noise and five levels of SNR used. In particular, it provides a maximum of 7.5% relative error rate reduction for the F16 noise (in NOISEX-92 database) at 20 dB SNR
The acquisition of phonology and the classification of speech disorders in German-speaking children
PhD ThesisPhonological acquisition has been a major research topic for the past three decades.
Several different theoretical concepts, accounting for the course of phonological
acquisition, have emerged. While all these theories agree the need to explain
language-specific differences during the course of development, they all also strongly
argue for a universal pattern. This thesis aims to provide evidence for phonological
theory in a cross-linguistic context by examining monolingual children acquiring
German as their native language. A cross-sectional study of 177 normally developing
children aged 1;6 to 5; 11 was found to generally support the concept of universality
but also showed significant acquisition differences especially in comparison with
English, a closely related language. It will be argued that to date only the concept of
phonological saliency (So & Dodd, 1994; Zua Hua & Dodd, 2000) is able to fully
explain language-specific findings.
However, evidence for phonological theory cannot only be validated by using data
from developmental cross-linguistic studies but also from data describing
phonologically disordered children. The nature of the errors made and also the
children's developmental history might provide information concerning the
prerequisites for normal speech development and the cognitive processes involved in
speech perception and production. ... This thesis will argue that developmental speech disorders of unknown origin follow a
language-independent course that is constrained by a universal pattern. On the basis
of normative data for any language investigated, it should be possible to detect
universal subgroups of speech disorders across languages. The clinical implication of
this conclusion is that therapy techniques can be applied cross-linguistically.Economic and Social Research Council
Models and Analysis of Vocal Emissions for Biomedical Applications
The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) came into being in 1999 from the particularly felt need of sharing know-how, objectives and results between areas that until then seemed quite distinct such as bioengineering, medicine and singing. MAVEBA deals with all aspects concerning the study of the human voice with applications ranging from the newborn to the adult and elderly. Over the years the initial issues have grown and spread also in other fields of research such as occupational voice disorders, neurology, rehabilitation, image and video analysis. MAVEBA takes place every two years in Firenze, Italy. This edition celebrates twenty-two years of uninterrupted and successful research in the field of voice analysis
Models and Analysis of Vocal Emissions for Biomedical Applications
The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) came into being in 1999 from the particularly felt need of sharing know-how, objectives and results between areas that until then seemed quite distinct such as bioengineering, medicine and singing. MAVEBA deals with all aspects concerning the study of the human voice with applications ranging from the neonate to the adult and elderly. Over the years the initial issues have grown and spread also in other aspects of research such as occupational voice disorders, neurology, rehabilitation, image and video analysis. MAVEBA takes place every two years always in Firenze, Italy. This edition celebrates twenty years of uninterrupted and succesfully research in the field of voice analysis
Pan European Voice Conference - PEVOC 11
The Pan European VOice Conference (PEVOC) was born in 1995 and therefore in 2015 it celebrates the 20th anniversary of its establishment: an important milestone that clearly expresses the strength and interest of the scientific community for the topics of this conference. The most significant themes of PEVOC are singing pedagogy and art, but also occupational voice disorders, neurology, rehabilitation, image and video analysis. PEVOC takes place in different European cities every two years (www.pevoc.org). The PEVOC 11 conference includes a symposium of the Collegium Medicorum Theatri (www.comet collegium.com
Cepstral peak prominence: a comprehensive analysis
An analytical study of cepstral peak prominence (CPP) is presented, intended to provide an insight into its meaning and relation with voice perturbation parameters. To carry out this analysis, a parametric approach is adopted in which voice production is modelled using the traditional source-filter model and the first cepstral peak is assumed to have Gaussian shape. It is concluded that the meaning of CPP is very similar to that of the first rahmonic and some insights are provided on its dependence with fundamental frequency and vocal tract resonances. It is further shown that CPP integrates measures of voice waveform and periodicity perturbations, be them either amplitude, frequency or noise
- …