Reactions of adult listeners to infant speech-like vocalizations and cry

Author: Yoo Hyunjoo
Publication venue: University of Memphis Digital Commons
Publication date: 01/01/2018

The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for Voice Research and Affective Computing

Author: André Elisabeth
Busso Carlos
Devillers Laurence
Epps Julien
Eyben Florian
Laukka Petri
Narayanan Shrikanth
Scherer Klaus
Schuller Björn
Sundberg Johan
Truong Khiet
Publication venue: IEEE
Publication date: 01/04/2016

Work on voice sciences over recent decades has led to a proliferation of acoustic parameters that are used quite selectively and are not always extracted in a similar fashion. With many independent teams working in different research areas, shared standards become an essential safeguard to ensure compliance with state-of-the-art methods allowing appropriate comparison of results across studies and potential integration and combination of extraction and recognition systems. In this paper we propose a basic standard acoustic parameter set for various areas of automatic voice analysis, such as paralinguistic or clinical speech analysis. In contrast to a large brute-force parameter set, we present a minimalistic set of voice parameters here. These were selected based on a) their potential to index affective physiological changes in voice production, b) their proven value in former studies as well as their automatic extractability, and c) their theoretical significance. The set is intended to provide a common baseline for evaluation of future research and eliminate differences caused by varying parameter sets or even different implementations of the same parameters. Our implementation is publicly available with the openSMILE toolkit. Comparative evaluations of the proposed feature set and large baseline feature sets of INTERSPEECH challenges show a high performance of the proposed set in relation to its size

University of Twente Research Information

The use of spectral information in the development of novel techniques for speech-based cognitive load classification

Author: Le Phu Ngoc
Publication venue: UNSW, Sydney
Publication date: 01/01/2012

The cognitive load of a user refers to the amount of mental demand imposed on the user when performing a particular task. Estimating the cognitive load (CL) level of the users is necessary to adjust the workload imposed on them accordingly in order to improve task performance. The current speech based CL classification systems are not adequate for commercial use due to their low performance particularly in noisy environments. This thesis proposes many techniques to improve the performance of the speech based cognitive load classification system in both clean and noisy conditions. This thesis analyses and presents the effectiveness of speech features such as spectral centroid frequency (SCF) and spectral centroid amplitude (SCA) for CL classification. Sub-systems based on SCF and SCA features were developed and fused with the traditional Mel frequency cepstral coefficients (MFCC) based system, producing an 8.9% and 31.5% relative error rate reduction respectively when compared to the MFCC-based system alone. The Stroop test corpus was used in these experiments. The investigation into cognitive load information in the form of spectral distribution in different subbands shows that the information distributed in the low frequency subband is significantly higher than the high frequency subband. Two different methods are proposed to utilize this finding. The first method, called the multi-band approach, uses a weighting scheme to emphasize the speech features in low frequency subbands. The cognitive load classification accuracy of this approach is shown to be higher than a system based on a non-weighting scheme. The second method is to design an effective filterbank based on the spectral distribution of cognitive load information using the Kullback-Leibler distance measure. It is shown that the designed filterbank consistently provides higher classification accuracies than other existing filterbanks such as mel, Bark, and equivalent rectangular bandwidth. 
A discrete cosine transform based speech enhancement technique is proposed in order to increase the robustness of the CL classification system and found to be more suitable than other methods investigated. This proposed method provides a 3.0% average relative error rate reduction for the seven types of noise and five levels of SNR used. In particular, it provides a maximum of 7.5% relative error rate reduction for the F16 noise (in NOISEX-92 database) at 20 dB SNR
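Two quantities in the abstract above lend themselves to a short illustration: the relative error rate reduction used to report results, and the spectral centroid frequency of a subband. The following is a minimal sketch using common textbook definitions; the function names and the magnitude-weighted centroid formula are assumptions for illustration and are not taken from the thesis itself.

```python
def relative_error_reduction(baseline_error, new_error):
    """Relative error rate reduction, in percent, of new_error against baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error


def spectral_centroid(freqs_hz, magnitudes):
    """Magnitude-weighted mean frequency of one subband.

    Assumed definition: sum(f * |X(f)|) / sum(|X(f)|) over the bins of the subband.
    """
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("subband has no energy")
    return sum(f * m for f, m in zip(freqs_hz, magnitudes)) / total


if __name__ == "__main__":
    # Hypothetical error rates: a baseline at 20% and a fused system at 18%.
    print(relative_error_reduction(20.0, 18.0))
    # Hypothetical three-bin subband, symmetric around 200 Hz.
    print(spectral_centroid([100.0, 200.0, 300.0], [1.0, 2.0, 1.0]))
```

A relative reduction is measured against the baseline error rate, so the same absolute improvement counts for more when the baseline error is already low.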

UNSWorks

The acquisition of phonology and the classification of speech disorders in German-speaking children

Author: Fox Annette V
Publication venue: Newcastle University
Publication date: 01/01/2000

PhD Thesis. Phonological acquisition has been a major research topic for the past three decades. Several different theoretical concepts, accounting for the course of phonological acquisition, have emerged. While all these theories agree on the need to explain language-specific differences during the course of development, they all also strongly argue for a universal pattern. This thesis aims to provide evidence for phonological theory in a cross-linguistic context by examining monolingual children acquiring German as their native language. A cross-sectional study of 177 normally developing children aged 1;6 to 5;11 was found to generally support the concept of universality but also showed significant acquisition differences, especially in comparison with English, a closely related language. It will be argued that to date only the concept of phonological saliency (So & Dodd, 1994; Zhu Hua & Dodd, 2000) is able to fully explain language-specific findings. However, phonological theory can be validated not only with data from developmental cross-linguistic studies but also with data describing phonologically disordered children. The nature of the errors made and also the children's developmental history might provide information concerning the prerequisites for normal speech development and the cognitive processes involved in speech perception and production. ... This thesis will argue that developmental speech disorders of unknown origin follow a language-independent course that is constrained by a universal pattern. On the basis of normative data for any language investigated, it should be possible to detect universal subgroups of speech disorders across languages. The clinical implication of this conclusion is that therapy techniques can be applied cross-linguistically. Economic and Social Research Council

Newcastle University eTheses

Models and Analysis of Vocal Emissions for Biomedical Applications

Author
Publication venue: 'Firenze University Press'
Publication date: 31/05/2022

The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) came into being in 1999 from the particularly felt need of sharing know-how, objectives and results between areas that until then seemed quite distinct such as bioengineering, medicine and singing. MAVEBA deals with all aspects concerning the study of the human voice with applications ranging from the newborn to the adult and elderly. Over the years the initial issues have grown and spread also in other fields of research such as occupational voice disorders, neurology, rehabilitation, image and video analysis. MAVEBA takes place every two years in Firenze, Italy. This edition celebrates twenty-two years of uninterrupted and successful research in the field of voice analysis

Directory of Open Access Books (DOAB)

Models and Analysis of Vocal Emissions for Biomedical Applications

Author
Publication venue: 'Firenze University Press'
Publication date: 31/05/2022

The International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA) came into being in 1999 from the particularly felt need of sharing know-how, objectives and results between areas that until then seemed quite distinct such as bioengineering, medicine and singing. MAVEBA deals with all aspects concerning the study of the human voice with applications ranging from the neonate to the adult and elderly. Over the years the initial issues have grown and spread also to other areas of research such as occupational voice disorders, neurology, rehabilitation, image and video analysis. MAVEBA takes place every two years, always in Firenze, Italy. This edition celebrates twenty years of uninterrupted and successful research in the field of voice analysis

Directory of Open Access Books (DOAB)

Pan European Voice Conference - PEVOC 11

Author
Publication venue: 'Firenze University Press'
Publication date: 31/05/2022

The Pan European VOice Conference (PEVOC) was born in 1995 and therefore in 2015 it celebrates the 20th anniversary of its establishment: an important milestone that clearly expresses the strength and interest of the scientific community for the topics of this conference. The most significant themes of PEVOC are singing pedagogy and art, but also occupational voice disorders, neurology, rehabilitation, image and video analysis. PEVOC takes place in different European cities every two years (www.pevoc.org). The PEVOC 11 conference includes a symposium of the Collegium Medicorum Theatri (www.comet collegium.com

Directory of Open Access Books (DOAB)

Cepstral peak prominence: a comprehensive analysis

Author: Rubén Fraile
Juan Ignacio Godino-Llorente
Publication venue: 'Elsevier BV'
Publication date: 01/01/2014

An analytical study of cepstral peak prominence (CPP) is presented, intended to provide an insight into its meaning and its relation to voice perturbation parameters. To carry out this analysis, a parametric approach is adopted in which voice production is modelled using the traditional source-filter model and the first cepstral peak is assumed to have Gaussian shape. It is concluded that the meaning of CPP is very similar to that of the first rahmonic, and some insights are provided on its dependence on fundamental frequency and vocal tract resonances. It is further shown that CPP integrates measures of voice waveform and periodicity perturbations, whether in amplitude, frequency or noise
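For orientation, CPP is commonly computed as the height of the cepstral peak above a regression line fitted to the cepstrum, evaluated at the peak's quefrency. The sketch below follows that common definition only; it is not the parametric model analysed in the paper, and the function name, the search-range arguments and the choice to fit the line over the whole supplied cepstrum are assumptions for illustration.

```python
def cepstral_peak_prominence(quefrencies, cepstrum_db, search_min, search_max):
    """Return (cpp_db, peak_index) for a cepstrum given in dB.

    The peak is searched for within [search_min, search_max] (same units as
    quefrencies). A least-squares line is fitted over all supplied points
    (an assumed convention), and CPP is the peak value minus the line value
    at the peak quefrency.
    """
    n = len(quefrencies)
    if n < 2 or n != len(cepstrum_db):
        raise ValueError("need two or more points and equal-length inputs")

    # Locate the highest cepstral value inside the search range.
    candidates = [i for i, q in enumerate(quefrencies) if search_min <= q <= search_max]
    if not candidates:
        raise ValueError("search range contains no quefrency samples")
    peak_index = max(candidates, key=lambda i: cepstrum_db[i])

    # Ordinary least-squares fit of cepstrum_db against quefrency.
    mean_q = sum(quefrencies) / n
    mean_c = sum(cepstrum_db) / n
    sxx = sum((q - mean_q) ** 2 for q in quefrencies)
    if sxx == 0:
        raise ValueError("quefrencies must not all be equal")
    sxy = sum((q - mean_q) * (c - mean_c) for q, c in zip(quefrencies, cepstrum_db))
    slope = sxy / sxx
    intercept = mean_c - slope * mean_q

    line_at_peak = intercept + slope * quefrencies[peak_index]
    return cepstrum_db[peak_index] - line_at_peak, peak_index


if __name__ == "__main__":
    # Synthetic cepstrum: flat at 1 dB with a single 11 dB peak at index 5.
    q = [float(i) for i in range(10)]
    c = [1.0] * 10
    c[5] = 11.0
    print(cepstral_peak_prominence(q, c, 2.0, 8.0))
```

Because the line is fitted to the same data that contains the peak, a strong peak raises the line slightly and lowers the resulting CPP; implementations differ on the quefrency range used for the fit.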

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Archivo Digital UPM