Search CORE


Stress and Emotion Classification Using Jitter and Shimmer Features

Authors: Johnson Michael T.; Leong Kirsten; Li Xi; Newman John D.; Savage Anne; Soltis Joseph; Tao Jidong
Publication venue: e-Publications@Marquette
Publication date: 01/01/2007

In this paper, we evaluate the use of appended jitter and shimmer speech features for classifying human speaking styles and the arousal levels of animal vocalizations. Jitter and shimmer features are extracted from the fundamental frequency contour and appended to baseline spectral features: Mel-frequency cepstral coefficients (MFCCs) for human speech and Greenwood function cepstral coefficients (GFCCs) for animal vocalizations. Hidden Markov models (HMMs) with Gaussian mixture model (GMM) state distributions are used for classification. Appending the jitter and shimmer features increases classification accuracy on several illustrative datasets, including the SUSAS dataset of human speaking styles and vocalizations from African elephants and rhesus monkeys labeled by arousal level.
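The abstract does not give the exact jitter and shimmer formulas. As a rough illustration only, the sketch below uses the common "local" definitions (mean absolute difference between consecutive cycles, normalized by the mean), derives pitch periods from a hypothetical F0 contour in Hz in which unvoiced frames are marked as 0, and appends the two values to a per-frame baseline vector such as MFCCs. The function names and the 0-for-unvoiced convention are assumptions, not the paper's implementation.

```python
from typing import List, Sequence


def periods_from_f0(f0_hz: Sequence[float]) -> List[float]:
    """Convert an F0 contour (Hz) to pitch periods (s).

    Assumption: unvoiced frames are marked with F0 <= 0 and are skipped.
    """
    return [1.0 / f for f in f0_hz if f > 0]


def _local_perturbation(values: Sequence[float]) -> float:
    """Mean absolute difference of consecutive values divided by their mean."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


def local_jitter(periods: Sequence[float]) -> float:
    """Relative cycle-to-cycle period perturbation (unitless)."""
    return _local_perturbation(periods)


def local_shimmer(peak_amplitudes: Sequence[float]) -> float:
    """Relative cycle-to-cycle peak-amplitude perturbation (unitless)."""
    return _local_perturbation(peak_amplitudes)


def append_jitter_shimmer(baseline: Sequence[float], jitter: float,
                          shimmer: float) -> List[float]:
    """Append the two features to a baseline spectral vector (e.g. MFCCs)."""
    return list(baseline) + [jitter, shimmer]


if __name__ == "__main__":
    f0 = [100.0, 0.0, 90.0, 100.0, 90.0]  # hypothetical contour; 0.0 = unvoiced
    periods = periods_from_f0(f0)
    amps = [1.0, 0.8, 1.0]                # hypothetical per-cycle peak amplitudes
    mfcc = [0.0] * 13                     # placeholder 13-dim baseline vector
    vec = append_jitter_shimmer(mfcc, local_jitter(periods), local_shimmer(amps))
    print(len(vec))  # 15
```

In an HMM/GMM setup like the one described, such an augmented vector would simply replace the baseline vector as the per-frame observation.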

epublications@Marquette

CiteSeerX

Crossref

Optimal Representation of Anuran Call Spectrum in Environmental Monitoring Systems Using Wireless Sensor Networks

Authors: Aguayo-González Francisco (Coordinator); Barbancho Concejero Julio; Carrasco Muñoz Alejandro; Gómez-Bellido Jesús; León de Mora Carlos (Coordinator); Luque Sendra Amalia
Publication venue: MDPI AG
Publication date: 01/01/2018

The analysis and classification of the sounds produced by certain animal species, notably anurans, have revealed these amphibians to be a potentially strong indicator of temperature fluctuations and therefore of the existence of climate change. Environmental monitoring systems using Wireless Sensor Networks are therefore of interest for obtaining indicators of global warming. For the automatic classification of the sounds recorded by such systems, a proper representation of the sound spectrum is essential, since the spectrum contains the information required for cataloguing anuran calls. The present paper focuses on this feature extraction process by exploring three alternatives: the standardized MPEG-7 features, Filter Bank Energies (FBE), and Mel Frequency Cepstral Coefficients (MFCC). Moreover, various parameter values have been considered for each feature extraction option. The paper shows that representing the frame spectrum with pure FBE gives slightly worse results than using the MPEG-7 features. This performance can easily be improved, however, by rescaling the FBE along two dimensions: vertically, by taking the logarithm of the energies, and horizontally, by applying mel scaling to the filter banks. Representing the spectrum in the cepstral domain, as in MFCC, yields further marginal improvements in classification performance.
University of Seville: Telefónica Chair "Intelligence Networks
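To make the spectral representations compared in the abstract concrete (pure FBE, FBE rescaled with a logarithm and mel-spaced filters, and MFCC), here is a minimal stdlib-only sketch operating on one frame's power spectrum. The triangular filter shape, the mel formula 2595 * log10(1 + f / 700), the natural logarithm, the unnormalized DCT-II, and all function names are assumptions chosen for illustration; the abstract does not give the paper's filter bank or parameter values, and the MPEG-7 alternative is not shown.

```python
import math
from typing import List, Sequence


def hz_to_mel(f: float) -> float:
    # Common mel-scale formula (assumption: other variants exist).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def triangular_filterbank(n_filters: int, n_bins: int, sample_rate: float,
                          mel_spacing: bool = True) -> List[List[float]]:
    """Triangular filters over bins 0..n_bins-1 spanning 0 Hz..Nyquist.

    mel_spacing=False gives linearly spaced ("pure") filters; True gives
    mel-spaced filters (the "horizontal" rescaling).
    """
    nyquist = sample_rate / 2.0
    if mel_spacing:
        top = hz_to_mel(nyquist)
        edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    else:
        edges = [nyquist * i / (n_filters + 1) for i in range(n_filters + 2)]
    bin_freqs = [nyquist * k / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = edges[j - 1], edges[j], edges[j + 1]
        weights = []
        for f in bin_freqs:
            if left <= f <= center:
                weights.append((f - left) / (center - left))
            elif center < f <= right:
                weights.append((right - f) / (right - center))
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def filter_bank_energies(power: Sequence[float],
                         bank: Sequence[Sequence[float]]) -> List[float]:
    """FBE: weighted sum of the power spectrum under each filter."""
    return [sum(w * p for w, p in zip(weights, power)) for weights in bank]


def log_energies(energies: Sequence[float], floor: float = 1e-10) -> List[float]:
    """The "vertical" rescaling: log of each energy, floored to avoid log(0)."""
    return [math.log(max(e, floor)) for e in energies]


def dct_ii(x: Sequence[float], n_coeffs: int) -> List[float]:
    """Unnormalized DCT-II, moving log energies to the cepstral domain."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
            for k in range(n_coeffs)]


def mfcc_frame(power: Sequence[float], sample_rate: float,
               n_filters: int = 26, n_coeffs: int = 13) -> List[float]:
    """Mel-spaced FBE -> log -> DCT-II for a single frame."""
    bank = triangular_filterbank(n_filters, len(power), sample_rate, mel_spacing=True)
    return dct_ii(log_energies(filter_bank_energies(power, bank)), n_coeffs)
```

Stopping before `dct_ii` gives log mel FBE; building the bank with `mel_spacing=False` and skipping `log_energies` gives the pure FBE baseline, so each variant in the comparison is one step away from the next.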

idUS. Depósito de Investigación Universidad de Sevilla

Mel cepstral coefficient modification based on the Glimpse Proportion measure for improving the intelligibility of HMM-generated synthetic speech in noise

Authors: King S.; Valentini-Botinhao C.; Yamagishi J.
Publication date: 01/09/2012

Edinburgh Research Explorer

Speaker recognition using frequency filtered spectral energies

Author: Hernando Pericás Francisco Javier
Publication venue: FONDAZIONE UGO BORDONI
Publication date: 01/01/1999

The spectral parameters obtained by filtering the frequency sequence of log mel-scaled filter-bank energies with a simple first- or second-order FIR filter have proved to be an efficient speech representation in terms of both speech recognition rate and computational load. Recently, the authors have shown that this frequency filtering can approximately equalize the cepstrum variance, enhancing the oscillations of the spectral envelope curve that are most effective for discriminating between speakers. On the TIMIT database, speaker identification results even better than those obtained with the mel-cepstrum have been achieved, especially when white noise was added. In addition, the hybridization of linear prediction and filter-bank spectral analysis, using either the cepstral transformation or the alternative frequency filtering, has been explored for speaker verification. The combination of hybrid spectral analysis and frequency filtering, which had previously been shown to outperform conventional techniques in clean and noisy word recognition, has yielded good text-dependent speaker verification results on the new speaker-oriented telephone-line POLYCOST database.
Peer Reviewed. Postprint (published version)
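The abstract describes frequency filtering as passing the sequence of log mel filter-bank energies of one frame (indexed by band, not by time) through a first- or second-order FIR filter, but it does not spell out the coefficients. The sketch below assumes the forms commonly cited for this technique, H(z) = 1 - alpha z^-1 and H(z) = z - z^-1, with zero padding at the band edges; the edge handling, the `alpha` parameter, and the function names are illustrative assumptions rather than the paper's exact definition.

```python
from typing import List, Sequence


def ff_first_order(log_fbe: Sequence[float], alpha: float = 1.0) -> List[float]:
    """H(z) = 1 - alpha*z^-1 along frequency: y[k] = S[k] - alpha*S[k-1].

    Assumption: S[-1] = 0 (zero padding below the first band).
    """
    out = []
    prev = 0.0
    for s in log_fbe:
        out.append(s - alpha * prev)
        prev = s
    return out


def ff_second_order(log_fbe: Sequence[float]) -> List[float]:
    """H(z) = z - z^-1 along frequency: y[k] = S[k+1] - S[k-1].

    Assumption: zero padding at both band edges.
    """
    padded = [0.0] + list(log_fbe) + [0.0]
    return [padded[k + 2] - padded[k] for k in range(len(log_fbe))]


if __name__ == "__main__":
    bands = [1.0, 2.0, 4.0, 7.0]  # hypothetical log filter-bank energies, one frame
    print(ff_first_order(bands))   # [1.0, 1.0, 2.0, 3.0]
    print(ff_second_order(bands))  # [2.0, 3.0, 5.0, -4.0]
```

Because the second-order filter is a difference of neighbouring bands, adding the same constant to every log energy (a frequency-independent gain) leaves the interior outputs unchanged; only the two edge bands, which see the zero padding, are affected.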

UPCommons. Portal del coneixement obert de la UPC