    New Features Using Robust MVDR Spectrum of Filtered Autocorrelation Sequence for Robust Speech Recognition

    This paper presents a novel noise-robust feature extraction method for speech recognition based on the robust perceptual minimum variance distortionless response (MVDR) spectrum of a temporally filtered autocorrelation sequence. The perceptual MVDR spectrum of the filtered short-time autocorrelation sequence can reduce the effect of the residual nonstationary additive noise that remains after filtering the autocorrelation. To achieve a more robust front end, we also modify the distortionless constraint of the MVDR spectral estimation method by reweighting the subband power spectrum values according to the subband signal-to-noise ratios (SNRs), adapting it to the proposed approach. This weighting passes the components of the input signal at the frequencies least affected by noise with larger weights and attenuates the noisy and undesired components more effectively. The modification reduces the noise residuals of the spectrum estimated from the filtered autocorrelation sequence, yielding a more robust algorithm. Evaluated on the Aurora 2 recognition task, the proposed method outperformed the Mel frequency cepstral coefficient (MFCC) baseline, relative autocorrelation sequence MFCC (RAS-MFCC), and MVDR-based features in several different noisy conditions.
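    A minimal sketch of the underlying pipeline is given below: short-time autocorrelations are computed per frame, each lag trajectory is high-pass filtered across frames (RAS-style), and the MVDR spectrum is estimated from the filtered lags via the Levinson-Durbin recursion. The filter taps, model order, and regularization are placeholder assumptions; the paper's perceptual warping and SNR-weighted distortionless constraint are not reproduced here.

```python
import numpy as np
from scipy.signal import lfilter

def autocorr(frame, order):
    """Biased short-time autocorrelation, lags 0..order."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    return r / n

def levinson(r, order):
    """Levinson-Durbin recursion -> LPC coefficients a (a[0] = 1) and error."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + np.dot(a[1:m], r[m - 1:0:-1])
        k = -acc / err
        a[1:m + 1] = a[1:m + 1] + k * a[m - 1::-1][:m]
        err = max(err * (1.0 - k * k), 1e-12)  # clamp for numerical safety
    return a, err

def mvdr_spectrum(a, err, nfft=512):
    """MVDR spectrum from LPC parameters (Musicus fast computation)."""
    M = len(a) - 1
    mu = np.zeros(M + 1)
    for k in range(M + 1):
        i = np.arange(M - k + 1)
        mu[k] = np.sum((M + 1.0 - k - 2.0 * i) * a[i] * a[i + k]) / err
    w = np.linspace(0.0, np.pi, nfft // 2 + 1)
    denom = mu[0] + 2.0 * np.cos(np.outer(w, np.arange(1, M + 1))) @ mu[1:]
    return 1.0 / np.maximum(denom, 1e-12)

def robust_mvdr_spectra(sig, fs, order=12, flen_s=0.025, hop_s=0.010):
    flen, fhop = int(flen_s * fs), int(hop_s * fs)
    win = np.hamming(flen)
    R = np.array([autocorr(sig[i:i + flen] * win, order)
                  for i in range(0, len(sig) - flen, fhop)])
    # Temporal filtering of each lag trajectory across frames; a first-order
    # difference stands in for the paper's FIR filter (assumption).
    Rf = lfilter([1.0, -1.0], [1.0], R, axis=0)
    spectra = []
    for r in Rf:
        r = r.copy()
        # Filtered autocorrelations may lose positive definiteness; a crude
        # regularization keeps the recursion stable in this sketch.
        r[0] = abs(r[0]) * 1.001 + 1e-8
        a, err = levinson(r, order)
        spectra.append(mvdr_spectrum(a, err))
    return np.array(spectra)
```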

    Cross validation of bi-modal health-related stress assessment

    This study explores the feasibility of objective and ubiquitous stress assessment. Twenty-five post-traumatic stress disorder patients participated in a controlled storytelling (ST) study and an ecologically valid reliving (RL) study. The two studies were meant to represent an early and a late therapy session, and each consisted of a "happy" and a "stress-triggering" part. Two instruments were chosen to assess the stress level of the patients at various points in time during therapy: (i) speech, used as an objective and ubiquitous stress indicator, and (ii) the Subjective Unit of Distress (SUD), a clinically validated Likert scale. In total, 13 statistical parameters were derived from each of five speech features: amplitude, zero-crossings, power, high-frequency power, and pitch. To model the emotional state of the patients, 28 parameters were selected from this set by means of a linear regression model and subsequently compressed into 11 principal components. The SUD and speech model were cross-validated using three machine learning algorithms. Between 90% (2 SUD levels) and 39% (10 SUD levels) correct classification was achieved. The two sessions could be discriminated in 89% (ST) and 77% (RL) of the cases. This report fills a gap between laboratory and clinical studies, and its results emphasize the usefulness of Computer Aided Diagnostics (CAD) for mental health care.
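    The modeling chain described above (per-feature statistics, regression-based selection of 28 parameters, compression to 11 principal components, then classification) maps naturally onto a scikit-learn pipeline. The sketch below is illustrative only: the specific 13 statistics and the three learners are not enumerated in this abstract, so plausible stand-ins are used.

```python
import numpy as np
from scipy import stats
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def stat_params(x):
    """Seven of the 13 per-feature statistics; the full set is not given in
    the abstract, so these are plausible stand-ins."""
    return [np.mean(x), np.std(x), np.min(x), np.max(x),
            np.median(x), stats.skew(x), stats.kurtosis(x)]

def utterance_vector(amplitude, zcr, power, hf_power, pitch):
    """Concatenate statistics of the five speech feature trajectories."""
    return np.concatenate([stat_params(f)
                           for f in (amplitude, zcr, power, hf_power, pitch)])

# Regression-based selection of 28 parameters, compression to 11 principal
# components, then a classifier; the abstract does not name its three
# learners, so an SVM stands in here.
model = make_pipeline(
    StandardScaler(),
    SelectKBest(f_regression, k=28),
    PCA(n_components=11),
    SVC(),
)
# With X (n_utterances x n_params) and y (discretized SUD levels):
# scores = cross_val_score(model, X, y, cv=5)
```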

    Speech and crosstalk detection in multichannel audio

    The analysis of scenarios in which a number of microphones record the activity of speakers, such as in a round-table meeting, presents a number of computational challenges. For example, if each participant wears a microphone, speech from both the microphone's wearer (local speech) and from other participants (crosstalk) is received. The recorded audio can be broadly classified in four ways: local speech, crosstalk plus local speech, crosstalk alone, and silence. We describe two experiments related to the automatic classification of audio into these four classes. The first experiment attempted to optimize a set of acoustic features for use with a Gaussian mixture model (GMM) classifier. A large set of potential acoustic features was considered, some of which have been employed in previous studies. The best-performing features were found to be kurtosis, "fundamentalness," and cross-correlation metrics. The second experiment used these features to train an ergodic hidden Markov model classifier. Tests performed on a large corpus of recorded meetings show classification accuracies of up to 96%, and automatic speech recognition performance close to that obtained using ground-truth segmentation.
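    As an illustration of the frame-level classification this describes, the sketch below computes two of the reported best-performing features (kurtosis and cross-channel correlation peaks; "fundamentalness" is omitted) and scores frames against one GMM per class. The feature choices, frame handling, and component counts are assumptions, not the paper's configuration.

```python
import numpy as np
from scipy.signal import correlate
from sklearn.mixture import GaussianMixture

def frame_features(local, others):
    """Per-frame features for one wearer's channel.

    `local` is the wearer's microphone frame; `others` holds the same frame
    from the remaining channels. Local speech tends to dominate its own mic,
    while crosstalk raises cross-channel similarity.
    """
    x = local - np.mean(local)
    kurt = np.mean(x**4) / (np.mean(x**2) ** 2 + 1e-12)  # sample kurtosis
    xcorrs = []
    for y in others:
        y = y - np.mean(y)
        c = correlate(x, y, mode="full")
        denom = np.sqrt(np.sum(x**2) * np.sum(y**2)) + 1e-12
        xcorrs.append(np.max(np.abs(c)) / denom)  # peak normalized xcorr
    return np.array([kurt, np.max(xcorrs), np.mean(xcorrs)])

# One GMM per class (local, local + crosstalk, crosstalk, silence); a frame
# is assigned the class with the highest log-likelihood. An ergodic HMM
# would add transition smoothing on top of these state likelihoods.
def train_gmms(features_by_class, n_components=4):
    return {c: GaussianMixture(n_components).fit(F)
            for c, F in features_by_class.items()}

def classify(gmms, f):
    return max(gmms, key=lambda c: gmms[c].score(f.reshape(1, -1)))
```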

    Improving the performance of MFCC for Persian robust speech recognition

    Mel frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is applied to the noisy speech signal as a pre-processing step. The pre-emphasized speech is segmented into overlapping time frames and windowed by a modified Hamming window. Higher-order autocorrelation coefficients are then extracted; that is, the lower-order autocorrelation coefficients are eliminated. The resulting sequence is passed through an FFT block, and the power spectrum of the output is calculated. A Gaussian-shaped filter bank is applied to the result, followed by a logarithm and two compensator blocks, one performing mean subtraction and the other a root operation; a DCT transformation is the final step. We use an MLP neural network to evaluate the performance of the proposed MFCC variant and to classify the results. Speech recognition experiments on various tasks indicate that the proposed algorithm is more robust than traditional ones in noisy conditions.
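    The step-by-step front end lends itself to a compact sketch. The version below follows the described ordering but fills in unspecified details: the exact modified Hamming window, the Gaussian filter widths, the number of discarded lower lags, the root exponent, and the placement of the compensators relative to the logarithm are all assumptions, and spectral mean normalization is omitted for brevity.

```python
import numpy as np
from scipy.fft import dct

def gaussian_fbank(n_filt, nfft, fs):
    """Gaussian-shaped filters centered on a mel-spaced grid (assumed design)."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    centers = imel(np.linspace(mel(0.0), mel(fs / 2.0), n_filt + 2))[1:-1]
    bins = np.linspace(0.0, fs / 2.0, nfft // 2 + 1)
    width = (fs / 2.0) / n_filt
    return np.exp(-0.5 * ((bins[None, :] - centers[:, None]) / width) ** 2)

def robust_mfcc(sig, fs, min_lag=3, n_filt=24, n_ceps=13, nfft=512, root=0.5):
    sig = np.append(sig[0], sig[1:] - 0.97 * sig[:-1])   # pre-emphasis
    flen, fhop = int(0.025 * fs), int(0.010 * fs)
    win = np.hamming(flen)          # stand-in for the "modified Hamming" window
    fb = gaussian_fbank(n_filt, nfft, fs)
    logs = []
    for i in range(0, len(sig) - flen, fhop):
        frame = sig[i:i + flen] * win
        r = np.correlate(frame, frame, mode="full")[flen - 1:]  # lags >= 0
        r[:min_lag] = 0.0           # eliminate lower-order autocorrelation lags
        p = np.abs(np.fft.rfft(r, nfft)) ** 2     # power spectrum of the lags
        logs.append(np.log(fb @ p + 1e-12))       # Gaussian filter bank + log
    C = np.array(logs)
    C -= C.mean(axis=0)                           # mean-subtraction compensator
    C = np.sign(C) * np.abs(C) ** root            # root compensator (assumed form)
    return dct(C, type=2, axis=1, norm="ortho")[:, :n_ceps]   # final DCT
```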

    Investigation of Voice Pathology Detection and Classification on Different Frequency Regions Using Correlation Functions

    Automatic voice pathology detection and classification systems effectively contribute to the assessment of voice disorders, helping clinicians detect the existence of any voice pathology and identify the type of pathology from which a patient suffers at an early stage. This work concentrates on developing accurate and robust feature extraction for detecting and classifying voice pathologies by investigating different frequency bands using correlation functions. In this paper, we extracted maximum peak values and their corresponding lag values from each frame of a voiced signal by using correlation functions as features to detect and classify pathological samples. These features were investigated in different frequency bands to assess the contribution of each band to the detection and classification processes. Various samples of the sustained vowel /a/ from normal and pathological voices were extracted from three different databases: English, German, and Arabic. A support vector machine was used as the classifier. We also performed a t-test to investigate significant differences between the means of the normal and pathological samples. The best achieved accuracies in both detection and classification varied depending on the band, the correlation function, and the database. The most contributive bands in both detection and classification were between 1000 and 8000 Hz. In detection, the highest accuracies acquired using cross-correlation were 99.809%, 90.979%, and 91.168% on the Massachusetts Eye and Ear Infirmary, Saarbruecken Voice Database, and Arabic Voice Pathology Database databases, respectively. In classification, the highest accuracies acquired using cross-correlation were 99.255%, 98.941%, and 95.188% on the three databases, respectively.
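    A rough sketch of the per-band feature extraction this abstract describes: band-pass filter the signal, take a per-frame correlation function (autocorrelation here; the paper also evaluates cross-correlation), and record the maximum peak value and its lag. Band edges, frame sizes, the lag search range, and the utterance-level summary below are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfilt
from sklearn.svm import SVC

def band_peak_lag_features(sig, fs, band, flen=400, fhop=200, min_lag=20):
    """Per-frame (max correlation peak, lag of that peak) in one band.

    `band` is (low_hz, high_hz); the lag search starts above `min_lag`
    to skip the trivial zero-lag maximum of the autocorrelation.
    """
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    x = sosfilt(sos, sig)
    feats = []
    for i in range(0, len(x) - flen, fhop):
        f = x[i:i + flen] * np.hamming(flen)
        r = np.correlate(f, f, mode="full")[flen - 1:]  # lags >= 0
        r = r / (r[0] + 1e-12)                          # normalize by energy
        k = min_lag + np.argmax(r[min_lag:])            # strongest real peak
        feats.append((r[k], k / fs))                    # peak value, lag (s)
    F = np.array(feats)
    return np.concatenate([F.mean(axis=0), F.std(axis=0)])  # utterance summary

# With X built from these per-band features over recordings and
# y in {0: normal, 1: pathological}:
# clf = SVC(kernel="rbf").fit(X_train, y_train)
```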