3,614 research outputs found

    Improving the performance of MFCC for Persian robust speech recognition

    Get PDF
    The Mel Frequency cepstral coefficients are the most widely used feature in speech recognition but they are very sensitive to noise. In this paper to achieve a satisfactorily performance in Automatic Speech Recognition (ASR) applications we introduce a noise robust new set of MFCC vector estimated through following steps. First, spectral mean normalization is a pre-processing which applies to the noisy original speech signal. The pre-emphasized original  speech segmented into overlapping time frames, then it is windowed by a modified hamming window .Higher order autocorrelation coefficients are extracted. The next step is to eliminate the lower order of the autocorrelation coefficients. The consequence pass from FFT block and then power spectrum of output is calculated. A Gaussian shape filter bank is applied to the results. Logarithm and two compensator blocks form which one is mean subtraction and the other one are root block applied to the results and DCT transformation is the last step. We use MLP neural network to evaluate the performance of proposed MFCC method and to classify the results. Some speech recognition experiments for various tasks indicate that the proposed algorithm is more robust than traditional ones in noisy condition

    Speaker recognition using frequency filtered spectral energies

    Get PDF
    The spectral parameters that result from filtering the frequency sequence of log mel-scaled filter-bank energies with a simple first or second order FIR filter have proved to be an efficient speech representation in terms of both speech recognition rate and computational load. Recently, the authors have shown that this frequency filtering can approximately equalize the cepstrum variance enhancing the oscillations of the spectral envelope curve that are most effective for discrimination between speakers. Even better speaker identification results than using melcepstrum have been obtained on the TIMIT database, especially when white noise was added. On the other hand, the hybridization of both linear prediction and filter-bank spectral analysis using either cepstral transformation or the alternative frequency filtering has been explored for speaker verification. The combination of hybrid spectral analysis and frequency filtering, that had shown to be able to outperform the conventional techniques in clean and noisy word recognition, has yield good text-dependent speaker verification results on the new speaker-oriented telephone-line POLYCOST database.Peer ReviewedPostprint (published version

    A Framework for Bioacoustic Vocalization Analysis Using Hidden Markov Models

    Get PDF
    Using Hidden Markov Models (HMMs) as a recognition framework for automatic classification of animal vocalizations has a number of benefits, including the ability to handle duration variability through nonlinear time alignment, the ability to incorporate complex language or recognition constraints, and easy extendibility to continuous recognition and detection domains. In this work, we apply HMMs to several different species and bioacoustic tasks using generalized spectral features that can be easily adjusted across species and HMM network topologies suited to each task. This experimental work includes a simple call type classification task using one HMM per vocalization for repertoire analysis of Asian elephants, a language-constrained song recognition task using syllable models as base units for ortolan bunting vocalizations, and a stress stimulus differentiation task in poultry vocalizations using a non-sequential model via a one-state HMM with Gaussian mixtures. Results show strong performance across all tasks and illustrate the flexibility of the HMM framework for a variety of species, vocalization types, and analysis tasks
    • …
    corecore