    Speaker Normalization Using Cortical Strip Maps: A Neural Model for Steady State Vowel Categorization

    Auditory signals of speech are speaker-dependent, but representations of language meaning are speaker-independent. The transformation from speaker-dependent to speaker-independent language representations enables speech to be learned and understood from different speakers. A neural model is presented that performs speaker normalization to generate a pitch-independent representation of speech sounds, while also preserving information about speaker identity. This speaker-invariant representation is categorized into unitized speech items, which input to sequential working memories whose distributed patterns can be categorized, or chunked, into syllable and word representations. The proposed model fits into an emerging model of auditory streaming and speech categorization. The auditory streaming and speaker normalization parts of the model both use multiple strip representations and asymmetric competitive circuits, thereby suggesting that these two circuits arose from similar neural designs. The normalized speech items are rapidly categorized and stably remembered by Adaptive Resonance Theory circuits. Simulations use synthesized steady-state vowels from the Peterson and Barney [J. Acoust. Soc. Am. 24, 175-184 (1952)] vowel database and achieve accuracy rates similar to those achieved by human listeners. These results are compared to behavioral data and other speaker normalization models. National Science Foundation (SBE-0354378); Office of Naval Research (N00014-01-1-0624)

    Improved Malay vowel feature extraction method based on first and second formants

    Many speech recognition applications use vowel phonemes. Among them are speech therapy systems that improve word pronunciation, especially for children. There are also systems that teach hearing-impaired people to speak properly by pronouncing words with a good degree of intelligibility. All of these systems require a high degree of vowel recognition capability. This paper presents a new method of Malay vowel feature extraction based on formant and spectrum envelope, called First Formant Bandwidth (F1BW). It is an effort to increase Malay vowel recognition capability by using a new speech database that consists of words uttered by Malaysian speakers from the three major races: Malay, Chinese and Indian. Based on single-frame analysis, F1BW performs better than MFCC by more than 9% based on four classifiers: Levenberg-Marquardt-trained Neural Network, K-Nearest Neighbours, Multinomial Logistic Regression and Linear Discriminant Analysis.

    Noise robustness of first formant bandwidth (F1BW) features in Malay vowel recognition

    Applications that use vowel phonemes require a high degree of vowel recognition capability. The performance of speech recognition applications under adverse noisy conditions is often a topic of interest among speech recognition researchers, regardless of the language in use. In Malaysia, an increasing number of speech recognition researchers are focusing on developing speaker-independent speech recognition systems for the Malay language that are noise robust and accurate. This paper presents a study of the noise robustness of an improved vowel feature extraction method called First Formant Bandwidth (F1BW). The features are extracted from both original data and noise-added data and classified using three classifiers: (i) Multinomial Logistic Regression (MLR), (ii) K-Nearest Neighbors (K-NN) and (iii) Linear Discriminant Analysis (LDA). The results show that the proposed F1BW is robust to noise and that LDA performs best in overall vowel classification compared with MLR and K-NN in terms of robustness, especially at signal-to-noise ratios (SNR) above 20 dB.

    An improved feature extraction method for Malay vowel recognition based on spectrum delta

    Malay speech recognition is becoming popular among Malaysian researchers. In Malaysia, more local researchers are focusing on noise-robust and accurate speaker-independent speech recognition systems that use the Malay language. The performance of speech recognition applications under adverse noisy conditions is often a topic of interest among speech recognition researchers in any language. This paper presents a study of the noise robustness of an improved vowel feature extraction method called Spectrum Delta (SpD). The features are extracted from both original data and noise-added data and classified using three classifiers: (i) Linear Discriminant Analysis (LDA), (ii) K-Nearest Neighbors (k-NN) and (iii) Multinomial Logistic Regression (MLR). Most speaker-dependent and speaker-independent systems, which mostly use multi-frame analysis, yield accuracy between 89% and 100% for speaker-dependent systems and between 70% and 94% for speaker-independent systems. This study shows that SpD features obtained an accuracy of 92.42% to 95.11% using all the four classifiers on single-frame analysis, which makes this result comparable to those obtained with a multi-frame approach.

    Reconhecimento de orador em dois segundos (Speaker recognition in two seconds)

    Integrated master's thesis. Electrical and Computer Engineering. Faculdade de Engenharia. Universidade do Porto. 201