105 research outputs found

    A minimax search algorithm for robust continuous speech recognition

    In this paper, we propose a novel implementation of a minimax decision rule for continuous density hidden Markov-model-based robust speech recognition. By combining the idea of the minimax decision rule with a normal Viterbi search, we derive a recursive minimax search algorithm, where the minimax decision rule is repetitively applied to determine the partial paths during the search procedure. Because of the intrinsic nature of a recursive search, the proposed method can be easily extended to perform continuous speech recognition. Experimental results on Japanese isolated digits and TIDIGITS, where the mismatch between training and testing conditions is caused by additive white Gaussian noise, show the viability and efficiency of the proposed minimax search algorithm.
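As a rough illustration of how a minimax rule can sit inside a Viterbi recursion, the sketch below scores each frame with the best-case likelihood over a small neighborhood of the trained mean. This is one simplified reading and not the paper's algorithm: the one-dimensional Gaussian emissions, the box-shaped neighborhood of half-width `eps`, and the function names are all assumptions made for the example.

```python
import math

def minimax_log_gauss(x, mean, var, eps):
    """Largest 1-D Gaussian log-density at x over means in [mean - eps, mean + eps]."""
    # The maximizing mean is the point of the interval closest to x.
    best_mean = min(max(x, mean - eps), mean + eps)
    return -0.5 * (math.log(2 * math.pi * var) + (x - best_mean) ** 2 / var)

def minimax_viterbi(obs, log_init, log_trans, means, variances, eps):
    """Viterbi search in the log domain with the relaxed emission score above."""
    n = len(means)
    score = [log_init[s] + minimax_log_gauss(obs[0], means[s], variances[s], eps)
             for s in range(n)]
    back = []  # back[t][s] = best predecessor of state s at frame t + 1
    for x in obs[1:]:
        new_score, ptr = [], []
        for s in range(n):
            prev = max(range(n), key=lambda p: score[p] + log_trans[p][s])
            ptr.append(prev)
            new_score.append(score[prev] + log_trans[prev][s]
                             + minimax_log_gauss(x, means[s], variances[s], eps))
        score = new_score
        back.append(ptr)
    # Trace the best partial path back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    best = score[state]
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, best

# Toy two-state model (hypothetical numbers).
log_init = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]
path, best = minimax_viterbi([0.2, 0.1, 4.8, 5.3], log_init, log_trans,
                             means=[0.0, 5.0], variances=[1.0, 1.0], eps=0.5)
```

With `eps = 0` the search reduces to an ordinary Viterbi pass; a larger `eps` tolerates more mismatch between training and test conditions at the cost of less discriminative scores.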

    Study of noise robustness of First Formant Bandwidth (F1BW) method

    The performance of speech recognition applications under adverse noisy conditions is a frequent research topic regardless of the language used. Applications that use vowel phonemes require a high degree of Standard Malay vowel recognition capability. In Malaysia, research in vowel recognition is still lacking, especially in the use of Malay vowels, speaker-independent systems, recognition robustness, and algorithm speed and accuracy. This paper presents a noise robustness study of an improved vowel feature extraction method called First Formant Bandwidth (F1BW) with three classifiers: Multinomial Logistic Regression (MLR), K-Nearest Neighbors (k-NN) and Linear Discriminant Analysis (LDA). Results show that LDA performs best in overall vowel classification compared to MLR and k-NN in terms of robustness.

    Parallel model combination and word recognition in soccer audio

    The audio scene from broadcast soccer can be used for identifying highlights from the game. Audio cues derived from these sources provide valuable information about game events, as can the detection of key words used by the commentators. In this paper we investigate the feasibility of incorporating both commentator word recognition and information about the additive background noise in an HMM structure. A limited set of audio cues, which have been extracted from data collected from the 2006 FIFA World Cup, are used to create an extension to the Aurora-2 database. The new database is then tested with various PMC models and compared to the standard baseline, clean and multi-condition training methods. It is found that incorporating SNR and noise type information into the PMC process is beneficial to recognition performance.
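To make the idea of feeding SNR and noise information into parallel model combination (PMC) more concrete, here is a minimal sketch of the commonly used log-add approximation, which combines only the means of a clean-speech model and a noise model in the log-spectral domain. It is not taken from the paper: the function names, the `gain` term, the SNR helper and the toy values are assumptions, and variances are ignored entirely.

```python
import math

def pmc_log_add(speech_log_means, noise_log_means, gain=1.0):
    """Log-add approximation: combine log-spectral means of speech and additive noise."""
    if len(speech_log_means) != len(noise_log_means):
        raise ValueError("speech and noise vectors must have the same length")
    # Powers add in the linear domain, so map back from logs, add, and take the log.
    return [math.log(gain * math.exp(s) + math.exp(n))
            for s, n in zip(speech_log_means, noise_log_means)]

def gain_for_snr(speech_log_means, noise_log_means, snr_db):
    """Hypothetical helper: speech gain that yields the requested overall SNR in dB."""
    speech_power = sum(math.exp(s) for s in speech_log_means)
    noise_power = sum(math.exp(n) for n in noise_log_means)
    return (10 ** (snr_db / 10)) * noise_power / speech_power

# Toy three-channel example (made-up log filter-bank means).
speech = [2.0, 1.0, 0.5]
noise = [0.0, 0.5, 1.0]
g = gain_for_snr(speech, noise, snr_db=10)
noisy = pmc_log_add(speech, noise, gain=g)
```

A different noise type would be represented here simply by a different `noise` vector, and a different SNR by a different `gain`, which gives one compensated model per condition.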

    PLASER: Pronunciation Learning via Automatic Speech Recognition

    PLASER is a multimedia tool with instant feedback designed to teach English pronunciation to high-school students in Hong Kong whose mother tongue is Cantonese Chinese. The objective is to teach correct pronunciation and not to assess a student's overall pronunciation quality. Major challenges related to speech recognition technology include allowance for non-native accents, reliable and corrective feedback, and visualization of errors.