294 research outputs found

    Noise-Robust Speaker Verification Using F0 Features

    This paper proposes a noise-robust speaker verification method augmented by fundamental frequency (F0). The paper first describes a noise-robust F0 extraction method using the Hough transform. Then, it proposes a robust speaker verification method using multi-stream HMMs which fuse the extracted F0 and cepstral features. Experiments are conducted using four-connected-digit utterances of Japanese by 37 male speakers recorded at five sessions over a half-year period. The utterances are contaminated with white noise at various SNR levels. Experimental results show that the F0 features improve the verification performance in all SNR conditions.

    Noise robust speech recognition using spectral subtraction and F0 information extracted by Hough transform

    We propose a noise-robust speech recognition method that combines spectral subtraction with novel features extracted from fundamental frequency (F0) information. F0 features have been shown to be effective for speech recognition in noisy environments. Recently, F0 features obtained by the Hough transform were developed for concatenated digit recognition and significantly improved recognition performance on noisy speech. This paper proposes novel Hough-transform-based features for large-vocabulary continuous speech recognition. In addition, spectral subtraction is applied before the Hough transform to remove static noise. The proposed method was tested using the Japanese Newspaper Article Sentences (JNAS) database. Word accuracy improved in all noise conditions, with the best absolute improvement being 2.6 percentage points when station noise was added at 10 dB SNR. APSIPA ASC 2009: Asia-Pacific Signal and Information Processing Association, 2009 Annual Summit and Conference. 4-7 October 2009. Sapporo, Japan. Poster session: Automatic Speech Recognition (6 October 2009)

    Prosodic Word Boundary Detection Using Statistical Modeling Of Moraic Fundamental Frequency Contours And Its Use For Continuous Speech Recognition

    A new method for prosodic word boundary detection in continuous speech was developed based on the statistical modeling of moraic transitions of fundamental frequency (F0) contours, formerly proposed by the authors. In the developed method, F0 contours of prosodic words were modeled separately according to their accent types. An input utterance was matched against the models and divided into its constituent prosodic words, which yields the prosodic word boundaries. The method was first applied to boundary detection experiments on the ATR continuous speech corpus. With the mora boundary locations given in the corpus, the total detection rate reached 91.5%. The method was then integrated into a continuous speech recognition scheme with unlimited vocabulary. An improvement of a few percentage points was observed in mora recognition for the above corpus. Although all the experiments were done in closed conditions due to corpus availability, the results indicated the usefulness of the proposed method.

    Audio-Visual Person Authentication Using Speech and Ear Images

    This paper proposes a multimodal biometric person authentication method using speech and ear images, with the aim of improving performance in mobile environments. It is well known that the performance of person authentication using only speech degrades with acoustic noise and with changes in speech features over time. Since the ear shape of each person does not change over time, integrating its image with speech information increases the robustness of person authentication. Experiments are conducted using an audio-visual database collected from 38 male speakers at five sessions over a half-year period. Speech data are contaminated with white noise at various SNR conditions. Experimental results show that the authentication performance is improved by combining the ear image with speech in every SNR condition.

    Development of a WFST based Speech Recognition System for a Resource Deficient Language Using Machine Translation

    Text corpus size is an important issue when building a language model (LM), in particular where insufficient training and evaluation data are available. In this paper we continue our work on creating a speech recognition system with an LM that is trained on a small amount of text in the target language. To get better performance, we use a large amount of foreign text and a dictionary mapping between the languages. A dictionary is used because we assume that the target language is resource deficient and that statistical machine translation (MT) is therefore not available. In this paper we take a step forward from our previously published method by coupling the speech recognition part and the translation part rather than pre-translating the foreign text. The coupling is achieved with a weighted finite state transducer (WFST) network, which also makes it easy to switch the output language, i.e. to produce the output text in either the resource-deficient language or the resource-rich language. When the system is trained on 1500 Icelandic sentences and the English text corpus consists of 63003 sentences, our method outperforms the resource-deficient Icelandic speech recognition baseline of 82.6% keyword accuracy (KA), both for the English output (2.6% absolute KA improvement) and for the Icelandic output (1.6% absolute KA improvement). APSIPA ASC 2009: Asia-Pacific Signal and Information Processing Association, 2009 Annual Summit and Conference. 4-7 October 2009. Sapporo, Japan. Oral session: Speech and Music Processing (5 October 2009)

    Recent Development of WFST-Based Speech Recognition Decoder

    In this paper we present an overview of the Tokyo Tech Transducer-based Decoder T3 (pronounced tee-cubed). A high-level overview of the engine's design and features is accompanied by a more detailed description of the features that are unique to our engine. These include the ability to perform acoustic computations on a graphics card, and generalized fast on-the-fly composition and optimization algorithms. We describe voice activity detection functionality recently added to the engine, and finally present results which show the engine achieving very high recognition throughput at a high recognition accuracy. APSIPA ASC 2009: Asia-Pacific Signal and Information Processing Association, 2009 Annual Summit and Conference. 4-7 October 2009. Sapporo, Japan. Oral session: Infrastructure Software for Speech Processing (5 October 2009)

    Detection of prosodic word boundaries by statistical modeling of mora transitions of fundamental frequency contours and its use for continuous speech recognition

    We have been developing a reliable method of prosodic word boundary detection for Japanese continuous speech based on the statistical modeling of mora transitions of fundamental frequency contours of prosodic words. Modifications to the codebook sizes and the HMM topologies improved the boundary detection performance. When using mora boundary information obtainable from the phoneme recognition process, the detection rates reached around 73% with 12.5% insertion errors in speaker-open experiments. This method was then integrated into a continuous speech recognition system with unlimited vocabulary. The integrated system conducts the recognition process in two stages: a first stage that detects mora boundaries without prosodic information, and a second stage that increases the mora recognition rate using prosodic word boundary information. Slight improvements in mora recognition rates were observed in both speaker-closed and speaker-open experiments.