7 research outputs found

    Embedded Knowledge-based Speech Detectors for Real-Time Recognition Tasks

    Speech recognition has become common in many application domains, from dictation systems for professional practice to vocal user interfaces for people with disabilities and hands-free system control. However, so far the performance of automatic speech recognition (ASR) systems is comparable to human speech recognition (HSR) only under very strict working conditions, and is in general much lower. Incorporating acoustic-phonetic knowledge into ASR design has proven to be a viable approach to raising ASR accuracy. Manner-of-articulation attributes such as vowel, stop, fricative, approximant, nasal, and silence are examples of such knowledge. Neural networks have already been used successfully as detectors for manner-of-articulation attributes, starting from representations of speech signal frames. In this paper, the full system implementation is described. The system has a first stage for MFCC extraction, followed by a second stage implementing a sinusoid-based multi-layer perceptron for speech event classification. Implementation details for a Celoxica RC203 board are given.
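
    As a minimal sketch of the two-stage pipeline described above, the following assumes librosa for the MFCC stage and feeds the frames to a small multi-layer perceptron with a sine hidden activation, echoing the sinusoid-based MLP the abstract mentions; the attribute list, frame rate, and layer shapes are illustrative assumptions, not the published design:

        import numpy as np
        import librosa  # assumed available for MFCC extraction

        ATTRIBUTES = ["vowel", "stop", "fricative", "approximant", "nasal", "silence"]

        def extract_mfcc(wav_path, n_mfcc=13):
            # One n_mfcc-dim vector per 10 ms frame (hop of 160 samples at 16 kHz).
            y, sr = librosa.load(wav_path, sr=16000)
            return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, hop_length=160).T

        class AttributeMLP:
            """One hidden layer; weights would come from offline training."""
            def __init__(self, w1, b1, w2, b2):
                self.w1, self.b1, self.w2, self.b2 = w1, b1, w2, b2

            def __call__(self, frames):
                h = np.sin(frames @ self.w1 + self.b1)   # sinusoidal hidden units
                logits = h @ self.w2 + self.b2
                e = np.exp(logits - logits.max(axis=1, keepdims=True))
                return e / e.sum(axis=1, keepdims=True)  # per-frame posteriors

        # Usage: posteriors[t, k] = P(ATTRIBUTES[k] | frame t)
        # mlp = AttributeMLP(w1, b1, w2, b2)
        # posteriors = mlp(extract_mfcc("utterance.wav"))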

    Multimodal person recognition for human-vehicle interaction

    Next-generation vehicles will undoubtedly feature biometric person recognition as part of an effort to improve the driving experience. Today's technology prevents such systems from operating satisfactorily under adverse conditions. A proposed framework for achieving person recognition successfully combines different biometric modalities, borne out in two case studies
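
    As an illustration of combining biometric modalities, a common baseline is score-level fusion; the modality names, weights, and acceptance threshold below are assumptions for the sketch, not the framework's actual parameters:

        def fuse_scores(face_score, voice_score, w_face=0.6, w_voice=0.4):
            """Weighted-sum fusion of per-modality match scores in [0, 1]."""
            return w_face * face_score + w_voice * voice_score

        def identify(candidate_scores, threshold=0.5):
            """candidate_scores: {person_id: (face_score, voice_score)}.
            Returns the best-matching enrolled person, or None to reject."""
            best_id, best = max(
                ((pid, fuse_scores(f, v)) for pid, (f, v) in candidate_scores.items()),
                key=lambda pair: pair[1],
            )
            return best_id if best >= threshold else None

    Fusing at the score level lets each modality compensate for conditions that degrade the other, such as cabin noise for voice or poor lighting for face.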

    Recognizing Speech in a Novel Accent: The Motor Theory of Speech Perception Reframed

    The motor theory of speech perception holds that we perceive the speech of another in terms of a motor representation of that speech. However, when we have learned to recognize a foreign accent, it seems plausible that recognition of a word rarely involves reconstruction of the speech gestures of the speaker rather than the listener. To better assess the motor theory and this observation, we proceed in three stages. Part 1 places the motor theory of speech perception in a larger framework based on our earlier models of the adaptive formation of mirror neurons for grasping, and for viewing extensions of that mirror system as part of a larger system for neuro-linguistic processing, augmented by the present consideration of recognizing speech in a novel accent. Part 2 then offers a novel computational model of how a listener comes to understand the speech of someone speaking the listener's native language with a foreign accent. The core tenet of the model is that the listener uses hypotheses about the word the speaker is currently uttering to update probabilities linking the sound produced by the speaker to phonemes in the native language repertoire of the listener. This, on average, improves the recognition of later words. This model is neutral regarding the nature of the representations it uses (motor vs. auditory). It serves as a reference point for the discussion in Part 3, which proposes a dual-stream neuro-linguistic architecture, revisits claims for and against the motor theory of speech perception and the relevance of mirror neurons, and extracts some implications for the reframing of the motor theory.
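
    The core update of the Part 2 model can be sketched as follows; the one-to-one sound-to-phoneme alignment, the data structures, and the learning rate are simplifying assumptions, not the paper's exact formulation:

        from collections import defaultdict

        class AccentAdapter:
            """Tracks P(native phoneme | accented sound), refined word by word."""
            def __init__(self, phonemes, lr=0.1):
                self.lr = lr
                self.phonemes = list(phonemes)
                self.p = defaultdict(
                    lambda: {ph: 1.0 / len(self.phonemes) for ph in self.phonemes}
                )

            def update(self, heard_sounds, hypothesized_phonemes):
                # Nudge each sound's distribution toward the phoneme the current
                # word hypothesis says it realizes; rows stay normalized because
                # the targets themselves sum to one.
                for sound, ph in zip(heard_sounds, hypothesized_phonemes):
                    dist = self.p[sound]
                    for q in dist:
                        target = 1.0 if q == ph else 0.0
                        dist[q] += self.lr * (target - dist[q])

            def posteriors(self, sound):
                return dict(self.p[sound])

    The interpolation step keeps each row a proper probability distribution without explicit renormalization, which is why the mapping can be refined incrementally, one word hypothesis at a time.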

    Multilingual Articulatory Features


    Combining articulatory and acoustic information for speech recognition in noisy and reverberant environments

    Kirchhoff K. Combining articulatory and acoustic information for speech recognition in noisy and reverberant environments. In: International Conference on Spoken Language Processing. Sydney, Australia; 1998: 891-894.
    Robust speech recognition under varying acoustic conditions may be achieved by exploiting multiple sources of information in the speech signal. In addition to an acoustic signal representation, we use an articulatory representation consisting of pseudo-articulatory features as an additional information source. Hybrid ANN/HMM recognizers using either of these representations are evaluated on a continuous numbers recognition task (OGI Numbers95) under clean, reverberant, and noisy conditions. An error analysis of preliminary recognition results shows that the different representations produce qualitatively different errors, which suggests a combination of both representations. We investigate various combination possibilities at the phoneme estimation level and show that significant improvements can be achieved under all three acoustic conditions.
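
    One plausible reading of combination at the phoneme estimation level is a weighted log-linear merge of the two ANNs' per-frame phoneme posteriors before HMM decoding; the stream weight and smoothing constant here are assumptions, not values from the paper:

        import numpy as np

        def combine_posteriors(p_acoustic, p_articulatory, w=0.5, eps=1e-10):
            """p_*: (frames, phonemes) arrays whose rows each sum to 1.
            Returns renormalized combined posteriors for the HMM decoder
            to consume in place of a single ANN's output."""
            log_p = w * np.log(p_acoustic + eps) + (1.0 - w) * np.log(p_articulatory + eps)
            p = np.exp(log_p - log_p.max(axis=1, keepdims=True))
            return p / p.sum(axis=1, keepdims=True)

    Because the two streams make qualitatively different errors, a frame misclassified by one stream can be rescued by a confident, correct posterior from the other.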

    Acoustic Modelling for Under-Resourced Languages

    Automatic speech recognition systems have so far been developed for only a very few of the 4,000-7,000 existing languages. In this thesis we examine methods to rapidly create acoustic models for new, possibly under-resourced languages in a time- and cost-effective manner. To this end we examine the use of multilingual models, the application of articulatory features across languages, and the automatic discovery of word-like units in unwritten languages.
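
    The cross-language use of articulatory features can be illustrated by mapping a new language's phones onto existing acoustic models through shared features; the toy feature inventory and phone set below are assumptions for illustration only:

        FEATURES = {  # articulatory features of some source-language phones
            "p": {"stop", "bilabial", "voiceless"},
            "b": {"stop", "bilabial", "voiced"},
            "t": {"stop", "alveolar", "voiceless"},
            "s": {"fricative", "alveolar", "voiceless"},
        }

        def closest_source_phone(target_features, source_phones=FEATURES):
            """Pick the source phone sharing the most articulatory features,
            so its acoustic model can seed the new language's phone."""
            return max(source_phones, key=lambda ph: len(FEATURES[ph] & target_features))

        # A target-language /z/ described as {"fricative", "alveolar", "voiced"}
        # maps to "s", the closest available trained model.
        print(closest_source_phone({"fricative", "alveolar", "voiced"}))  # -> s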