829 research outputs found

    A frequency-selective feedback model of auditory efferent suppression and its implications for the recognition of speech in noise

    Get PDF
    The potential contribution of the peripheral auditory efferent system to our understanding of speech in a background of competing noise was studied using a computer model of the auditory periphery and assessed using an automatic speech recognition system. A previous study had shown that a fixed efferent attenuation applied to all channels of a multi-channel model could improve the recognition of connected digit triplets in noise [G. J. Brown, R. T. Ferry, and R. Meddis, J. Acoust. Soc. Am. 127, 943?954 (2010)]. In the current study an anatomically justified feedback loop was used to automatically regulate separate attenuation values for each auditory channel. This arrangement resulted in a further enhancement of speech recognition over fixed-attenuation conditions. Comparisons between multi-talker babble and pink noise interference conditions suggest that the benefit originates from the model?s ability to modify the amount of suppression in each channel separately according to the spectral shape of the interfering sounds

    A computer model of auditory efferent suppression: Implications for the recognition of speech in noise

    Get PDF
    The neural mechanisms underlying the ability of human listeners to recognize speech in the presence of background noise are still imperfectly understood. However, there is mounting evidence that the medial olivocochlear system plays an important role, via efferents that exert a suppressive effect on the response of the basilar membrane. The current paper presents a computer modeling study that investigates the possible role of this activity on speech intelligibility in noise. A model of auditory efferent processing [ Ferry, R. T., and Meddis, R. (2007). J. Acoust. Soc. Am. 122, 3519?3526 ] is used to provide acoustic features for a statistical automatic speech recognition system, thus allowing the effects of efferent activity on speech intelligibility to be quantified. Performance of the ?basic? model (without efferent activity) on a connected digit recognition task is good when the speech is uncorrupted by noise but falls when noise is present. However, recognition performance is much improved when efferent activity is applied. Furthermore, optimal performance is obtained when the amount of efferent activity is proportional to the noise level. The results obtained are consistent with the suggestion that efferent suppression causes a ?release from adaptation? in the auditory-nerve response to noisy speech, which enhances its intelligibility

    Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation

    Full text link
    Conventional automatic speech recognition (ASR) systems trained from frame-level alignments can easily leverage posterior fusion to improve ASR accuracy and build a better single model with knowledge distillation. End-to-end ASR systems trained using the Connectionist Temporal Classification (CTC) loss do not require frame-level alignment and hence simplify model training. However, sparse and arbitrary posterior spike timings from CTC models pose a new set of challenges in posterior fusion from multiple models and knowledge distillation between CTC models. We propose a method to train a CTC model so that its spike timings are guided to align with those of a pre-trained guiding CTC model. As a result, all models that share the same guiding model have aligned spike timings. We show the advantage of our method in various scenarios including posterior fusion of CTC models and knowledge distillation between CTC models with different architectures. With the 300-hour Switchboard training data, the single word CTC model distilled from multiple models improved the word error rates to 13.7%/23.1% from 14.9%/24.1% on the Hub5 2000 Switchboard/CallHome test sets without using any data augmentation, language model, or complex decoder.Comment: Accepted to Interspeech 201

    An Intelligent Text Extraction and Navigation System

    Get PDF
    We present sppc, a high-performance system for intelligent text extraction and navigation from German free text documents. The main purpose of sppc is to extract as much linguistic structure as possible for performing domain-specific processing. sppc consists of a set of domain-independent shallow core components which are realized by means of cascaded weighted finite state machines and generic dynamic tries. All extracted information is represented uniformly in one data structure (called the text chart) in a highly compact and linked form in order to support indexing and navigation through the set of solutions. Germa

    Open Microphone Speech Understanding: Correct Discrimination Of In Domain Speech

    Get PDF
    An ideal spoken dialogue system listens continually and determines which utterances were spoken to it, understands them and responds appropriately while ignoring the rest This paper outlines a simple method for achieving this goal which involves trading a slightly higher false rejection rate of in domain utterances for a higher correct rejection rate of Out of Domain (OOD) utterances. The system recognizes semantic entities specified by a unification grammar which is specialized by Explanation Based Learning (EBL). so that it only uses rules which are seen in the training data. The resulting grammar has probabilities assigned to each construct so that overgeneralizations are not a problem. The resulting system only recognizes utterances which reduce to a valid logical form which has meaning for the system and rejects the rest. A class N-gram grammar has been trained on the same training data. This system gives good recognition performance and offers good Out of Domain discrimination when combined with the semantic analysis. The resulting systems were tested on a Space Station Robot Dialogue Speech Database and a subset of the OGI conversational speech database. Both systems run in real time on a PC laptop and the present performance allows continuous listening with an acceptably low false acceptance rate. This type of open microphone system has been used in the Clarissa procedure reading and navigation spoken dialogue system which is being tested on the International Space Station

    Coupling an automatic dictation system with a grammar checker

    Get PDF
    corecore