12,016 research outputs found

    Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments

    Get PDF
    Eliminating the negative effect of non-stationary environmental noise is a long-standing research topic for automatic speech recognition that stills remains an important challenge. Data-driven supervised approaches, including ones based on deep neural networks, have recently emerged as potential alternatives to traditional unsupervised approaches and with sufficient training, can alleviate the shortcomings of the unsupervised methods in various real-life acoustic environments. In this light, we review recently developed, representative deep learning approaches for tackling non-stationary additive and convolutional degradation of speech with the aim of providing guidelines for those involved in the development of environmentally robust speech recognition systems. We separately discuss single- and multi-channel techniques developed for the front-end and back-end of speech recognition systems, as well as joint front-end and back-end training frameworks

    A frequency-selective feedback model of auditory efferent suppression and its implications for the recognition of speech in noise

    Get PDF
    The potential contribution of the peripheral auditory efferent system to our understanding of speech in a background of competing noise was studied using a computer model of the auditory periphery and assessed using an automatic speech recognition system. A previous study had shown that a fixed efferent attenuation applied to all channels of a multi-channel model could improve the recognition of connected digit triplets in noise [G. J. Brown, R. T. Ferry, and R. Meddis, J. Acoust. Soc. Am. 127, 943?954 (2010)]. In the current study an anatomically justified feedback loop was used to automatically regulate separate attenuation values for each auditory channel. This arrangement resulted in a further enhancement of speech recognition over fixed-attenuation conditions. Comparisons between multi-talker babble and pink noise interference conditions suggest that the benefit originates from the model?s ability to modify the amount of suppression in each channel separately according to the spectral shape of the interfering sounds
    • …
    corecore