22 research outputs found

    Robust speech recognition with spectrogram factorisation

    Communication by speech is intrinsic to humans. Since the breakthrough of mobile devices and wireless communication, digital transmission of speech has become ubiquitous. Similarly, the distribution and storage of audio and video data have increased rapidly. However, despite being technically capable of recording and processing audio signals, only a fraction of digital systems and services can actually work with spoken input, that is, operate on the lexical content of speech. One persistent obstacle to the practical deployment of automatic speech recognition systems is inadequate robustness against noise and other interference, which regularly corrupt signals recorded in real-world environments. Speech and diverse noises are both complex signals that are not trivially separable. Despite decades of research and a multitude of different approaches, the problem has not been solved to a sufficient extent. In particular, the mathematically ill-posed problem of separating multiple sources from a single-channel input requires advanced models and algorithms to be solvable. One promising path is to use a composite model of long-context atoms to represent a mixture of non-stationary sources based on their spectro-temporal behaviour. Algorithms derived from the family of non-negative matrix factorisations have been applied to such problems to separate and recognise individual sources such as speech. This thesis describes a set of tools developed for non-negative modelling of audio spectrograms, especially those involving speech and real-world noise sources. An overview of the complete framework is provided, starting from model and feature definitions, advancing to factorisation algorithms, and finally describing different routes for separation, enhancement, and recognition tasks. Current issues and their potential solutions are discussed both theoretically and from a practical point of view.
The included publications describe factorisation-based recognition systems, which have been evaluated on publicly available speech corpora in order to determine the efficiency of various separation and recognition algorithms. Several variants and system combinations that have been proposed in the literature are also discussed. The work covers a broad span of factorisation-based system components, which together aim at providing a practically viable solution to robust processing and recognition of speech in everyday situations.

    Detection, Separation and Recognition of Speech From Continuous Signals Using Spectral Factorisation

    Publication in the conference proceedings of EUSIPCO, Bucharest, Romania, 201

    State-based labelling for a sparse representation of speech and its application to robust speech recognition

    No full text
    status: published

    HMM-regularization for NMF-based noise robust ASR

    No full text
    Gemmeke J.F., Hurmalainen A., Virtanen T., "HMM-regularization for NMF-based noise robust ASR", Proceedings 2nd international workshop on machine listening in multisource environments - CHiME 2013 (in conjunction with ICASSP 2013), pp. 47-52, June 1, 2013, Vancouver, Canada.
    status: published

    Exemplar-Based Sparse Representations for Noise Robust Automatic Speech Recognition

    No full text
    status: published

    Modelling non-stationary noise with spectral factorisation in automatic speech recognition

    No full text
    Hurmalainen A., Gemmeke J.F., Virtanen T., "Modelling non-stationary noise with spectral factorisation in automatic speech recognition", Computer speech and language, vol. 27, no. 3, pp. 763-779, May 2013.
    status: published

    Compact long context spectral factorisation models for noise robust recognition of medium vocabulary speech

    No full text
    Hurmalainen A., Gemmeke J.F., Virtanen T., "Compact long context spectral factorisation models for noise robust recognition of medium vocabulary speech", Proceedings 2nd international workshop on machine listening in multisource environments - CHiME 2013 (in conjunction with ICASSP 2013), pp. 13-18, June 1, 2013, Vancouver, Canada.
    status: published

    Non-negative matrix deconvolution in noise robust speech recognition

    No full text
    status: published