
    Some applications of a priori knowledge in multi-stream HMM and HMM/ANN based ASR

    Multi-band ASR was largely inspired by the extremely high level of redundancy in the spectral signal representation, which can be inferred from Fletcher's product-of-errors rule for human speech perception. Indeed, the main aim of the multi-band approach is to exploit this redundancy in order to overcome the problem of data mismatch (while making no assumptions about noise type) by focusing recognition on sub-bands estimated to contain reliable, or "clean speech like", data. However, multi-band processing also presents the opportunity to introduce a number of other ideas from phonetics, non-linear phonology and auditory processing into the recognition process. In particular, we can weight sub-bands, or sub-band combinations, according to the most likely frequency range of characteristic features for the phoneme whose presence we are testing for; we can allow some degree of asynchrony between sub-bands; and we can preprocess each sub-band according to the kind of acoustic features which we expect to find there. Besides combining sub-band experts, we can also combine multiple full-band experts, where each expert is perhaps suited to extracting complementary sources of speech information, or is robust to different kinds of noise. In this article we present an outline of some of the recent work at IDIAP, and cooperating institutions, in bringing together ideas from different areas of speech science within the framework of multi-stream HMM and HMM/ANN based ASR.
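
For orientation, the product-of-errors rule mentioned in this abstract is usually stated as follows. The notation is an assumption introduced here, not taken from the abstract: $e$ is the probability of a recognition error when listening to the full band, $e_i$ is the error probability when listening to sub-band $i$ alone, and $B$ is the number of sub-bands.

```latex
% Fletcher's product-of-errors rule (notation assumed for illustration):
% the full-band error equals the product of the individual sub-band errors.
e = \prod_{i=1}^{B} e_i
```

Read this way, a full-band error occurs only if every sub-band is in error, which is the sense in which the spectral representation is highly redundant.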

    Multi-stream Processing for Noise Robust Speech Recognition

    In this thesis, the framework of multi-stream combination has been explored to improve the noise robustness of automatic speech recognition (ASR) systems. The central idea of multi-stream ASR is to combine information from several sources to improve the performance of a system. The two important issues in multi-stream systems are which information sources (feature representations) to combine and what importance (weight) to give to each information source. In the framework of hybrid hidden Markov model/artificial neural network (HMM/ANN) and Tandem systems, several weighting strategies are investigated in this thesis to merge the posterior outputs of multi-layered perceptrons (MLPs) trained on different feature representations. The best results were obtained by inverse entropy weighting, in which the posterior estimates at the output of the MLPs were weighted by their respective inverse output entropies. In the second part of this thesis, two feature representations have been investigated, namely pitch frequency and spectral entropy features. The pitch frequency feature is used along with perceptual linear prediction (PLP) features in a multi-stream framework. The second feature proposed in this thesis is estimated by applying an entropy function to the normalized spectrum to produce a measure which has been termed spectral entropy. The idea of the spectral entropy feature is extended to multi-band spectral entropy features by dividing the normalized full-band spectrum into sub-bands and estimating the spectral entropy of each sub-band. The proposed multi-band spectral entropy features were observed to be robust in high noise conditions. Subsequently, the idea of embedded training is extended to multi-stream HMM/ANN systems. To evaluate the maximum performance that can be achieved by frame-level weighting, we investigated an "oracle test". We also studied the relationship of oracle selection to inverse entropy weighting and proposed an alternative interpretation of the oracle test to analyze the complementarity of streams in multi-stream systems. The techniques investigated in this work gave a significant improvement in performance for clean as well as noisy test conditions.
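
The two entropy-based ideas in this abstract can be illustrated with a minimal sketch. It is not the thesis implementation. All function names are hypothetical, the streams are assumed to be combined by a weighted sum of posteriors for a single frame, and the spectrum is assumed to be normalized by dividing each bin by the sum over all bins.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution; zero terms are skipped."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def inverse_entropy_combine(stream_posteriors, eps=1e-12):
    """Combine per-stream class posteriors for one frame.

    Each stream's weight is proportional to the inverse of its output
    entropy, so a confident (low-entropy) stream counts for more.
    Assumption: the weighted posteriors are merged by a weighted sum.
    """
    inverse = [1.0 / (entropy(p) + eps) for p in stream_posteriors]
    total = sum(inverse)
    weights = [v / total for v in inverse]
    n_classes = len(stream_posteriors[0])
    combined = [
        sum(w * p[k] for w, p in zip(weights, stream_posteriors))
        for k in range(n_classes)
    ]
    return weights, combined


def spectral_entropy(power_spectrum):
    """Entropy of a spectrum normalized to sum to one.

    Assumption: normalization divides each bin by the total over all bins.
    A flat spectrum gives high entropy; a peaky spectrum gives low entropy.
    """
    total = sum(power_spectrum)
    normalized = [x / total for x in power_spectrum]
    return entropy(normalized)


if __name__ == "__main__":
    # Two hypothetical MLP streams: the first is confident, the second is not.
    streams = [[0.90, 0.05, 0.05], [0.40, 0.30, 0.30]]
    weights, combined = inverse_entropy_combine(streams)
    print("weights:", weights)
    print("combined posterior:", combined)
    print("spectral entropy (flat):", spectral_entropy([1.0, 1.0, 1.0, 1.0]))
    print("spectral entropy (peaky):", spectral_entropy([1.0, 0.0, 0.0, 0.0]))
```

In this sketch the more confident stream receives the larger weight, and the combined vector remains a valid probability distribution because the weights sum to one.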