Phonetic Classification Using Hierarchical, Feed-forward, Spectro-temporal Patch-based Architectures
A preliminary set of experiments is described in which a biologically-inspired computer vision system (Serre, Wolf et al. 2005; Serre 2006; Serre, Oliva et al. 2006; Serre, Wolf et al. 2006) designed for visual object recognition was applied to the task of phonetic classification. During learning, the system processed 2-D wideband magnitude spectrograms directly as images, producing a set of 2-D spectro-temporal patch dictionaries at different spectro-temporal positions, orientations, scales, and of varying complexity. During testing, features were computed by comparing the stored patches with patches from novel spectrograms. Classification was performed using a regularized least squares classifier (Rifkin, Yeo et al. 2003; Rifkin, Schutte et al. 2007) trained on the features computed by the system. On a 20-class TIMIT vowel classification task, the model features achieved a best result of 58.74% error, compared to 48.57% error using state-of-the-art MFCC-based features trained using the same classifier. This suggests that hierarchical, feed-forward, spectro-temporal patch-based architectures may be useful for phonetic analysis.
Autoregressive Modelling of Hilbert Envelopes for Wide-band Audio Coding
Frequency Domain Linear Prediction (FDLP) is a technique for approximating the temporal envelopes of a signal using autoregressive models. In this paper, we propose a wide-band audio coding system exploiting FDLP. Specifically, FDLP is applied on critically sampled sub-bands to model the Hilbert envelopes. The residual of the linear prediction forms the Hilbert carrier, which is transmitted along with the envelope parameters. This process is reversed at the decoder to reconstruct the signal. In the objective and subjective quality evaluations, the FDLP-based audio codec at kbps provides competitive results compared to the state-of-the-art codecs at similar bit-rates.
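The core FDLP idea in the abstract above can be sketched as follows. This is a minimal illustration, not the codec described in the paper: it runs on the full-band signal rather than on critically sampled sub-bands, it omits the Hilbert carrier and all quantisation, and the function names (`dct2`, `autocorr`, `levinson`, `fdlp_envelope`) are hypothetical. Linear prediction is applied to the DCT of the signal, so the resulting all-pole "spectrum" is read along the time axis and approximates the squared temporal envelope.

```python
import math

def dct2(x):
    """Unnormalised DCT-II of a real sequence (assumed transform for this sketch)."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n)]

def autocorr(seq, max_lag):
    """Biased autocorrelation for lags 0..max_lag."""
    n = len(seq)
    return [sum(seq[i] * seq[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson(r, order):
    """Levinson-Durbin recursion; returns prediction-error filter a (a[0] = 1) and error."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)
    return a, err

def fdlp_envelope(x, order):
    """All-pole approximation of the squared temporal envelope of x."""
    n = len(x)
    a, err = levinson(autocorr(dct2(x), order), order)
    env = []
    for t in range(n):
        # Time index t plays the role that frequency plays in ordinary LPC.
        w = math.pi * (t + 0.5) / n
        re = sum(a[j] * math.cos(w * j) for j in range(order + 1))
        im = -sum(a[j] * math.sin(w * j) for j in range(order + 1))
        env.append(err / (re * re + im * im))
    return env

# Usage: a short burst centred on sample 120 of a 200-sample frame.
signal = [0.0] * 200
signal[118:123] = [0.5, 1.0, -1.0, 1.0, -0.5]
envelope = fdlp_envelope(signal, order=2)
peak = max(range(len(envelope)), key=envelope.__getitem__)
```

With a low model order the envelope is a single smooth bump, and `peak` should land close to the burst; higher orders would let the model follow more detailed envelopes.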
Multi-stream adaptive evidence combination for noise robust ASR
In this paper we develop different mathematical models in the framework of the multi-stream paradigm for noise robust ASR, and discuss their close relationship with human speech perception. Largely inspired by Fletcher's "product-of-errors" rule in psychoacoustics, multi-band ASR aims for robustness to data mismatch through the exploitation of spectral redundancy, while making minimal assumptions about noise type. Previous ASR tests have shown that independent sub-band processing can lead to decreased recognition performance with clean speech. We have overcome this problem by considering every combination of data sub-bands as an independent data stream. After introducing the background to multi-band ASR, we show how this "full combination" approach can be formalised, in the context of HMM/ANN based ASR, by introducing a latent variable to specify which data sub-bands in each data frame are free from data mismatch. This enables us to decompose the posterior probability for each phoneme into a reliability-weighted integral over all possible positions of clean data. This approach offers great potential for adaptation to rapidly changing and unpredictable noise.
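The "full combination" decomposition described in the abstract above can be illustrated with a small sketch. It assumes a discrete set of sub-bands, so the reliability-weighted integral becomes a weighted sum over every subset of sub-bands. The per-subset phoneme posteriors and the reliability weights below are made-up numbers, and the helper names `all_subsets` and `combine` are hypothetical; in the paper the posteriors would come from HMM/ANN experts.

```python
from itertools import combinations

def all_subsets(bands):
    """Every combination of sub-bands, including the empty set and the full set."""
    return [frozenset(c)
            for r in range(len(bands) + 1)
            for c in combinations(bands, r)]

def combine(posteriors, weights):
    """Reliability-weighted sum of per-subset posteriors.

    posteriors: {subset: {phoneme: P(phoneme | data in subset)}}
    weights:    {subset: reliability that exactly this subset is clean}
    """
    total = sum(weights.values())  # normalise in case weights do not sum to 1
    phonemes = next(iter(posteriors.values())).keys()
    return {q: sum(weights[s] / total * posteriors[s][q] for s in posteriors)
            for q in phonemes}

# Usage with two sub-bands and two phoneme classes (illustrative values only).
subsets = all_subsets([0, 1])
posteriors = {
    frozenset(): {"a": 0.5, "b": 0.5},        # no clean band: fall back to priors
    frozenset([0]): {"a": 0.8, "b": 0.2},
    frozenset([1]): {"a": 0.4, "b": 0.6},
    frozenset([0, 1]): {"a": 0.9, "b": 0.1},
}
weights = {
    frozenset(): 0.1,
    frozenset([0]): 0.2,
    frozenset([1]): 0.2,
    frozenset([0, 1]): 0.5,
}
combined = combine(posteriors, weights)
```

The number of subsets grows as 2^B for B sub-bands, which is why this kind of enumeration is only practical for a small number of bands.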
A combined evaluation of established and new approaches for speech recognition in varied reverberation conditions
Robustness to reverberation is a key concern for distant-microphone ASR. Various approaches have been proposed, including single-channel or multichannel dereverberation, robust feature extraction, alternative acoustic models, and acoustic model adaptation. However, to the best of our knowledge, a detailed study of these techniques in varied reverberation conditions is still missing in the literature. In this paper, we conduct a series of experiments to assess the impact of various dereverberation and acoustic model adaptation approaches on the ASR performance in the range of reverberation conditions found in real domestic environments. We consider both established approaches such as weighted prediction error (WPE) dereverberation and newer approaches such as learning hidden unit contribution (LHUC) adaptation, whose performance has not been reported before in this context, and we employ them in combination. Our results indicate that performing WPE dereverberation on a reverberated test speech utterance, and decoding using a deep neural network (DNN) acoustic model trained on multi-condition reverberated speech with feature-space maximum likelihood linear regression (fMLLR) transformed features, outperforms more recent approaches and helps significantly reduce the word error rate (WER).
New techniques for vibration condition monitoring: Volterra kernel and Kolmogorov-Smirnov
This thesis was submitted for the degree of Doctor of Philosophy and was awarded by Brunel University. This research presents a complete review of the signal processing techniques currently used in vibration-based industrial condition monitoring and diagnostics. It also introduces two novel techniques to this field, namely the Kolmogorov-Smirnov test and the Volterra series, which have not yet been applied to vibration-based condition monitoring.
The first technique, the Kolmogorov-Smirnov test, relies on a statistical comparison of the cumulative probability distribution functions (CDFs) of two time series. It must be emphasised that this is not a moment-based technique: it uses the whole CDF in the comparison process.
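The comparison described above can be sketched as the two-sample Kolmogorov-Smirnov statistic: the largest vertical gap between the empirical CDFs of two series. This is a generic illustration, not the thesis's implementation; the function name `ks_statistic` and the sample values are hypothetical, and turning the statistic into a significance level is left out.

```python
def ks_statistic(series_a, series_b):
    """Maximum absolute difference between the empirical CDFs of two samples."""
    a, b = sorted(series_a), sorted(series_b)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        v = min(a[i], b[j])
        # Advance both empirical CDFs past every value <= v (handles ties).
        while i < na and a[i] <= v:
            i += 1
        while j < nb and b[j] <= v:
            j += 1
        d = max(d, abs(i / na - j / nb))
    return d

# Usage: a reference vibration record against a later one (illustrative values).
reference = [1.0, 2.0, 3.0, 4.0]
measured = [3.0, 4.0, 5.0, 6.0]
distance = ks_statistic(reference, measured)
```

Because the statistic is built from the whole empirical CDF, it reacts to any change in the distribution's shape, not only to shifts in its mean or variance.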
The second tool suggested in this research is the Volterra series. This is a non-linear signal processing technique, which can be used to model a time series. The parameters of this model are used for condition monitoring applications.
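As a rough illustration of what such a model looks like, the sketch below evaluates a discrete Volterra series truncated at second order. The truncation order, the kernel values, and the function name `volterra_output` are assumptions made for this example; the abstract does not state how the thesis estimates the kernels or which order it uses.

```python
def volterra_output(x, h0, h1, h2):
    """Output of a second-order discrete Volterra model.

    y[n] = h0 + sum_i h1[i] x[n-i] + sum_i sum_j h2[i][j] x[n-i] x[n-j]
    Samples before the start of x are treated as zero.
    """
    memory = len(h1)
    y = []
    for n in range(len(x)):
        # Delayed inputs x[n], x[n-1], ..., x[n-memory+1].
        past = [x[n - i] if n - i >= 0 else 0.0 for i in range(memory)]
        linear = sum(h1[i] * past[i] for i in range(memory))
        quadratic = sum(h2[i][j] * past[i] * past[j]
                        for i in range(memory) for j in range(memory))
        y.append(h0 + linear + quadratic)
    return y

# Usage with a memory of two samples (illustrative kernel values only).
h1 = [1.0, 0.5]
h2 = [[0.1, 0.0],
      [0.0, 0.0]]
response = volterra_output([1.0, 2.0, 0.0], h0=0.0, h1=h1, h2=h2)
```

The entries of `h1` and `h2` are the model parameters; a monitoring scheme along the lines of the abstract would track how such fitted parameters change over time.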
Finally, this work also presents a comprehensive comparative study between these new methods and the existing techniques. This study is based on results from numerical and experimental applications of each technique discussed here. The concluding remarks include suggestions on how the novel techniques proposed here can be improved.
Brunel University Department of Mechanical Engineering and CAPES, Fundacao Coordenacao de Aperfeicoamento de Pessoal de Nivel Superior