
    Phase AutoCorrelation (PAC) derived Robust Speech Features

    In this paper, we introduce a new class of noise-robust acoustic features derived from a new measure of autocorrelation that explicitly exploits the phase variation of the speech signal frame over time. This family of features, referred to as "Phase AutoCorrelation" (PAC) features, includes PAC spectrum and PAC MFCC, among others. In regular autocorrelation-based features, the correlation between two signal segments (signal vectors), separated by a particular time interval k, is calculated as a dot product of these two vectors. In our proposed PAC approach, the angle between the two vectors is used as the measure of correlation. Since the dot product is usually more affected by noise than the angle, PAC features are expected to be more robust to noise. This is indeed significantly confirmed by the experimental results presented in this paper. The experiments were conducted on the Numbers 95 database, to which "stationary" (car) and "non-stationary" (factory) Noisex 92 noises were added at varying SNRs. In most of the cases, without any specific tuning, PAC-MFCC features perform better
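
    As a loose illustration of the idea (not the authors' exact feature pipeline), the sketch below contrasts the conventional dot-product autocorrelation with an angle-based PAC-style measure on a toy frame; all function names are illustrative:

        import numpy as np

        def autocorr_dot(x, k, N):
            # Conventional autocorrelation at lag k: dot product of two
            # length-N segments of the frame.
            return np.dot(x[:N], x[k:k + N])

        def autocorr_angle(x, k, N):
            # PAC-style measure: angle between the two segment vectors.
            # Additive noise inflates both vectors' norms, so the angle
            # tends to shift less than the raw dot product.
            a, b = x[:N], x[k:k + N]
            c = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
            return np.arccos(np.clip(c, -1.0, 1.0))

        # Toy frame: a sinusoid with and without additive noise.
        rng = np.random.default_rng(0)
        t = np.arange(400)
        clean = np.sin(2 * np.pi * t / 40.0)
        noisy = clean + 0.5 * rng.standard_normal(t.size)
        for x in (clean, noisy):
            print(autocorr_dot(x, 20, 256), autocorr_angle(x, 20, 256))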

    Entropy Based Combination of Tandem Representations for Noise Robust ASR

    In this paper, we present an entropy-based method to combine tandem representations of the recently proposed Phase AutoCorrelation (PAC) based features and Mel-Frequency Cepstral Coefficient (MFCC) features. PAC-based features, derived from a nonlinear transformation of autocorrelation coefficients and shown to be noise robust, improve their robustness to additive noise in their tandem representation. MFCC features in their tandem representation, on the other hand, show a significant improvement in recognition performance on clean speech. The entropy-based combination method investigated in this paper adaptively gives a higher weighting to the representation of the MFCC features on clean speech and to the representation of the PAC-based features on noisy speech, thus yielding robust recognition performance in all conditions.
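
    A minimal sketch of inverse-entropy stream weighting, assuming each tandem stream yields per-frame phoneme posteriors; the exact weighting rule in the paper may differ, and the names here are hypothetical:

        import numpy as np

        def entropy(p, eps=1e-12):
            # Shannon entropy of a posterior vector (low = confident).
            p = np.clip(p, eps, 1.0)
            return -np.sum(p * np.log(p))

        def combine_posteriors(p_mfcc, p_pac):
            # Weight each stream inversely to its frame entropy, so the
            # more confident representation dominates, then renormalise.
            h_m, h_p = entropy(p_mfcc), entropy(p_pac)
            w_m = (1.0 / h_m) / (1.0 / h_m + 1.0 / h_p)
            combined = w_m * p_mfcc + (1.0 - w_m) * p_pac
            return combined / combined.sum()

        # MFCC stream confident (clean speech), PAC stream flatter (noisy).
        p_mfcc = np.array([0.85, 0.05, 0.05, 0.05])
        p_pac = np.array([0.40, 0.30, 0.20, 0.10])
        print(combine_posteriors(p_mfcc, p_pac))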

    A Computational Model of Auditory Feature Extraction and Sound Classification

    This thesis introduces a computer model that incorporates responses similar to those found in the cochlea, in sub-cortical auditory processing, and in auditory cortex. The principal aim of this work is to show that this can form the basis for a biologically plausible mechanism of auditory stimulus classification. We will show that this classification is robust to stimulus variation and time compression. In addition, the response of the system is shown to support multiple, concurrent, behaviourally relevant classifications of natural stimuli (speech). The model incorporates transient enhancement, an ensemble of spectro-temporal filters, and a simple measure analogous to the idea of visual salience to produce a quasi-static description of the stimulus suitable either for classification with an analogue artificial neural network or, using appropriate rate coding, for a classifier based on artificial spiking neurons. We also show that the spectro-temporal ensemble can be derived from a limited class of 'formative' stimuli, consistent with a developmental interpretation of ensemble formation. In addition, ensembles chosen on information-theoretic grounds consist of filters with relatively simple geometries, which is consistent with reports of responses in mammalian thalamus and auditory cortex. A powerful feature of this approach is that the ensemble response, from which salient auditory events are identified, amounts to a stimulus-ensemble-driven method of segmentation that respects the envelope of the stimulus and leads to a quasi-static representation of auditory events suitable for spike-rate coding. We also present evidence that the encoded auditory events may form the basis of a representation of similarity, or second-order isomorphism, which implies a representational space that respects similarity relationships between stimuli, including novel stimuli.
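
    The thesis's filter ensemble is learned from 'formative' stimuli; as a rough stand-in, the sketch below applies a small fixed ensemble of 2-D Gabor spectro-temporal filters to a toy spectrogram and keeps each filter's peak response as a crude, salience-like quasi-static descriptor:

        import numpy as np
        from scipy.signal import fftconvolve

        def gabor_strf(size, rate, theta):
            # Gaussian envelope times a plane wave along orientation theta:
            # a simple stand-in spectro-temporal filter (rows = frequency
            # channels, columns = time frames).
            half = size // 2
            f, t = np.mgrid[-half:half + 1, -half:half + 1]
            u = t * np.cos(theta) + f * np.sin(theta)
            env = np.exp(-(t ** 2 + f ** 2) / (2.0 * (size / 4.0) ** 2))
            return env * np.cos(rate * u)

        # A small ensemble at a few rates/orientations, applied to a toy
        # spectrogram; the per-filter peak response serves as a crude
        # quasi-static description of the stimulus.
        spec = np.abs(np.random.default_rng(1).standard_normal((64, 200)))
        ensemble = [gabor_strf(15, rate, theta)
                    for rate in (0.3, 0.6)
                    for theta in (0.0, np.pi / 4, np.pi / 2)]
        features = [fftconvolve(spec, g, mode="same").max() for g in ensemble]
        print(np.round(features, 2))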

    Infraslow fluctuations and respiration are driving cortical neurophysiological oscillations during sleep

    Recently, sleep has been linked to increased brain clearance through perivascular spaces from the blood-brain barrier (BBB) externa limitans, facilitated by physiological pulsations such as cardiovascular and respiratory pulsations. Infraslow fluctuations (ISFs) characterize both fMRI BOLD signals and scalp EEG potentials. They are associated with both permeability fluctuations of the BBB and the amplitude dynamics of faster (> 1 Hz) neuronal oscillations. ISFs, together with respiration, are thought to synchronize with neural rhythms; however, the directionality of these interactions has not been studied before. I used non-invasive measures, necessary to avoid interfering with the pressure-sensitive CSF convection and BBB permeability, combined with directional metrics to fully evaluate these relationships. I recorded full-band resting-state EEG (fbEEG) during wakefulness and sleep and investigated whether the recently shown increase in brain clearance during sleep is accompanied by an increased drive of neural amplitudes by the ISF and respiration phases. I show that ISF power increases during non-REM sleep, possibly reflecting altered BBB status. Furthermore, I show that ISF and respiration phase-amplitude couple with, and predict, neuronal brain rhythms, especially during sleep. These results pave the way for understanding the mechanisms by which neuronal activity is modulated by slow oscillations in the human brain during wakefulness and sleep.
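
    As a sketch of the phase-amplitude coupling analysis (not the directional metrics the study relies on), the following computes a mean-vector modulation index between a slow ISF-band phase and a faster alpha-band amplitude on synthetic data; the band edges and sampling rate are illustrative:

        import numpy as np
        from scipy.signal import butter, sosfiltfilt, hilbert

        def bandpass(x, lo, hi, fs, order=4):
            sos = butter(order, [lo, hi], btype="band", fs=fs, output="sos")
            return sosfiltfilt(sos, x)

        def modulation_index(slow, fast):
            # Mean-vector phase-amplitude coupling: |mean(A_fast * e^(i*phi_slow))|.
            # Larger values mean the fast amplitude is locked to the slow phase.
            phase = np.angle(hilbert(slow))
            amp = np.abs(hilbert(fast))
            return np.abs(np.mean(amp * np.exp(1j * phase)))

        # Synthetic trace: a 0.05 Hz infraslow fluctuation whose phase
        # modulates the amplitude of a 10 Hz (alpha-like) oscillation.
        fs = 100.0
        t = np.arange(0, 200, 1.0 / fs)
        isf = np.sin(2 * np.pi * 0.05 * t)
        sig = isf + (1.0 + 0.8 * isf) * np.sin(2 * np.pi * 10.0 * t)
        mi = modulation_index(bandpass(sig, 0.01, 0.1, fs),
                              bandpass(sig, 8.0, 12.0, fs))
        print(f"modulation index: {mi:.3f}")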

    Synergy of Acoustic-Phonetics and Auditory Modeling Towards Robust Speech Recognition

    The problem addressed in this work is that of enhancing speech signals corrupted by additive noise and improving the performance of automatic speech recognizers in noisy conditions. The enhanced speech signals can also improve the intelligibility of speech in noisy conditions for human listeners with hearing impairment as well as for normal-hearing listeners. The original Phase Opponency (PO) model, proposed to detect tones in noise, simulates the processing of the information in neural discharge times and exploits the frequency-dependent phase properties of the tuned filters in the auditory periphery along with cross-auditory-nerve-fiber coincidence detection to extract temporal cues. The Modified Phase Opponency (MPO) model proposed here alters the components of the PO model in such a way that its basic functionality is maintained but the various properties of the model can be analyzed and modified independently of each other. This work presents a detailed mathematical formulation of the MPO model and the relation between the properties of the narrowband signal to be detected and the properties of the MPO model. The MPO speech enhancement scheme is based on the premise that speech signals are composed of a combination of narrowband signals (i.e., harmonics) with varying amplitudes. The MPO enhancement scheme outperforms many other speech enhancement techniques when evaluated using different objective quality measures. Automatic speech recognition experiments show that replacing noisy speech signals with the corresponding MPO-enhanced speech signals leads to an improvement in recognition accuracy at low SNRs. The amount of improvement varies with the type of corrupting noise. Perceptual experiments indicate that (a) there is little perceptual difference between MPO-processed clean speech signals and the corresponding original clean signals, and (b) the MPO-enhanced speech signals are preferred over the output of the other enhancement methods when the speech is corrupted by subway noise, whereas the outputs of the other enhancement schemes are preferred when the speech is corrupted by car noise.
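
    A very loose caricature of the phase-opponency principle, not the MPO model itself: here a half-period delay stands in for the frequency-dependent pi phase difference between the model's two tuned filters, so a narrowband component at the target frequency shows up as consistent anti-phase correlation:

        import numpy as np

        def po_statistic(x, f0, fs):
            # Correlate the signal with a copy delayed by half a period of
            # f0 (a pi phase shift at f0). A narrowband component at f0
            # drives the normalized correlation toward -1 (consistent
            # anti-phase "cancellation"); broadband noise leaves it near 0.
            d = int(round(fs / (2.0 * f0)))  # half-period delay in samples
            a, b = x[d:], x[:-d]
            return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

        fs = 8000
        t = np.arange(0, 0.5, 1.0 / fs)
        rng = np.random.default_rng(2)
        noise = rng.standard_normal(t.size)
        tone = np.sin(2 * np.pi * 500.0 * t)
        print(po_statistic(noise, 500.0, fs))         # ~0: no 500 Hz component
        print(po_statistic(tone + noise, 500.0, fs))  # clearly negative: tone present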

    Activity Report 2004


    A Spectrogram Model for Enhanced Source Localization and Noise-Robust ASR

    This paper proposes a simple, computationally efficient 2-mixture model approach to discriminating between speech and background noise. It is directly derived from observations on real data and can be used in a fully unsupervised manner with the EM algorithm. A first application to sector-based, joint audio source localization and detection using multiple microphones confirms that the model can provide a major enhancement. A second application, to the single-channel speech recognition task in a noisy environment, yields a major improvement on stationary noise and promising results on non-stationary noise.
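
    The paper's exact spectrogram model is not reproduced here; as a stand-in, the sketch below fits a two-component 1-D Gaussian mixture to log-magnitude spectrogram bins with EM, fully unsupervised, and derives a speech/noise posterior mask:

        import numpy as np

        def em_two_gaussians(x, iters=50):
            # Unsupervised EM fit of a 2-component 1-D Gaussian mixture:
            # one component for background-noise bins, one for
            # speech-dominated bins.
            mu = np.array([x.min(), x.max()], dtype=float)
            var = np.array([x.var(), x.var()])
            pi = np.array([0.5, 0.5])
            for _ in range(iters):
                # E-step: posterior responsibility of each component per bin.
                lik = pi / np.sqrt(2 * np.pi * var) * \
                      np.exp(-(x[:, None] - mu) ** 2 / (2 * var))
                r = lik / lik.sum(axis=1, keepdims=True)
                # M-step: re-estimate weights, means, and variances.
                nk = r.sum(axis=0)
                pi = nk / len(x)
                mu = (r * x[:, None]).sum(axis=0) / nk
                var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
            return pi, mu, var, r

        # Toy data: a quiet noise floor plus louder speech-like bins.
        rng = np.random.default_rng(3)
        logmag = np.concatenate([rng.normal(-2.0, 0.5, 5000),   # noise bins
                                 rng.normal(1.0, 1.0, 2000)])   # speech bins
        pi, mu, var, resp = em_two_gaussians(logmag)
        speech_mask = resp[:, np.argmax(mu)] > 0.5  # bins assigned to louder component
        print(np.round(mu, 2), speech_mask.mean())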