    Some Emerging Concepts in Speech Recognition.

    The paper presents work in progress on several emerging concepts in Automatic Speech Recognition (ASR) currently being studied at IDIAP. The work falls roughly into three categories: 1) data-guided features, 2) features based on the modulation spectrum of speech, and 3) minimum-entropy-based multi-stream information fusion.
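
    The modulation-spectrum idea in category 2) can be illustrated with a short sketch: take the log-energy trajectory of one spectral band across frames and examine the spectrum of that trajectory. The band edges, frame timing, and function name below are illustrative assumptions, not details from the paper.

```python
# A minimal sketch of the "modulation spectrum" idea: the spectrum of the
# temporal trajectory of one spectral band's energy. All parameters here
# (band, frame length/shift) are illustrative assumptions.
import numpy as np
from scipy.signal import stft

def modulation_spectrum(signal, sr, band=(800.0, 1200.0),
                        frame_len=0.025, frame_shift=0.010):
    """Return modulation frequencies (Hz) and magnitudes for one band."""
    nperseg = int(frame_len * sr)
    hop = int(frame_shift * sr)
    # Short-time spectrum: rows are acoustic frequencies, columns are frames.
    freqs, _, spec = stft(signal, fs=sr, nperseg=nperseg,
                          noverlap=nperseg - hop)
    # Log-energy trajectory of the chosen acoustic band over time.
    idx = (freqs >= band[0]) & (freqs < band[1])
    traj = np.log(np.abs(spec[idx]).sum(axis=0) + 1e-10)
    traj -= traj.mean()
    # FFT of the trajectory: the frame rate (1/frame_shift = 100 Hz here)
    # sets the modulation-frequency axis, so 4 Hz is well within range.
    mod = np.abs(np.fft.rfft(traj))
    mod_freqs = np.fft.rfftfreq(len(traj), d=frame_shift)
    return mod_freqs, mod
```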

    Stochastic techniques in deriving perceptual knowledge

    The paper argues, using examples of selected past works, that stochastic and knowledge-based approaches do not contradict each other. The frequency resolution of human hearing decreases with increasing frequency; spectral bases designed for optimal discrimination among the phonemes of speech have a similar property. Further, human hearing is most sensitive to modulations at frequencies around 4 Hz, and filters on feature trajectories designed for optimal discrimination among phonemes are bandpass with a center frequency around 4 Hz.
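
    As a rough illustration of such trajectory filters, the sketch below band-pass filters each feature trajectory at a typical 100 frames/s rate; the 1-8 Hz passband (roughly centered near 4 Hz), the filter order, and the frame rate are assumptions in the spirit of RASTA-style processing, not the filters derived in the paper.

```python
# A minimal sketch of band-pass filtering feature trajectories around 4 Hz.
# Passband, order, and frame rate are illustrative assumptions.
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_trajectories(features, frame_rate=100.0,
                          low_hz=1.0, high_hz=8.0, order=2):
    """Filter each trajectory of a (frames x dims) feature matrix."""
    nyq = frame_rate / 2.0
    b, a = butter(order, [low_hz / nyq, high_hz / nyq], btype="bandpass")
    # filtfilt runs the filter forward and backward (zero phase), which is
    # convenient offline; an online system would use a causal lfilter.
    return filtfilt(b, a, features, axis=0)
```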

    Optimization of data-driven filterbank for automatic speaker verification

    Most speech processing applications use triangular filters spaced on the mel scale for feature extraction. In this paper, we propose a new data-driven filter design method which optimizes filter parameters from given speech data. First, we introduce a frame-selection-based approach for developing a speech-signal-based frequency warping scale. Then, we propose a new method for computing the filter frequency responses using principal component analysis (PCA). The main advantage of the proposed method over the recently introduced deep-learning-based methods is that it requires only a very limited amount of unlabeled speech data. We demonstrate that the proposed filterbank has more speaker-discriminative power than both the commonly used mel filterbank and an existing data-driven filterbank. We conduct automatic speaker verification (ASV) experiments on different corpora using various classifier back-ends. We show that the acoustic features created with the proposed filterbank outperform existing mel-frequency cepstral coefficients (MFCCs) and speech-signal-based frequency cepstral coefficients (SFCCs) in most cases. In experiments with VoxCeleb1 and the popular i-vector back-end, we observe a 9.75% relative improvement in equal error rate (EER) over MFCCs. Similarly, the relative improvement is 4.43% with the recently introduced x-vector system. We obtain further improvement by fusing the proposed method with the standard MFCC-based approach. (Published in the Digital Signal Processing journal, Elsevier.)
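
    One plausible reading of the PCA step is sketched below: compute log power spectra of (already selected) speech frames and interpret the leading principal components of their covariance as filter shapes. The frame selection, the warping scale, and every parameter here are assumptions for illustration, not the paper's actual procedure.

```python
# A minimal sketch of deriving filter frequency responses with PCA from
# unlabeled speech frames. All names and parameters are assumptions.
import numpy as np

def pca_filterbank(frames, n_filters=20):
    """frames: (n_frames, frame_len) windowed speech frames.
    Returns an (n_filters, n_bins) array of non-negative responses."""
    # Log power spectra of the frames.
    spectra = np.log(np.abs(np.fft.rfft(frames, axis=1)) ** 2 + 1e-10)
    # PCA via eigendecomposition of the spectral covariance matrix.
    centered = spectra - spectra.mean(axis=0)
    cov = centered.T @ centered / (len(frames) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eigh sorts eigenvalues ascending, so take the last columns (reversed)
    # to get the top components.
    components = eigvecs[:, ::-1][:, :n_filters].T
    # Interpret the component magnitudes as filter frequency responses.
    return np.abs(components)
```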