
    Speaker Recognition Using Multiple Parametric Self-Organizing Maps

    Speaker Recognition is the process of automatically recognizing a person who is speaking on the basis of individual characteristics contained in his or her voice. This technology allows systems to automatically verify identity in applications such as banking by telephone or forensic science. A Speaker Recognition system has two main modules: Feature Extraction and Classification. For feature extraction, the most commonly used techniques are Mel-Frequency Cepstrum Coefficients (MFCC) and Linear Predictive Coding (LPC). For classification and verification, techniques such as Vector Quantization (VQ), Hidden Markov Models (HMM), and Neural Networks have been used. The contribution of this thesis is a new methodology for achieving high-accuracy identification and impostor rejection. The proposed method, Multiple Parametric Self-Organizing Maps (M-PSOM), is a classification and verification technique. The new method was successfully implemented and tested on the CSLU Speaker Recognition Corpora of the Oregon School of Engineering with excellent results.
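    A minimal sketch of this kind of front end plus a plain self-organizing-map classifier may help make the pipeline concrete. The snippet below uses librosa for MFCC extraction and the MiniSom package for an ordinary SOM; it is only a stand-in for the M-PSOM method of the thesis, and the file names, map size, and training settings are illustrative assumptions.

    # Illustrative sketch only: MFCC front end + one plain SOM per enrolled speaker.
    # This is NOT the thesis's M-PSOM method; file names and settings are assumptions.
    import numpy as np
    import librosa                   # audio loading and MFCC extraction
    from minisom import MiniSom      # simple self-organizing map implementation

    def extract_mfcc(path, sr=16000, n_mfcc=13):
        """Load a waveform and return its frame-level MFCC vectors (frames x n_mfcc)."""
        y, sr = librosa.load(path, sr=sr)
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

    # Hypothetical enrollment data: a few utterances per speaker.
    enroll = {"speaker_a": ["a1.wav", "a2.wav"], "speaker_b": ["b1.wav", "b2.wav"]}

    # Train one SOM per speaker on that speaker's MFCC frames.
    models = {}
    for spk, files in enroll.items():
        frames = np.vstack([extract_mfcc(f) for f in files])
        som = MiniSom(8, 8, frames.shape[1], sigma=1.0, learning_rate=0.5)
        som.train_random(frames, 5000)
        models[spk] = som

    def identify(path):
        """Pick the speaker whose map fits the test utterance's frames best."""
        frames = extract_mfcc(path)
        return min(models, key=lambda spk: models[spk].quantization_error(frames))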

    An Unsupervised Autoregressive Model for Speech Representation Learning

    This paper proposes a novel unsupervised autoregressive neural model for learning generic speech representations. In contrast to other speech representation learning methods that aim to remove noise or speaker variabilities, ours is designed to preserve information for a wide range of downstream tasks. In addition, the proposed model does not require any phonetic or word boundary labels, allowing the model to benefit from large quantities of unlabeled data. Speech representations learned by our model significantly improve performance on both phone classification and speaker verification over the surface features and other supervised and unsupervised approaches. Further analysis shows that different levels of speech information are captured by our model at different layers. In particular, the lower layers tend to be more discriminative for speakers, while the upper layers provide more phonetic content.
    Comment: Accepted to Interspeech 2019. Code available at: https://github.com/iamyuanchung/Autoregressive-Predictive-Codin
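    As a rough illustration of the core idea of predicting future acoustic frames from past ones (not the authors' exact architecture; the layer sizes, prediction offset, and dummy data below are assumptions), a PyTorch sketch might look like this:

    # Toy autoregressive predictive model over acoustic frames: an RNN reads the
    # past and is trained to predict the frame a few steps ahead with an L1 loss.
    import torch
    import torch.nn as nn

    class AutoregressivePredictor(nn.Module):
        def __init__(self, feat_dim=80, hidden=512, layers=3, shift=3):
            super().__init__()
            self.shift = shift                          # how many frames ahead to predict
            self.rnn = nn.GRU(feat_dim, hidden, num_layers=layers, batch_first=True)
            self.proj = nn.Linear(hidden, feat_dim)     # map hidden states back to feature space

        def forward(self, feats):                       # feats: (batch, time, feat_dim)
            hidden, _ = self.rnn(feats)
            pred = self.proj(hidden)
            # Align each prediction with the frame `shift` steps ahead.
            loss = nn.functional.l1_loss(pred[:, :-self.shift], feats[:, self.shift:])
            return loss, hidden                         # hidden states double as representations

    model = AutoregressivePredictor()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    dummy = torch.randn(4, 200, 80)                     # stand-in for a batch of log-Mel frames
    loss, reps = model(dummy)
    loss.backward()
    optimizer.step()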

    Wavenet based low rate speech coding

    Traditional parametric coding of speech achieves low bit rates but provides poor reconstruction quality because of the inadequacy of the model used. We describe how a WaveNet generative speech model can be used to generate high quality speech from the bit stream of a standard parametric coder operating at 2.4 kb/s. We compare this parametric coder with a waveform coder based on the same generative model and show that approximating the signal waveform incurs a large rate penalty. Our experiments confirm the high performance of the WaveNet based coder and show that the system additionally performs implicit bandwidth extension, and that the speech it produces does not significantly impair a human listener's recognition of the original speaker, even when that speaker was not used during training of the generative model.
    Comment: 5 pages, 2 figures
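    The following toy PyTorch sketch illustrates only the general mechanism: a stack of dilated causal convolutions predicts the next mu-law quantized sample, locally conditioned on a per-sample feature stream that stands in for the upsampled parametric bit stream. It is not the paper's WaveNet decoder, and all dimensions, the conditioning features, and the dummy data are assumptions.

    # Toy conditional, dilated-causal-convolution decoder over mu-law sample codes.
    # Not the paper's model; sizes, conditioning, and training data are assumptions.
    import torch
    import torch.nn as nn

    class CausalConv(nn.Module):
        """1-D convolution that only looks at past samples (left padding)."""
        def __init__(self, channels, dilation):
            super().__init__()
            self.pad = dilation                          # kernel size 2 -> pad = dilation
            self.conv = nn.Conv1d(channels, channels, kernel_size=2, dilation=dilation)

        def forward(self, x):
            return self.conv(nn.functional.pad(x, (self.pad, 0)))

    class ToyConditionalWaveNet(nn.Module):
        def __init__(self, quant=256, channels=64, cond_dim=32, dilations=(1, 2, 4, 8, 16)):
            super().__init__()
            self.embed = nn.Embedding(quant, channels)               # previous quantized samples
            self.cond = nn.Conv1d(cond_dim, channels, kernel_size=1) # per-sample conditioning
            self.layers = nn.ModuleList([CausalConv(channels, d) for d in dilations])
            self.out = nn.Conv1d(channels, quant, kernel_size=1)     # logits over the next sample

        def forward(self, samples, cond):
            # samples: (batch, time) int64 mu-law codes; cond: (batch, cond_dim, time),
            # assumed already upsampled from the low-rate bit stream to sample rate.
            x = self.embed(samples).transpose(1, 2) + self.cond(cond)
            for layer in self.layers:
                x = x + torch.tanh(layer(x))             # simplified residual block
            return self.out(x)                           # (batch, quant, time)

    # Teacher-forced training step on dummy data (shapes are assumptions).
    model = ToyConditionalWaveNet()
    samples = torch.randint(0, 256, (2, 1000))
    cond = torch.randn(2, 32, 1000)
    logits = model(samples[:, :-1], cond[:, :, :-1])
    loss = nn.functional.cross_entropy(logits, samples[:, 1:])
    loss.backward()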