55 research outputs found

    A Comparative Re-Assessment of Feature Extractors for Deep Speaker Embeddings

    Modern automatic speaker verification relies largely on deep neural networks (DNNs) trained on mel-frequency cepstral coefficient (MFCC) features. While there are alternative feature extraction methods based on phase, prosody, and long-term temporal operations, they have not been extensively studied with DNN-based methods. We aim to fill this gap by providing an extensive re-assessment of 14 feature extractors on the VoxCeleb and SITW datasets. Our findings reveal that features equipped with techniques such as spectral centroids, the group delay function, and integrated noise suppression provide promising alternatives to MFCCs for deep speaker embedding extraction. Experimental results demonstrate up to 16.3% (VoxCeleb) and 25.1% (SITW) relative decrease in equal error rate (EER) compared to the baseline.
    Comment: Accepted to Interspeech 202
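    The "relative decrease in EER" reported above is the standard ratio between the baseline EER and the improved system's EER. The sketch below shows only that formula; the EER values used in it are illustrative placeholders, not figures from the paper.

```python
def relative_eer_decrease(baseline_eer: float, system_eer: float) -> float:
    """Relative decrease (in %) of a system's EER over a baseline EER."""
    return 100.0 * (baseline_eer - system_eer) / baseline_eer

# Illustrative numbers: a baseline EER of 3.0% reduced to 2.511%
# corresponds to a 16.3% relative decrease.
print(round(relative_eer_decrease(3.0, 2.511), 1))  # -> 16.3
```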

    Speaker Recognition: Advancements and Challenges


    Voice Based Database Management System Using DSP Processor

    Security is commonly provided to customers through PIN/ID protection, securing their data and information with a password. This method requires users to authenticate themselves by entering the password, but there are cases of fraud and theft in which the password is easily discovered. An alternative is biometric identification, which uses an individual's biometric characteristics; because these are unique, they can be used to authenticate the user's access authority. This work is an implementation of speaker verification for an attendance monitoring system using a DSP processor. First, the speech signal goes through a pre-processing phase that removes background noise. The signal's features are then extracted using the Mel Frequency Cepstral Coefficients (MFCC) method, with frames weighted by a Hamming window, and matched against the reference speech in the database; the speaker is identified by comparing the test speaker's speech signal against these stored references. The main focus of this work is speaker verification, in which an unknown speaker's speech signal is compared against a database of known speakers' utterances. Speaker identification is used for creating the database and identifying the students to maintain attendance records. An LCD display is interfaced with the processor to show the presence of the students for a particular subject, so the system can be used to monitor student attendance. A defaulters' list is also generated according to the chosen criteria and maintained in an MS Excel sheet. A future extension could inform parents of monthly attendance through a text message to their mobile phones using a GSM module interfaced to the processor. DOI: 10.17762/ijritcc2321-8169.150511
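    The MFCC pipeline described above (framing, Hamming windowing, FFT, mel filterbank, log, DCT) can be sketched with plain NumPy. The frame sizes, filter counts, sample rate, and the cosine-similarity matcher below are illustrative choices for the sketch, not details taken from this paper.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale
    mels = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fb

def mfcc(signal, sr=8000, frame=256, hop=128, n_filters=20, n_ceps=12):
    # Frame the signal and apply a Hamming window to each frame
    n_frames = 1 + (len(signal) - frame) // hop
    idx = np.arange(frame)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame)
    # Power spectrum -> mel filterbank energies -> log -> DCT-II
    power = np.abs(np.fft.rfft(frames, frame)) ** 2 / frame
    energies = power @ mel_filterbank(n_filters, frame, sr).T
    log_e = np.log(energies + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2.0 * n_filters)))
    return log_e @ dct.T

# Crude matching: cosine similarity between mean MFCC vectors
t = np.linspace(0, 1, 8000)
template = mfcc(np.sin(2 * np.pi * 220 * t)).mean(axis=0)
probe = mfcc(np.sin(2 * np.pi * 220 * t + 0.1)).mean(axis=0)
similarity = probe @ template / (np.linalg.norm(probe) * np.linalg.norm(template))
print(round(float(similarity), 3))
```

    A real verification system would compare MFCC frame sequences with a proper matcher (DTW, VQ, or GMM scoring) rather than averaging frames, but the extraction stages are the same.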

    Learnable MFCCs for Speaker Verification

    We propose a learnable mel-frequency cepstral coefficient (MFCC) frontend architecture for deep neural network (DNN) based automatic speaker verification. Our architecture retains the simplicity and interpretability of MFCC-based features while allowing the model to be adapted to data flexibly. In practice, we formulate data-driven versions of the four linear transforms of a standard MFCC extractor -- windowing, discrete Fourier transform (DFT), mel filterbank and discrete cosine transform (DCT). Reported results reach up to 6.7% (VoxCeleb1) and 9.7% (SITW) relative improvement in terms of equal error rate (EER) over static MFCCs, without additional tuning effort.
    Comment: Accepted to ISCAS 202
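    The key observation in the abstract is that all four MFCC stages are linear maps, so each can in principle be replaced by a parameter matrix and trained. The NumPy sketch below builds each stage as an explicit matrix to illustrate that view; the sizes are arbitrary, the random non-negative filterbank is a stand-in for mel triangles, and nothing here is the paper's actual implementation.

```python
import numpy as np

N, M, C = 64, 10, 6  # frame length, number of filters, number of cepstra

# 1) Windowing: a diagonal matrix of Hamming weights
W = np.diag(np.hamming(N))

# 2) DFT: the complex Fourier matrix restricted to non-negative bins
k = np.arange(N // 2 + 1)[:, None] * np.arange(N)[None, :]
F = np.exp(-2j * np.pi * k / N)

# 3) Filterbank: non-negative weights on the power spectrum
#    (random stand-in here; a standard extractor uses mel triangles)
rng = np.random.default_rng(0)
H = rng.random((M, N // 2 + 1))

# 4) DCT-II matrix over the log filterbank energies
n = np.arange(M)
D = np.cos(np.pi * np.outer(np.arange(C), (2 * n + 1) / (2.0 * M)))

frame = rng.standard_normal(N)
power = np.abs(F @ (W @ frame)) ** 2        # |DFT(window * x)|^2
ceps = D @ np.log(H @ power + 1e-10)        # DCT of log filterbank energies
print(ceps.shape)  # -> (6,)
```

    In a learnable frontend, W, F, H, and D would become trainable tensors initialized at these classical values and updated by backpropagation together with the speaker embedding network.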

    Study of Speaker Recognition Systems

    Speaker recognition is the computing task of validating a user's claimed identity using characteristics extracted from their voice. It is one of the most useful and popular biometric recognition techniques, especially in areas where security is a major concern, and can be used for authentication, surveillance, forensic speaker recognition, and a number of related activities. Speaker recognition can be classified into identification and verification. Speaker identification is the process of determining which registered speaker produced a given utterance; speaker verification is the process of accepting or rejecting the identity claim of a speaker. The process of speaker recognition consists of two modules: feature extraction and feature matching. Feature extraction derives a small amount of data from the voice signal that can later be used to represent each speaker. Feature matching identifies the unknown speaker by comparing the features extracted from his/her voice input with those from a set of known speakers. Our proposed work consists of truncating a recorded voice signal, framing it, passing it through a window function, calculating the short-term FFT, extracting its features, and matching them with a stored template. Cepstral coefficient calculation and mel-frequency cepstral coefficients (MFCC) are applied for feature extraction. VQ-LBG (vector quantization via Linde-Buzo-Gray), DTW (dynamic time warping), and GMM (Gaussian mixture modelling) algorithms are used for template generation and feature matching
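    Of the matchers named above, DTW is the simplest to sketch: it aligns two sequences of different lengths by dynamic programming. The toy 1-D sequences below are illustrative; a real system would align sequences of MFCC frame vectors with a vector distance.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic-programming DTW distance between two 1-D sequences."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor cells
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# A time-stretched copy of a sequence aligns with zero cost
x = [0, 1, 2, 1, 0]
y = [0, 0, 1, 2, 2, 1, 0]
print(dtw_distance(x, y))  # -> 0.0
```

    In template matching, the unknown utterance is scored against every enrolled template and assigned to the speaker with the smallest DTW distance.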