
    Word And Speaker Recognition System

    In this report, a system combining user-dependent word recognition and text-dependent speaker recognition is described. Word recognition is the process of converting an audio signal, captured by a microphone, into a word. Speaker identification is the ability to recognize a person's identity based on the specific word he or she uttered. A person's voice contains various parameters that convey information such as gender, emotion, health, attitude and identity. Speaker recognition identifies who the speaker is based on the unique voiceprint in the speech data. Voice Activity Detection (VAD), Spectral Subtraction (SS), Mel-Frequency Cepstral Coefficients (MFCC), Vector Quantization (VQ), Dynamic Time Warping (DTW) and k-Nearest Neighbour (k-NN) are the methods used in the word recognition part of the project, implemented in MATLAB. For the speaker recognition part, Vector Quantization (VQ) is used. The recognition rate of the successfully implemented system is 84.44% for word recognition and 54.44% for speaker recognition.
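The VQ step of a system like the one above can be sketched briefly: a codebook is trained per speaker, and an unknown utterance is assigned to the speaker whose codebook yields the lowest average distortion. This is a minimal illustration, not the report's code; MFCC extraction is omitted, and synthetic 2-D vectors stand in for real feature frames.

```python
import numpy as np

def train_codebook(features, k=4, iters=20, seed=0):
    """Toy VQ codebook via k-means over feature frames."""
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        # assign each frame to its nearest codeword
        d = np.linalg.norm(features[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = features[labels == j].mean(axis=0)
    return centroids

def avg_distortion(features, codebook):
    """Mean distance of each frame to its nearest codeword."""
    d = np.linalg.norm(features[:, None] - codebook[None], axis=2)
    return d.min(axis=1).mean()

def identify(features, codebooks):
    """Pick the speaker whose codebook gives the lowest average distortion."""
    return min(codebooks, key=lambda s: avg_distortion(features, codebooks[s]))

# Synthetic stand-in features: two "speakers" occupying different regions.
rng = np.random.default_rng(1)
spk_a = rng.normal(0.0, 0.5, size=(200, 2))
spk_b = rng.normal(3.0, 0.5, size=(200, 2))
books = {"A": train_codebook(spk_a), "B": train_codebook(spk_b)}
test = rng.normal(3.0, 0.5, size=(50, 2))
print(identify(test, books))  # → B
```

In a real system the frames would be MFCC vectors produced after VAD and spectral subtraction, but the distortion-based decision rule is the same.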

    Speaker Identification Based On Discriminative Vector Quantization And Data Fusion

    Speaker Identification (SI) approaches based on discriminative Vector Quantization (VQ) and data fusion techniques are presented in this dissertation. The SI approaches based on Discriminative VQ (DVQ) proposed in this dissertation are DVQ for SI (DVQSI), DVQSI with unique speech feature vector space segmentation for each speaker pair (DVQSI-U), and Adaptive DVQSI (ADVQSI). The difference between the probability distributions of the speech feature vector sets from various speakers (or speaker groups) is called the interspeaker variation between those speakers (or speaker groups); it measures the template differences between them. All DVQ-based techniques presented in this contribution take advantage of the interspeaker variation, which is not exploited by previously proposed techniques that employ traditional VQ for SI (VQSI). All DVQ-based techniques have two modes, a training mode and a testing mode. In the training mode, the speech feature vector space is first divided into a number of subspaces based on the interspeaker variations. Then, a discriminative weight is calculated for each subspace of each speaker or speaker pair in the SI group based on the interspeaker variation. Subspaces with higher interspeaker variation are assigned larger discriminative weights and thus play a more important role in SI than subspaces with lower interspeaker variation. In the testing mode, discriminatively weighted average VQ distortions, instead of equally weighted average VQ distortions, are used to make the SI decision. The DVQ-based techniques lead to higher SI accuracies than VQSI. The DVQSI and DVQSI-U techniques consider the interspeaker variation for each speaker pair in the SI group. In DVQSI, the speech feature vector space segmentation is exactly the same for all speaker pairs, whereas DVQSI-U treats each speaker pair individually in the segmentation. In both DVQSI and DVQSI-U, the discriminative weights for each speaker pair are calculated by trial and error. The SI accuracies of DVQSI-U are higher than those of DVQSI, at the price of a much higher computational burden. ADVQSI explores the interspeaker variation between each speaker and all speakers in the SI group. In contrast with DVQSI and DVQSI-U, in ADVQSI the feature vector space segmentation is performed per speaker instead of per speaker pair, based on the interspeaker variation between each speaker and all the speakers in the SI group. Also, adaptive techniques are used to compute the discriminative weights for each speaker in ADVQSI. The SI accuracies of ADVQSI and DVQSI-U are comparable, but the computational complexity of ADVQSI is much lower than that of DVQSI-U. In addition, a novel algorithm is proposed to convert the raw distortion outputs of template-based SI classifiers into compatible probability measures. After this conversion, data fusion techniques at the measurement level can be applied to SI. In the proposed technique, stochastic models of the distortion outputs are estimated; then the a posteriori probabilities of the unknown utterance belonging to each speaker are calculated, and compatible probability measures are assigned based on them. The proposed technique leads to better SI performance at the measurement level than existing approaches.
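The core idea of the testing mode, discriminatively weighted rather than equally weighted average distortions, can be shown with a small numeric sketch. All numbers below are assumed for illustration; the simple normalized-variation weights stand in for the dissertation's trial-and-error (DVQSI) or adaptive (ADVQSI) weight estimation.

```python
import numpy as np

# Per-subspace average VQ distortions of one test utterance against the
# codebooks of speakers i and j (values assumed for illustration).
dist_i = np.array([0.3, 0.3, 0.8])
dist_j = np.array([0.6, 0.6, 0.5])

# Interspeaker variation per subspace, assumed estimated on training data.
# Subspace 2 is highly discriminative; subspaces 0 and 1 are not.
variation = np.array([0.1, 0.1, 0.9])
weights = variation / variation.sum()  # larger weight = more discriminative

plain = (dist_i.mean(), dist_j.mean())        # equally weighted averages
disc = (weights @ dist_i, weights @ dist_j)   # discriminatively weighted

print("equal weights pick:", "i" if plain[0] < plain[1] else "j")           # → i
print("discriminative weights pick:", "i" if disc[0] < disc[1] else "j")    # → j
```

Here the equally weighted decision is dominated by the two uninformative subspaces, while the discriminative weighting lets the high-variation subspace decide, which is exactly the effect the DVQ techniques exploit.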

    Modeling Text Independent Speaker Identification with Vector Quantization

    Speaker identification is one of the most important technologies nowadays. Many fields, such as bioinformatics and security, use speaker identification, and almost all electronic devices use this technology as well. Depending on the text constraint, speaker identification is divided into text-dependent and text-independent. In many fields, text-independent identification is preferred because the number of possible texts is unlimited; consequently, it is generally more challenging than text-dependent identification. In this research, text-independent speaker identification on Indonesian speaker data was modelled with Vector Quantization (VQ). VQ with K-Means initialization was used: K-Means clustering initialized the means, and Hierarchical Agglomerative Clustering was used to identify the K value for VQ. The best VQ accuracy was 59.67%, obtained when k was 5. According to the results, the Indonesian language can be modelled with VQ. This research can be developed further by applying optimization methods such as a Genetic Algorithm or Particle Swarm Optimization to the VQ parameters.
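The two-stage setup described above, hierarchical clustering to suggest a K, then K-Means to train the VQ codebook, might be sketched as follows. This is an assumed illustration using SciPy's standard clustering routines on synthetic 2-D data, not the paper's implementation; the dendrogram cut threshold is an arbitrary choice, not a value from the paper.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.cluster.vq import kmeans2

# Synthetic 2-D stand-ins for feature frames: three well-separated groups.
rng = np.random.default_rng(0)
frames = np.vstack([rng.normal(c, 0.3, size=(60, 2)) for c in (0.0, 4.0, 8.0)])

# Step 1: hierarchical agglomerative clustering suggests a K by cutting
# the dendrogram at a distance threshold (threshold assumed for this demo).
labels = fcluster(linkage(frames, method="ward"), t=10.0, criterion="distance")
k = len(np.unique(labels))

# Step 2: K-Means with that K trains the VQ codebook.
codebook, _ = kmeans2(frames, k, minit="++", seed=0)
print(k, codebook.shape)  # → 3 (3, 2)
```

Each speaker would get a codebook trained this way, and identification would then proceed by comparing average distortions, as in the other VQ-based systems on this page.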

    Phoneme Weighting and Energy-Based Weighting for Speaker Recognition

    This dissertation focuses on determining the specific vowel phonemes that work best for speaker identification and speaker verification, and on developing new algorithms to improve speaker identification accuracy. Results from the first part of our research indicate that the vowels /i/, /E/ and /u/ had the highest recognition scores for both the Gaussian mixture model (GMM) and vector quantization (VQ) methods (at most one classification error). For VQ, /i/, /I/, /e/, /E/ and /@/ had no classification errors. Persons speaking /E/, /o/ and /u/ were verified well by both GMM and VQ methods in our experiments. For VQ, the verification results are consistent with the identification results, since the same five phonemes performed best and had less than one verification error. After determining several ideal vowel phonemes, we developed new algorithms for improved speaker identification accuracy: phoneme weighting methods (which performed classification based on the ideal phonemes found in the previous experiments) and other weighting methods based on energy. The energy weighting methods performed better than the phoneme weighting methods in our experiments. The first energy weighting method ignores speech frames with relatively small magnitude. Instead of ignoring such frames, the second method emphasizes speech frames with relatively large magnitude. The third method and the adjusted third method combine the previous two. Relative to a baseline system (which used Mel-frequency cepstral coefficients (MFCCs) as features and VQ as the classifier), the error reduction rate was 7.9% after applying the first method, and 28.9% after applying the second method and the adjusted third method.
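The first two energy weighting strategies, dropping low-energy frames versus emphasizing high-energy ones, can be sketched in a few lines. This is a hedged illustration of the general idea on synthetic data; the function names, the quantile cutoff, and the energy-proportional weights are assumptions, not the dissertation's exact rules.

```python
import numpy as np

def frame_energies(frames):
    """Energy of each feature frame (sum of squared components)."""
    return (frames ** 2).sum(axis=1)

def weighted_distortion(frames, codebook, mode="drop", frac=0.2):
    """Per-frame VQ distortion combined with an energy-based weighting.
    'drop' ignores the lowest-energy frames (cf. method 1); 'emphasize'
    weights each frame's distortion by its relative energy (cf. method 2)."""
    d = np.linalg.norm(frames[:, None] - codebook[None], axis=2).min(axis=1)
    e = frame_energies(frames)
    if mode == "drop":
        keep = e >= np.quantile(e, frac)  # discard the quietest frames
        return d[keep].mean()
    w = e / e.sum()                       # louder frames count for more
    return w @ d

# Synthetic demo: random frames scored against a two-codeword codebook.
rng = np.random.default_rng(0)
frames = rng.normal(0.0, 1.0, size=(100, 3))
codebook = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
print(f"drop: {weighted_distortion(frames, codebook, 'drop'):.3f}, "
      f"emphasize: {weighted_distortion(frames, codebook, 'emphasize'):.3f}")
```

Either score would replace the plain average distortion in a VQ classifier's decision rule; the third (combined) method would mix the two scores.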