AUTOMATIC IDENTIFICATION OF VIETNAMESE DIALECTS
Dialect identification has been studied for many languages around the world; nevertheless, research on signal processing for Vietnamese dialects is still limited, and few works have been published. Vietnamese has many different dialects, and dialectal features have an important influence on speech recognition systems. If the dialect is known during recognition, system performance improves, because the corpora of these systems are normally organized by dialect. This paper presents a combination of MFCC coefficients and fundamental frequency features of Vietnamese for dialect identification based on GMM. Experimental results on a Vietnamese dialect corpus show that identification performance increases from 59% when using only MFCC coefficients to 71% when using MFCC coefficients together with fundamental frequency information.
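As a rough illustration of the idea in this abstract, the sketch below appends log-F0 to each MFCC frame and picks the dialect whose GMM gives the highest average log-likelihood. It is not the paper's implementation: the helper names, the diagonal-covariance assumption, the two toy dialect models, and the assumption that every frame is voiced (so F0 > 0) are all hypothetical choices made for this example.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood; gmm is a list of (weight, mean, var)."""
    total = 0.0
    for x in frames:
        comp = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        peak = max(comp)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(c - peak) for c in comp))
    return total / len(frames)

def append_f0(mfcc_frames, f0_frames):
    """Concatenate each MFCC vector with log-F0 (assumes voiced frames only)."""
    return [list(c) + [math.log(f)] for c, f in zip(mfcc_frames, f0_frames)]

def identify(frames, models):
    """Return the name of the model with the highest average log-likelihood."""
    return max(models, key=lambda name: gmm_log_likelihood(frames, models[name]))

# Hypothetical, hand-set models: 2 MFCC dimensions plus log-F0.
TOY_MODELS = {
    "dialect_a": [
        (0.5, [0.0, 0.0, math.log(120.0)], [1.0, 1.0, 0.05]),
        (0.5, [1.0, 1.0, math.log(140.0)], [1.0, 1.0, 0.05]),
    ],
    "dialect_b": [
        (0.5, [0.0, 0.0, math.log(200.0)], [1.0, 1.0, 0.05]),
        (0.5, [1.0, 1.0, math.log(220.0)], [1.0, 1.0, 0.05]),
    ],
}
```

In this toy setup the MFCC dimensions of the two models are identical, so only the appended F0 dimension separates them. This mirrors, in a very simplified way, why adding fundamental frequency information could help.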
MCE 2018: The 1st Multi-target Speaker Detection and Identification Challenge Evaluation
The Multi-target Challenge aims to assess how well current speech technology
is able to determine whether or not a recorded utterance was spoken by one of a
large number of blacklisted speakers. It is a form of multi-target speaker
detection based on real-world telephone conversations. Data recordings are
generated from call center customer-agent conversations. The task is to measure
how accurately one can detect 1) whether a test recording is spoken by a
blacklisted speaker, and 2) which specific blacklisted speaker was talking.
This paper outlines the challenge and provides its baselines, results, and
discussions. Comment: http://mce.csail.mit.edu . arXiv admin note: text overlap with
arXiv:1807.0666
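The two-part task described in this abstract can be sketched as a single decision over per-speaker scores. This is a minimal illustration, not the challenge baseline: the function name, the score dictionary, and the threshold are assumptions, and the scoring method that produces the similarity values is left unspecified.

```python
def detect_and_identify(scores, threshold):
    """Decide (1) whether the recording is from any blacklisted speaker
    and (2) which one.

    scores: dict mapping a blacklisted speaker id to a similarity score
            (higher means more similar); hypothetical input format.
    Returns (is_blacklisted, speaker_id or None).
    """
    best = max(scores, key=scores.get)
    if scores[best] >= threshold:
        return True, best
    return False, None
```

For example, `detect_and_identify({"spk1": 0.2, "spk2": 0.9}, 0.5)` flags the recording and names `spk2`, while a threshold of 0.95 rejects it.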
Hierarchical methods for large population speaker identification using telephone speech
This study focuses on speaker identification (SiD). Several problems, such as acoustic noise, channel noise, speaker variability, and a large population of known speakers within the system, among many others, limit SiD performance. The SiD system extracts speaker-specific features from the digitised speech signal for accurate identification. These feature sets are clustered to form the speaker template, known as a speaker model. As the number of speakers enrolled in the system grows, more models accumulate and inter-speaker confusion results. This study proposes hierarchical methods that aim to split the large population of enrolled speakers into smaller groups of model databases to minimise inter-speaker confusion.
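The general shape of a hierarchical search can be sketched as follows: first choose a group, then compare only against the speaker models in that group. The abstract does not say how groups are formed or scored, so the group centroids, the Euclidean distance, and all names below are assumptions made for illustration only.

```python
import math

def nearest(vec, candidates):
    """Return the key in candidates whose vector is closest to vec."""
    return min(candidates, key=lambda name: math.dist(vec, candidates[name]))

def hierarchical_identify(vec, group_centroids, groups):
    """Two-stage search: choose a group, then a speaker inside that group.

    group_centroids: dict of group name -> centroid vector (hypothetical)
    groups: dict of group name -> {speaker id -> model vector}
    """
    group = nearest(vec, group_centroids)
    return group, nearest(vec, groups[group])
```

Only the models in the selected group are compared in the second stage, which is how splitting the population could reduce the number of competing models per decision.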
Dialect Recognition Using a Phone-GMM-Supervector-Based SVM Kernel
In this paper, we introduce a new approach to dialect recognition which relies on the hypothesis that certain phones are realized differently across dialects. Given a speaker’s utterance, we first obtain the most likely phone sequence using a phone recognizer. We then extract GMM Supervectors for each phone instance. Using these vectors, we design a kernel function that computes the similarities of phones between pairs of utterances. We employ this kernel to train SVM classifiers that estimate posterior probabilities, used during recognition. Testing our approach on four Arabic dialects from 30s cuts, we compare our performance to five approaches: PRLM; GMM-UBM; our own improved version of GMM-UBM which employs fMLLR adaptation; our recent discriminative phonotactic approach; and a state-of-the-art system: SDC-based GMM-UBM discriminatively trained. Our kernel-based technique outperforms all these previous approaches; the overall EER of our system is 4.9%