17,116 research outputs found
Comparison GMM and SVM Classifier for Automatic Speaker Verification
The objective of this thesis is to develop automatic text-independent speaker verification systems using unconstrained telephone conversational speech. We began by performing a Gaussian Mixture Model Likelihood ratio verification task in speaker independent system as described by MIT Lincoln Lab. We next introduced a speaker dependent verification system based on speaker dependent thresholds. We then implemented the same system applying Support Vector Machine. In SVM, we used polynomial kernels and radial basis function kernels and compared the performance. For training and testing the system, we used low-level spectral features. Finally, we provided a performance assessment of these systems using the National Institute of Standards and technology (NIST) speaker recognition evaluation 2008 telephone corpora
Analysis of I-Vector framework for Speaker Identification in TV-shows
International audienceInspired from the Joint Factor Analysis, the I-vector-based analysis has become the most popular and state-of-the-art framework for the speaker verification task. Mainly applied within the NIST/SRE evaluation campaigns, many studies have been proposed to improve more and more performance of speaker verification systems. Nevertheless, while the i-vector framework has been used in other speech processing fields like language recognition, a very few studies have been reported for the speaker identification task on TV shows. This work was done in the REPERE challenge context, focused on the people recognition task in multimodal conditions (audio, video, text) from TV show corpora. Moreover, the challenge participants are invited for providing systems for monomodal tasks, like speaker identification. The application of the i-vector framework is investi-gatedthrough different points of views: (1) some of the i-vector based approaches are compared, (2) a specific i-vector extraction protocol is proposed in order to deal with widely varying amounts of training data among speaker population, (3) the joint use of both speaker diarization and identification is finally analyzed. Based on a 533 speaker dictionary, this joint system wins the monomodal speaker identification task of the 2014 REPERE challenge
An Illustrated Methodology for Evaluating ASR Systems
Proceeding of: 9th International Workshop on Adaptive Multimedia Retrieval (AMR 2011) Took place 2011, July, 18-19, in Barcelona, Spain. The event Web site is http://stel.ub.edu/amr2011/Automatic speech recognition technology can be integrated in an information retrieval process to allow searching on multimedia contents. But, in order to assure an adequate retrieval performance is necessary to state the quality of the recognition phase, especially in speaker-independent and domainindependent environments. This paper introduces a methodology to accomplish the evaluation of different speech recognition systems in several scenarios considering also the creation of new corpora of different types (broadcast news, interviews, etc.), especially in other languages apart from English that are not widely addressed in speech community.This work has been partially supported by the Spanish Center for Industry Technological Development (CDTI, Ministry of Industry, Tourism and Trade), through the BUSCAMEDIA Project (CEN-20091026). And also by MA2VICMR: Improving the access, analysis and visibility of the multilingual and multimedia information in web for the Region of Madrid (S2009/TIC-1542).Publicad
Emotional State Categorization from Speech: Machine vs. Human
This paper presents our investigations on emotional state categorization from
speech signals with a psychologically inspired computational model against
human performance under the same experimental setup. Based on psychological
studies, we propose a multistage categorization strategy which allows
establishing an automatic categorization model flexibly for a given emotional
speech categorization task. We apply the strategy to the Serbian Emotional
Speech Corpus (GEES) and the Danish Emotional Speech Corpus (DES), where human
performance was reported in previous psychological studies. Our work is the
first attempt to apply machine learning to the GEES corpus where the human
recognition rates were only available prior to our study. Unlike the previous
work on the DES corpus, our work focuses on a comparison to human performance
under the same experimental settings. Our studies suggest that
psychology-inspired systems yield behaviours that, to a great extent, resemble
what humans perceived and their performance is close to that of humans under
the same experimental setup. Furthermore, our work also uncovers some
differences between machine and humans in terms of emotional state recognition
from speech.Comment: 14 pages, 15 figures, 12 table
I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences
The I4U consortium was established to facilitate a joint entry to NIST
speaker recognition evaluations (SRE). The latest edition of such joint
submission was in SRE 2018, in which the I4U submission was among the
best-performing systems. SRE'18 also marks the 10-year anniversary of I4U
consortium into NIST SRE series of evaluation. The primary objective of the
current paper is to summarize the results and lessons learned based on the
twelve sub-systems and their fusion submitted to SRE'18. It is also our
intention to present a shared view on the advancements, progresses, and major
paradigm shifts that we have witnessed as an SRE participant in the past decade
from SRE'08 to SRE'18. In this regard, we have seen, among others, a paradigm
shift from supervector representation to deep speaker embedding, and a switch
of research challenge from channel compensation to domain adaptation.Comment: 5 page
- …