
    Voice recognition system for Massey University Smarthouse : a thesis presented in partial fulfilment of the requirements for the degree of Master of Engineering in Information Engineering at Massey University

    The concept of a smarthouse aims to integrate technology into houses to a level where most daily tasks are automated, providing comfort, safety and entertainment to the residents. The concept is mainly aimed at the elderly population, to improve their quality of life. To maintain a natural medium of communication, the house employs a speech recognition system capable of analysing spoken language and extracting commands from it. This project focuses on the development and evaluation of a Windows application, written in a high-level programming language, that incorporates speech recognition technology via a commercial speech recognition engine. The speech recognition system acts as a hub within the Smarthouse, receiving user commands and delegating them to different switching and control systems. Initial trials used Dragon NaturallySpeaking as the recognition engine. However, it proved inappropriate for the Smarthouse project because it is speaker-dependent and requires each user to train it with his or her own voice. The application now utilises the Microsoft Speech Application Programming Interface (SAPI), a software layer that sits between applications and speech engines, together with the Microsoft Speech Recognition Engine, which is freely distributed with some Microsoft products. Although Dragon NaturallySpeaking offers better recognition for dictation, the Microsoft engine can be optimised with a context-free grammar (CFG) to give enhanced recognition in the intended application. The application is designed to be speaker-independent and can handle continuous speech. It connects to a database-oriented expert system to carry out full conversations with the users. Audible prompts and confirmations are achieved through speech synthesis using any SAPI-compliant text-to-speech engine. Further development focused on designing a telephony system using the Microsoft Telephony Application Programming Interface (TAPI), which allows the house to be controlled remotely: residents can call their house from anywhere in the world and, regardless of their location, the house will respond to and fulfil their commands.
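
    The grammar-constrained approach can be illustrated with a small sketch. The thesis does not publish its actual CFG rules or implementation language, so the rule set and helper below are hypothetical; the point is that restricting the engine's search space to a fixed command grammar, and mapping each recognised phrase to a (room, device, action) command for the control systems, is what lets a general-purpose engine serve speaker-independent command and control.

```python
import re

# Hypothetical command grammar for illustration only; the thesis's real
# CFG rules are not listed in the abstract. A SAPI CFG similarly limits
# the engine to a small set of valid command phrases.
ACTIONS = {"turn on": "ON", "turn off": "OFF", "dim": "DIM"}
ROOMS = ["kitchen", "lounge", "bedroom", "bathroom"]
DEVICES = ["light", "heater", "fan", "curtains"]

COMMAND = re.compile(
    r"^(?P<action>%s) the (?P<room>%s) (?P<device>%s)$"
    % ("|".join(ACTIONS), "|".join(ROOMS), "|".join(DEVICES))
)

def delegate(utterance: str):
    """Map a recognised phrase to a (room, device, action) triple,
    or None if the phrase falls outside the grammar."""
    m = COMMAND.match(utterance.lower().strip())
    if not m:
        return None
    return (m.group("room"), m.group("device"), ACTIONS[m.group("action")])

print(delegate("Turn on the kitchen light"))  # ('kitchen', 'light', 'ON')
print(delegate("What time is it"))            # None: outside the grammar
```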

    Design and development of automatic speech recognition system for Tamil language using CMU Sphinx 4

    This paper presents the design and development of a speech recognition system for the Tamil language. The system is based on the CMU Sphinx 4 open-source automatic speech recognition (ASR) engine developed by Carnegie Mellon University and is adapted to speaker-specific continuous speech. One of its main components is a core Tamil speech recogniser that can be trained with field-specific data. The target domain is the accent spoken by illiterate Tamil speakers from the Eastern area of Sri Lanka. A phonetically rich and balanced sentence text corpus was developed and recorded in a controlled environment to build a speaker-specific speech corpus. Using this corpus, the system was trained and tested with speaker-specific data (testing with the same words uttered by the same person) and speaker-independent data (testing with different words uttered by different people). The system currently gives a satisfactory peak performance of 39.5% word error rate (WER) for speaker-specific data, which is comparable with the best word error rates of most continuous speech recognition systems available for any language, but an unsatisfactory rate for speaker-independent data.
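
    The reported figure is the standard word error rate: the minimum number of substitutions, deletions and insertions needed to turn the reference transcript into the hypothesis, divided by the number of reference words. A minimal, self-contained sketch (the toy strings are illustrative, not from the paper's Tamil corpus):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with the standard edit-distance dynamic programme."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

# Toy example: one substitution + one deletion over four reference words.
print(word_error_rate("open the door now", "open a door"))  # 0.5
```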

    Integrated speech and morphological processing in a connectionist continuous speech understanding for Korean

    A new, tightly coupled speech and natural language integration model is presented for a TDNN-based continuous, possibly large-vocabulary speech recognition system for Korean. Unlike the popular n-best techniques developed for integrating mainly HMM-based speech recognition and natural language processing at the word level, which are clearly inadequate for morphologically complex agglutinative languages, our model constructs a spoken language system based on morpheme-level speech and language integration. With this integration scheme, the spoken Korean processing engine (SKOPE) is designed and implemented using a TDNN-based diphone recognition module integrated with Viterbi-based lexical decoding and symbolic phonological/morphological co-analysis. Our experimental results show that speaker-dependent continuous eojeol (Korean word) recognition and integrated morphological analysis can be achieved with a success rate of over 80.6% directly from speech inputs for middle-level vocabularies. (Comment: LaTeX source with a4 style, 15 pages; to be published in the Computer Processing of Oriental Languages journal.)
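
    The lexical decoding step builds on the standard Viterbi algorithm. The sketch below is a generic log-domain Viterbi decoder, not the paper's SKOPE implementation: the state space, transition scores and observation scores are made-up placeholders, whereas SKOPE's decoder would run over diphone scores from the TDNN and a morpheme lexicon.

```python
def viterbi(obs_scores, trans, init):
    """Generic Viterbi decoder over log-scores.
    obs_scores[t][s]: score of state s at time t (e.g. per-frame diphone
    scores from an acoustic network); trans[p][s]: transition score from
    state p to state s; init[s]: initial score. Returns the best path."""
    n = len(init)
    best = [init[s] + obs_scores[0][s] for s in range(n)]
    back = []
    for t in range(1, len(obs_scores)):
        new, ptr = [], []
        for s in range(n):
            prev = max(range(n), key=lambda p: best[p] + trans[p][s])
            ptr.append(prev)
            new.append(best[prev] + trans[prev][s] + obs_scores[t][s])
        best, back = new, back + [ptr]
    path = [max(range(n), key=lambda s: best[s])]  # best final state
    for ptr in reversed(back):                     # trace pointers back
        path.append(ptr[path[-1]])
    return list(reversed(path))

# Tiny two-state toy run; all numbers are made up, not the paper's model.
obs = [[-0.1, -2.0], [-1.5, -0.2], [-0.3, -1.0]]
trans = [[-0.5, -1.0], [-1.2, -0.4]]
print(viterbi(obs, trans, init=[-0.7, -0.7]))  # -> [0, 1, 1]
```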

    Robust classification with context-sensitive features

    This paper addresses the problem of classifying observations when features are context-sensitive, especially when the testing set involves a context that is different from the training set. The paper begins with a precise definition of the problem; general strategies are then presented for enhancing the performance of classification algorithms on this type of problem. These strategies are tested on three domains. The first domain is the diagnosis of gas turbine engines: the problem is to diagnose a faulty engine in one context, such as warm weather, when the fault has previously been seen only in another context, such as cold weather. The second domain is speech recognition, where the context is the identity of the speaker: the problem is to recognize words spoken by a new speaker, not represented in the training set. The third domain is medical prognosis, where the context is the age of the patient: the problem is to predict whether a patient with hepatitis will live or die. For all three domains, exploiting context results in substantially more accurate classification.
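
    One family of strategies for such problems is contextual normalization: features are normalised with statistics estimated per context (for example, per weather regime or per speaker) rather than globally, so that observations from different contexts become comparable. A sketch under that assumption, with hypothetical data:

```python
import numpy as np

def contextual_normalize(X, contexts):
    """Normalise each feature using the mean/std of its own context
    (e.g. speaker identity or a weather band), so feature values are
    comparable across contexts at classification time."""
    Xn = np.empty_like(X, dtype=float)
    for c in np.unique(contexts):
        rows = contexts == c
        mu = X[rows].mean(axis=0)
        sigma = X[rows].std(axis=0) + 1e-9  # guard against zero variance
        Xn[rows] = (X[rows] - mu) / sigma
    return Xn

# Hypothetical data: feature scale shifts with context.
rng = np.random.default_rng(0)
warm = rng.normal(10.0, 1.0, size=(50, 3))
cold = rng.normal(2.0, 0.5, size=(50, 3))
X = np.vstack([warm, cold])
ctx = np.array(["warm"] * 50 + ["cold"] * 50)
print(contextual_normalize(X, ctx).mean(axis=0).round(3))  # ~0 per feature
```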

    Exploiting context when learning to classify

    This paper addresses the problem of classifying observations when features are context-sensitive, specifically when the testing set involves a context that is different from the training set. The paper begins with a precise definition of the problem; general strategies are then presented for enhancing the performance of classification algorithms on this type of problem. These strategies are tested on two domains. The first domain is the diagnosis of gas turbine engines: the problem is to diagnose a faulty engine in one context, such as warm weather, when the fault has previously been seen only in another context, such as cold weather. The second domain is speech recognition: the problem is to recognize words spoken by a new speaker, not represented in the training set. For both domains, exploiting context results in substantially more accurate classification.
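
    A complementary strategy is contextual expansion: contextual information is appended to the feature vector so the classifier can learn context-dependent decision rules. When the test context is unseen, as with a new speaker, the appended features would need to generalise (e.g. measured speaker characteristics rather than raw identity); the one-hot encoding below is only a minimal illustration:

```python
import numpy as np

def contextual_expansion(X, contexts):
    """Append one-hot-encoded context (e.g. speaker or weather) to the
    feature vectors so a classifier can condition on context."""
    labels = sorted(set(contexts))
    onehot = np.array([[c == l for l in labels] for c in contexts], float)
    return np.hstack([X, onehot])

# Hypothetical feature rows tagged with their context.
X = np.array([[0.4, 1.2], [0.5, 0.9], [2.1, 0.3]])
ctx = ["speakerA", "speakerA", "speakerB"]
print(contextual_expansion(X, ctx))
# Each row gains indicator columns for speakerA / speakerB.
```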