6 research outputs found

    Speaker Recognition: Advancements and Challenges

    Use of line spectral frequencies for emotion recognition from speech

    We propose the use of line spectral frequency (LSF) features for emotion recognition from speech, which, to the best of our knowledge, have not previously been employed for this task. Spectral features such as mel-scaled cepstral coefficients (MFCC) have already been used successfully for the parameterization of speech signals for emotion recognition. The LSF features also offer a spectral representation of speech; moreover, they carry intrinsic information on the formant structure, which is related to the emotional state of the speaker. We use the Gaussian mixture model (GMM) classifier architecture, which captures the static color of the spectral features. Experimental studies performed over the Berlin Emotional Speech Database and the FAU Aibo Emotion Corpus demonstrate that decision fusion configurations with LSF features bring a consistent improvement over the MFCC-based emotion classification rates. TUBİTAK; Bahçeşehir University Research Fun
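The abstract does not say how the decision fusion is configured. As a minimal sketch, assuming each feature stream (MFCC and LSF) has its own classifier that emits one log-likelihood per emotion class, a weighted sum of the two score sets is one common way to fuse decisions. The function name `fuse_decisions`, the weight, and all score values below are illustrative assumptions, not figures from the paper.

```python
def fuse_decisions(scores_a, scores_b, weight_a=0.5):
    """Weighted-sum fusion of two per-class score dictionaries.

    scores_a, scores_b: {class_label: log-likelihood} from two classifiers.
    weight_a: weight given to the first classifier (assumed, not from the paper).
    Returns the winning label and the fused scores.
    """
    fused = {}
    for label in scores_a:
        fused[label] = weight_a * scores_a[label] + (1 - weight_a) * scores_b[label]
    best = max(fused, key=fused.get)
    return best, fused


# Made-up scores for one utterance, for illustration only.
mfcc_scores = {"anger": -41.0, "sadness": -40.0, "neutral": -44.0}
lsf_scores = {"anger": -38.0, "sadness": -42.0, "neutral": -43.0}

label, fused = fuse_decisions(mfcc_scores, lsf_scores)
print(label, fused)
```

In this toy case the MFCC classifier alone would pick "sadness", while the fused scores pick "anger", which shows how a second feature stream can change the decision.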

    Rethinking the web structure: focusing on events to create better information and experience management

    The objective of the following research is to investigate the problem of information management and conveyed experience on the World Wide Web (WWW) when multi-modal sensors and media are available. After studying related areas of work about the web and heterogeneous media, it became apparent that one of the main challenges of the area is the semantic unification of heterogeneous media. This thesis will introduce an event-based model to semantically unify media. An event is defined as something of significance that takes place at a given time and location. Using this definition and the corresponding model, a system will be designed to illustrate practical use cases for events. M.S. Committee Chair: Ramesh Jain; Committee Member: Jim [email protected]; Committee Member: Linda Will
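The event definition above (something of significance at a given time and location) can be sketched as a small data structure that groups heterogeneous media under one event. This is only an illustration under assumptions: the class names `Event` and `MediaItem`, their fields, and the `attach` helper are hypothetical and do not come from the thesis.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class MediaItem:
    uri: str   # where the media lives (hypothetical field)
    kind: str  # e.g. "photo", "video", "audio", "sensor"


@dataclass
class Event:
    name: str
    start: datetime
    end: datetime
    latitude: float
    longitude: float
    media: list = field(default_factory=list)

    def covers(self, when):
        """True if the timestamp falls inside the event's time span."""
        return self.start <= when <= self.end


def attach(event, item, captured_at):
    """Link a media item to the event if it was captured during the event."""
    if event.covers(captured_at):
        event.media.append(item)
        return True
    return False


talk = Event("keynote", datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 1, 10, 0),
             33.77, -84.39)
attach(talk, MediaItem("photo_001.jpg", "photo"), datetime(2024, 5, 1, 9, 30))
print(len(talk.media))
```

A fuller model would also match on location; only the time check is shown here to keep the sketch short.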

    Parallelization strategy of speaker identification system for Hybrid Modeling

    Over the last decade, technological advances have made speaker recognition an important tool in forensic science and biometric identification. Speaker recognition is a process in which a person is recognized on the basis of his or her voice signals (R. C Campbel, 1997). It can be divided into speaker verification and speaker identification, each of which can be further divided into text-dependent and text-independent systems. To date, the technology has yet to deliver speaker recognition systems for many applications, including access control, security control for confidential information, transaction authentication, and telephone banking. Pattern classification plays a crucial part in the speaker modeling component chain: its result strongly affects the recognition engine's decision to accept or reject a speaker. Much research has addressed pattern classification for speaker recognition, including Dynamic Time Warping (DTW), Vector Quantization (VQ), Hidden Markov Models, the Gaussian mixture model (GMM), and the Support Vector Machine (SVM) (Sadaoki Furui, 1997). Building robust speaker recognition systems is often difficult because the speech signal is dynamic and influenced by many sources of variation. The past two decades have seen significant progress in coping with this problem using different techniques. Among these techniques, hybrids of two types of pattern classification have shown promising results in improving accuracy. Although they produce considerable improvement, these hybrid techniques are still somewhat restricted in recognition accuracy for large data sets. Since previous work has reported many successful examples of combining two classification techniques, this research aims to produce a new hybrid technique to address the accuracy problem under incremental data set conditions.
VQ and GMM are widely applied to speaker identification, but both have disadvantages. To overcome these shortcomings, this chapter puts forward a new hybrid VQ/GMM model to improve the recognition rate of the speaker identification system. In its baseline form, the VQ-based solution is less accurate than the GMM, but it is computationally simpler. In addition, our experiments indicate that both VQ and GMM techniques are suitable for the speaker-independent task. We therefore hope to make use of their merits via a hybrid VQ/GMM classifier. GMM has been combined with other pattern classification techniques in many forms in the past. Most hybrid VQ/GMM approaches use VQ as an optimization step that reduces the work of the Expectation Maximization algorithm and so improves training speed (Reynolds and Rose, 1995; J. Pelecano, 2000). Other researchers use GMM as a post-processor after VQ clusters the speech signal into regions (Qiguang Lin et al, 1996). In our proposed hybrid modeling, the VQ model and the GMM model run in parallel after signal preprocessing. A performance comparison of the hybrid VQ/GMM, DTW, VQ, GMM and SVM techniques for speaker recognition is carried out through experiments and reported in this chapter. The chapter is organized as follows. Section 2 reviews the proposed speaker recognition framework. Section 3 discusses how we construct the hybrid model for pattern classification. Section 4 presents the experimental results of the performance comparison. Finally, Section 5 concludes our work.
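The abstract says that the VQ and GMM models run in parallel after preprocessing, but it does not say how their outputs are combined. The following is a minimal sketch under assumptions: each enrolled speaker has a VQ codebook and a diagonal-covariance GMM, both models score the same feature frames, and the two scores are z-normalized across speakers and mixed with a weight `alpha`. The fusion rule, the function names, and the toy models are illustrative and are not the authors' method.

```python
import math


def vq_distortion(frames, codebook):
    """Average squared distance from each frame to its nearest codeword."""
    total = 0.0
    for x in frames:
        total += min(sum((a - b) ** 2 for a, b in zip(x, c)) for c in codebook)
    return total / len(frames)


def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM."""
    total = 0.0
    for x in frames:
        comp_logs = []
        for w, mu, var in zip(weights, means, variances):
            log_p = math.log(w)
            for xi, m, v in zip(x, mu, var):
                log_p += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
            comp_logs.append(log_p)
        peak = max(comp_logs)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(l - peak) for l in comp_logs))
    return total / len(frames)


def zscores(values):
    """Normalize scores across speakers so the two models are comparable."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
    return [(v - mean) / std for v in values]


def identify(frames, speakers, alpha=0.5):
    """Score every speaker with both models and return the best fused match."""
    names = list(speakers)
    # Lower distortion is better, so negate it before normalizing.
    vq = zscores([-vq_distortion(frames, speakers[n]["codebook"]) for n in names])
    gm = zscores([gmm_avg_loglik(frames, *speakers[n]["gmm"]) for n in names])
    fused = [alpha * v + (1 - alpha) * g for v, g in zip(vq, gm)]
    return names[max(range(len(names)), key=fused.__getitem__)]


# Toy two-dimensional models, hand-written for illustration only.
speakers = {
    "alice": {"codebook": [(0.0, 0.0), (1.0, 1.0)],
              "gmm": ([0.5, 0.5], [(0.0, 0.0), (1.0, 1.0)],
                      [(0.5, 0.5), (0.5, 0.5)])},
    "bob": {"codebook": [(5.0, 5.0), (6.0, 6.0)],
            "gmm": ([0.5, 0.5], [(5.0, 5.0), (6.0, 6.0)],
                    [(0.5, 0.5), (0.5, 0.5)])},
}
test_frames = [(0.1, -0.2), (0.9, 1.1), (0.2, 0.1)]
print(identify(test_frames, speakers))
```

In a real system the codebooks would come from k-means style training and the GMMs from Expectation Maximization on enrollment speech; both are omitted here to keep the sketch focused on the parallel scoring step.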