2,984 research outputs found

    Emotion Recognition from Acted and Spontaneous Speech

    Get PDF
    Dizertační práce se zabývá rozpoznáním emočního stavu mluvčích z řečového signálu. Práce je rozdělena do dvou hlavních častí, první část popisuju navržené metody pro rozpoznání emočního stavu z hraných databází. V rámci této části jsou představeny výsledky rozpoznání použitím dvou různých databází s různými jazyky. Hlavními přínosy této části je detailní analýza rozsáhlé škály různých příznaků získaných z řečového signálu, návrh nových klasifikačních architektur jako je například „emoční párování“ a návrh nové metody pro mapování diskrétních emočních stavů do dvou dimenzionálního prostoru. Druhá část se zabývá rozpoznáním emočních stavů z databáze spontánní řeči, která byla získána ze záznamů hovorů z reálných call center. Poznatky z analýzy a návrhu metod rozpoznání z hrané řeči byly využity pro návrh nového systému pro rozpoznání sedmi spontánních emočních stavů. Jádrem navrženého přístupu je komplexní klasifikační architektura založena na fúzi různých systémů. Práce se dále zabývá vlivem emočního stavu mluvčího na úspěšnosti rozpoznání pohlaví a návrhem systému pro automatickou detekci úspěšných hovorů v call centrech na základě analýzy parametrů dialogu mezi účastníky telefonních hovorů.Doctoral thesis deals with emotion recognition from speech signals. The thesis is divided into two main parts; the first part describes proposed approaches for emotion recognition using two different multilingual databases of acted emotional speech. The main contributions of this part are detailed analysis of a big set of acoustic features, new classification schemes for vocal emotion recognition such as “emotion coupling” and new method for mapping discrete emotions into two-dimensional space. The second part of this thesis is devoted to emotion recognition using multilingual databases of spontaneous emotional speech, which is based on telephone records obtained from real call centers. The knowledge gained from experiments with emotion recognition from acted speech was exploited to design a new approach for classifying seven emotional states. The core of the proposed approach is a complex classification architecture based on the fusion of different systems. The thesis also examines the influence of speaker’s emotional state on gender recognition performance and proposes system for automatic identification of successful phone calls in call center by means of dialogue features.

    Speaker Identification Based On Discriminative Vector Quantization And Data Fusion

    Get PDF
    Speaker Identification (SI) approaches based on discriminative Vector Quantization (VQ) and data fusion techniques are presented in this dissertation. The SI approaches based on Discriminative VQ (DVQ) proposed in this dissertation are the DVQ for SI (DVQSI), the DVQSI with Unique speech feature vector space segmentation for each speaker pair (DVQSI-U), and the Adaptive DVQSI (ADVQSI) methods. The difference of the probability distributions of the speech feature vector sets from various speakers (or speaker groups) is called the interspeaker variation between speakers (or speaker groups). The interspeaker variation is the measure of template differences between speakers (or speaker groups). All DVQ based techniques presented in this contribution take advantage of the interspeaker variation, which are not exploited in the previous proposed techniques by others that employ traditional VQ for SI (VQSI). All DVQ based techniques have two modes, the training mode and the testing mode. In the training mode, the speech feature vector space is first divided into a number of subspaces based on the interspeaker variations. Then, a discriminative weight is calculated for each subspace of each speaker or speaker pair in the SI group based on the interspeaker variation. The subspaces with higher interspeaker variations play more important roles in SI than the ones with lower interspeaker variations by assigning larger discriminative weights. In the testing mode, discriminative weighted average VQ distortions instead of equally weighted average VQ distortions are used to make the SI decision. The DVQ based techniques lead to higher SI accuracies than VQSI. DVQSI and DVQSI-U techniques consider the interspeaker variation for each speaker pair in the SI group. In DVQSI, speech feature vector space segmentations for all the speaker pairs are exactly the same. However, each speaker pair of DVQSI-U is treated individually in the speech feature vector space segmentation. In both DVQSI and DVQSI-U, the discriminative weights for each speaker pair are calculated by trial and error. The SI accuracies of DVQSI-U are higher than those of DVQSI at the price of much higher computational burden. ADVQSI explores the interspeaker variation between each speaker and all speakers in the SI group. In contrast with DVQSI and DVQSI-U, in ADVQSI, the feature vector space segmentation is for each speaker instead of each speaker pair based on the interspeaker variation between each speaker and all the speakers in the SI group. Also, adaptive techniques are used in the discriminative weights computation for each speaker in ADVQSI. The SI accuracies employing ADVQSI and DVQSI-U are comparable. However, the computational complexity of ADVQSI is much less than that of DVQSI-U. Also, a novel algorithm to convert the raw distortion outputs of template-based SI classifiers into compatible probability measures is proposed in this dissertation. After this conversion, data fusion techniques at the measurement level can be applied to SI. In the proposed technique, stochastic models of the distortion outputs are estimated. Then, the posteriori probabilities of the unknown utterance belonging to each speaker are calculated. Compatible probability measures are assigned based on the posteriori probabilities. The proposed technique leads to better SI performance at the measurement level than existing approaches

    One-Class Classification: Taxonomy of Study and Review of Techniques

    Full text link
    One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers by defining class boundary just with the knowledge of positive class. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, which is based on the availability of training data, algorithms used and the application domains applied. We further delve into each of the categories of the proposed taxonomy and present a comprehensive literature review of the OCC algorithms, techniques and methodologies with a focus on their significance, limitations and applications. We conclude our paper by discussing some open research problems in the field of OCC and present our vision for future research.Comment: 24 pages + 11 pages of references, 8 figure

    Multimodal person recognition for human-vehicle interaction

    Get PDF
    Next-generation vehicles will undoubtedly feature biometric person recognition as part of an effort to improve the driving experience. Today's technology prevents such systems from operating satisfactorily under adverse conditions. A proposed framework for achieving person recognition successfully combines different biometric modalities, borne out in two case studies
    corecore