
    Stress recognition from speech signal

    This doctoral thesis develops algorithms for detecting psychological stress in the speech signal. Its novelty lies in two different analyses of the speech signal: the analysis of vowel polygons and the analysis of glottal pulses. A series of experiments shows that both fundamental analyses can serve for stress detection in speech. The best results were achieved with the Closing-To-Opening phase ratio feature, evaluated in the amplitude domain of the glottal pulses under the Top-To-Bottom criterion and combined with a suitably chosen classifier. Stress detection based on this analysis can be regarded as language- and phoneme-independent, which the obtained results also confirm, reaching up to 95 % accuracy in some cases. All experiments were performed on a newly created Czech database of real stress, and some experiments were also run on the English stress database SUSAS. The variety of potentially effective approaches to stress recognition in speech suggests that combining them could yield very high recognition accuracy, or that they could be applied to detecting other speaker states, which remains to be tested and verified on appropriate databases.
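The exact feature definition is given in the thesis itself; purely as an illustration of the idea, a closing-to-opening phase ratio for a single glottal flow pulse could be sketched as below (the function name and the simple peak-based phase segmentation are assumptions, not the thesis implementation):

```python
import numpy as np

def cto_phase_ratio(glottal_pulse):
    """Illustrative sketch: ratio of the closing phase to the opening
    phase of one glottal flow pulse, split at the amplitude peak.

    glottal_pulse: 1-D array covering a single pulse, onset to end.
    """
    peak = int(np.argmax(glottal_pulse))        # sample index of maximum flow
    opening = peak                              # samples from onset to peak
    closing = len(glottal_pulse) - 1 - peak     # samples from peak to end
    if opening == 0:
        return float("inf")                     # degenerate pulse: no opening phase
    return closing / opening
```

A pulse whose closing phase is faster than its opening phase (as in typical voiced speech) yields a ratio below or above 1 depending on the segmentation convention; shifts in this ratio under stress are the kind of cue the thesis exploits.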

    Iterative feature normalization for emotional speech detection

    Contending with signal variability due to source and channel effects is a critical problem in automatic emotion recognition. Any approach in mitigating these effects however has to be done so as to not compromise emotion-relevant information in the signal. A promising approach to this problem has been through feature normalization using features drawn from non-emotional ("neutral") speech samples. This paper considers a scheme for minimizing the inter-speaker differences while still preserving the emotional discrimination of the acoustic features. This can be achieved by estimating the normalization parameters using only neutral speech, and then applying the coefficients to the entire corpus (including the emotional set). Specifically, this paper introduces a feature normalization scheme that implements these ideas by iteratively detecting neutral speech and normalizing the features. As the approximation error of the normalization parameters is reduced, the accuracy of the emotion detection system increases. The accuracy of the proposed iterative approach, evaluated across three databases, is only 2.5 % lower than the one trained with optimal normalization parameters, and 9.7 % higher than the one trained without any normalization scheme. Index Terms: emotion recognition, fundamental frequency, emotions, feature normalization
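The iterative loop described in the abstract (detect neutral speech, estimate normalization parameters from it, renormalize everything, repeat) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the "neutral detector" here is a stand-in z-score threshold, and all names and parameters are assumptions.

```python
import numpy as np

def iterative_feature_normalization(features, threshold=2.0, n_iters=10):
    """Sketch of iterative feature normalization.

    features: (n_frames, n_dims) array of acoustic features.
    Returns the z-normalized features and the final 'neutral' mask.
    """
    neutral_mask = np.ones(len(features), dtype=bool)  # start: assume all neutral
    mu = features.mean(axis=0)
    sigma = features.std(axis=0) + 1e-8
    for _ in range(n_iters):
        # estimate normalization parameters from the current neutral set only
        mu = features[neutral_mask].mean(axis=0)
        sigma = features[neutral_mask].std(axis=0) + 1e-8
        z = (features - mu) / sigma
        # stand-in neutral detector: frames whose z-scores stay within the threshold
        new_mask = np.all(np.abs(z) < threshold, axis=1)
        if not new_mask.any() or np.array_equal(new_mask, neutral_mask):
            break  # converged (or detector degenerated); keep last valid parameters
        neutral_mask = new_mask
    # apply the neutral-derived parameters to the ENTIRE corpus, emotional frames included
    return (features - mu) / sigma, neutral_mask
```

The key design point from the abstract is preserved: the mean and variance are estimated only from frames classified as neutral, but the resulting normalization is applied to all frames, so emotional deviations survive as large z-scores instead of being normalized away.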