
    Emotion Recognition from Acted and Spontaneous Speech

    This doctoral thesis deals with recognizing a speaker's emotional state from the speech signal. It is divided into two main parts. The first part describes the proposed approaches for emotion recognition from acted emotional speech and presents recognition results on two acted-speech databases in different languages. The main contributions of this part are a detailed analysis of a large set of acoustic features extracted from the speech signal, new classification schemes for vocal emotion recognition such as “emotion coupling”, and a new method for mapping discrete emotional states into a two-dimensional space. The second part is devoted to emotion recognition from spontaneous emotional speech, using telephone recordings obtained from real call centers.
The knowledge gained from the experiments on acted speech was used to design a new system for recognizing seven spontaneous emotional states. The core of the proposed approach is a complex classification architecture based on the fusion of different systems. The thesis also examines the influence of the speaker's emotional state on gender recognition performance and proposes a system for automatically identifying successful calls in call centers by analyzing features of the dialogue between the call participants.

    Signal Processing and Restoration


    Effect of Prolonged Non-Traumatic Noise Exposure on Unvoiced Speech Recognition

    Animal models in the past decade have shown that noise exposure may affect temporal envelope processing at supra-threshold levels while the absolute hearing threshold remains in the normal range. However, human studies have failed to find such an effect consistently, owing to poor control of participants' noise exposure history and limited measurement sensitivity. The current study operationally defined non-traumatic noise exposure (NTNE) as noise exposure at dental schools, because of its distinctive high-pass spectral profile, its non-traumatic nature, and the systematic exposure schedule across dental students of different years. Temporal envelope processing was examined through recognition of unvoiced speech interrupted by noise or by silence. The results showed that people who had systematic exposure to dental noise performed more poorly on tasks of temporal envelope processing than unexposed people. The effect of high-frequency NTNE on temporal envelope processing was more robust inside than outside the spectral band of dental noise, and was more obvious in conditions that required finer temporal resolution (e.g., a faster noise modulation rate) than in those requiring less fine temporal resolution (e.g., a slower noise modulation rate). Furthermore, there was a significant performance difference between the exposed and unexposed groups on tasks of spectral envelope processing at low frequencies. Meanwhile, the two groups performed similarly on tasks near threshold. Additional analyses showed that factors such as age, years of musical training, non-dental noise exposure history, and peripheral auditory function did not explain the variance in performance on tasks of temporal or spectral envelope processing.
The findings from the current study support the general assumptions from animal models of NTNE that temporal and spectral envelope processing deficits related to NTNE likely occur at retro-cochlear sites, at supra-threshold levels, and could easily be overlooked by routine clinical audiologic screening.
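The stimulus manipulation described above (speech periodically interrupted by noise or by silence, at faster or slower rates) can be sketched as a simple gating operation. This is a minimal illustration under stated assumptions, not the study's procedure: square-wave gating with a 50% duty cycle and gaps filled with silence or with Gaussian noise matched to the signal's RMS are assumptions, and the function `interrupt` and its parameters are hypothetical.

```python
import numpy as np


def interrupt(signal, fs, rate_hz, duty=0.5, filler="silence", seed=0):
    """Gate `signal` on and off periodically at `rate_hz` cycles per second.

    Hypothetical helper: during the "off" part of each cycle the signal is
    replaced by silence or by Gaussian noise matched to the signal's RMS.
    Hard gating is used; a real experiment would typically add short
    onset/offset ramps to avoid spectral splatter.
    """
    signal = np.asarray(signal, dtype=float)
    t = np.arange(len(signal)) / fs
    on = (t * rate_hz) % 1.0 < duty  # True during the "on" part of each cycle
    if filler == "silence":
        gap = np.zeros_like(signal)
    elif filler == "noise":
        rms = np.sqrt(np.mean(signal ** 2))
        gap = np.random.default_rng(seed).standard_normal(len(signal)) * rms
    else:
        raise ValueError("filler must be 'silence' or 'noise'")
    return np.where(on, signal, gap)
```

Raising `rate_hz` shortens each glimpse of the speech, which is one way to vary how fine a temporal resolution the listener needs.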

    Effects of spatial separation on across-frequency grouping in narrowband speech

    Thesis (M.S.)--Boston University. Understanding how we perceive speech in the face of competing sound sources coming from a variety of directions is an important goal in psychoacoustics. In everyday situations, noisy interference can obscure the content of a conversation and require listeners to integrate speech information across different frequency regions. Two studies are described that investigate the effects of spatial separation on the grouping of two spectrally separated, narrow bands of target speech with a variety of filler stimuli centered between these bands. Target sentences taken from the IEEE corpus were split into two 3/4-octave bands, the lower centered around 370 Hz and the higher centered around 6 kHz. The first study explored spatial influences on spectral restoration. The primary experiment measured the intelligibility of the speech bands (presented diotically), first with a single band of noise between 700 Hz and 3 kHz as the filler and then with the same noise band modulated by the target speech envelope as the filler. These fillers were presented either diotically or with an interaural time difference (ITD) of 600 μs leading in the left ear. Performance was worse in the unmodulated noise condition when the filler was spatially separated from the speech bands. Across-frequency grouping was not observed in the modulated noise conditions. The second study explored the effect of attention on the intelligibility of speech bands presented from the left with related fillers. The fillers in this study were dual bands of vocoded or narrowband speech, presented from either the left or the right. They were derived either from the same target speech token (matched) or from an independent sentence (conflicting). In a key experimental block, listeners were instructed to attend to the target speech on the left while either conflicting bands or, infrequently, matched bands were presented on the right.
The infrequently presented matched trials were physically identical to trials in another block in which listeners were instructed to attend to both ears. Results showed that splitting the target and filler across the ears degraded intelligibility; however, directed spatial attention had no effect on performance. These results demonstrate that speech elements group together strongly, overcoming spatial attention, even for degraded speech.
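Two stimulus parameters in this abstract can be made concrete: the edges of a 3/4-octave band around a given center frequency, and a 600 μs ITD favoring the left ear. The sketch below is illustrative only. It assumes geometrically centered bands and a frequency-domain (circular) fractional delay, and the helper names `fractional_octave_edges` and `apply_itd` are hypothetical rather than taken from the thesis.

```python
import numpy as np


def fractional_octave_edges(center_hz, width_octaves):
    """Lower and upper edges of a band `width_octaves` wide, geometrically centered."""
    half = width_octaves / 2.0
    return center_hz * 2.0 ** (-half), center_hz * 2.0 ** half


def apply_itd(mono, fs, itd_s):
    """Return (left, right) with the right channel delayed by `itd_s` seconds,
    so the sound leads in the left ear.

    The delay is applied as a linear phase shift, X(f) * exp(-j 2 pi f tau),
    which supports fractional-sample delays but wraps around circularly.
    """
    mono = np.asarray(mono, dtype=float)
    n = len(mono)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectrum = np.fft.rfft(mono)
    right = np.fft.irfft(spectrum * np.exp(-2j * np.pi * freqs * itd_s), n=n)
    return mono.copy(), right
```

With these assumptions, the band around 370 Hz spans roughly 285 to 480 Hz and the band around 6 kHz roughly 4.6 to 7.8 kHz. Because the delay is circular, a real stimulus would normally be zero-padded before delaying so that the tail does not wrap around to the start.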

    Convolutional Deblurring for Natural Imaging

    In this paper, we propose a novel design for image deblurring in the form of a one-shot convolution filter that can be convolved directly with naturally blurred images for restoration. Optical blurring is a common drawback in many imaging applications that suffer from optical imperfections. Although numerous deconvolution methods blindly estimate blurring in either inclusive or exclusive forms, they are challenging in practice due to high computational cost and low image reconstruction quality. Both high accuracy and high speed are prerequisites for high-throughput imaging platforms in digital archiving. In such platforms, deblurring is required after image acquisition, before images are stored, previewed, or processed for high-level interpretation. Therefore, on-the-fly correction of such images is important to avoid time delays, reduce computational expense, and improve perceived image quality. We bridge this gap by synthesizing a deconvolution kernel as a linear combination of Finite Impulse Response (FIR) even-derivative filters that can be convolved directly with blurry input images to boost the frequency fall-off of the Point Spread Function (PSF) associated with the optical blur. We employ a Gaussian low-pass filter to decouple image denoising from image edge deblurring. Furthermore, we propose a blind approach to estimate the PSF statistics for two models, Gaussian and Laplacian, that are common in many imaging pipelines. Thorough experiments are designed to test and validate the efficiency of the proposed method using 2054 naturally blurred images across six imaging applications and seven state-of-the-art deconvolution methods.
    Comment: 15 pages, for publication in IEEE Transactions on Image Processing
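The core idea, a deconvolution kernel built as a weighted sum of even-derivative FIR filters, can be illustrated for a Gaussian PSF with known standard deviation. This is a minimal sketch, not the authors' kernel or coefficients. It assumes the Gaussian model, takes the weights from a truncated Taylor series of the inverse Gaussian transfer function exp(σ²ω²/2), realizes ω² with the negated discrete second difference, and uses hypothetical function names (`even_derivative_kernel`, `deblur`). It also omits the paper's blind PSF estimation and Gaussian pre-filtering for noise.

```python
import math

import numpy as np


def _add_centered(a, b):
    """Sum two odd-length 1D kernels after zero-padding both to a common centered length."""
    n = max(len(a), len(b))
    out = np.zeros(n)
    for x in (a, b):
        offset = (n - len(x)) // 2
        out[offset:offset + len(x)] += x
    return out


def even_derivative_kernel(sigma, order=2):
    """Hypothetical 1D FIR kernel approximating the inverse of a Gaussian PSF.

    exp(sigma^2 w^2 / 2) = sum_k (sigma^2 / 2)^k / k! * w^(2k); w^2 is realized
    by the negated second-difference filter, so every term is an even-derivative filter.
    """
    neg_d2 = np.array([-1.0, 2.0, -1.0])  # frequency response 4 sin^2(w/2), about w^2
    kernel = np.array([1.0])              # k = 0 term: identity (delta)
    term = np.array([1.0])
    for k in range(1, order + 1):
        term = np.convolve(term, neg_d2)  # (-D2)^k, a 2k-th order derivative filter
        coeff = (sigma ** 2 / 2.0) ** k / math.factorial(k)
        kernel = _add_centered(kernel, coeff * term)
    return kernel


def deblur(image, sigma, order=2):
    """Apply the kernel separably along columns and rows (isotropic Gaussian PSF assumed)."""
    k = even_derivative_kernel(sigma, order)

    def conv(v):
        return np.convolve(v, k, mode="same")

    out = np.apply_along_axis(conv, 0, np.asarray(image, dtype=float))
    return np.apply_along_axis(conv, 1, out)
```

For σ = 1 and order 2 the taps are 0.125, -1, 2.75, -1, 0.125. They sum to 1, so flat regions pass through unchanged while mid frequencies attenuated by the blur are boosted. Raising `order` extends the correction toward higher frequencies but also amplifies noise, which is why some form of low-pass pre-filtering, as the abstract mentions, matters in practice.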