2,204 research outputs found

    Eye-Tracking Signals Based Affective Classification Employing Deep Gradient Convolutional Neural Networks

    Get PDF
    Utilizing biomedical signals as a basis to estimate human affective states is an essential issue in affective computing (AC). With in-depth research on affective signals, the combination of multi-modal cognition and physiological indicators, the establishment of dynamic and complete databases, and the addition of high-tech innovative products have become recent trends in AC. This research develops a deep gradient convolutional neural network (DGCNN) for classifying affect from eye-tracking signals. General signal processing tools and pre-processing methods were applied first, such as Kalman filtering, Hamming windowing, the short-time Fourier transform (STFT), and the fast Fourier transform (FFT). Secondly, the eye-movement and tracking signals were converted into images. A convolutional neural network-based training structure was subsequently applied; the experimental dataset was acquired with an eye-tracking device by presenting four affective stimuli (nervous, calm, happy, and sad) to 16 participants. Finally, the performance of the DGCNN was compared with a decision tree (DT), a Bayesian Gaussian model (BGM), and k-nearest neighbors (KNN) using the true positive rate (TPR) and false positive rate (FPR) as indices. Customized mini-batch, loss, learning-rate, and gradient definitions for the training structure of the deep neural network were also deployed. The predictive classification matrix showed the effectiveness of the proposed method for eye-movement and tracking signals, achieving more than 87.2% accuracy. This research provides a feasible way to achieve more natural human-computer interaction through eye-movement and tracking signals and has potential applications in the affective product design process.
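
    The abstract does not give implementation details, but the signal-to-image step it names (Hamming window plus STFT, feeding a CNN) can be sketched roughly as below. The sampling rate, window length, and normalisation are illustrative assumptions, not the authors' settings.

```python
# Hedged sketch: turning a 1-D eye-tracking signal into a spectrogram image
# via a Hamming-windowed STFT, as in the preprocessing described above.
# fs and n_fft are assumed values, not taken from the paper.
import numpy as np
from scipy.signal import stft

def signal_to_image(gaze_signal, fs=120, n_fft=64):
    """Convert an eye-tracking signal into a 2-D time-frequency image."""
    f, t, Z = stft(gaze_signal, fs=fs, window="hamming", nperseg=n_fft)
    magnitude = np.abs(Z)
    # Log-scale and normalise to [0, 1] so the result can be fed to a CNN.
    img = np.log1p(magnitude)
    return (img - img.min()) / (img.max() - img.min() + 1e-12)

# Example: 10 s of simulated gaze x-coordinates at 120 Hz.
image = signal_to_image(np.random.randn(1200))
print(image.shape)  # (freq_bins, time_frames)
```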

    An investigation into glottal waveform based speech coding

    Get PDF
    Coding of voiced speech by extraction of the glottal waveform has shown promise in improving the efficiency of speech coding systems. This thesis describes an investigation into the performance of such a system. The effect of reverberation on the radiation impedance at the lips is shown to be negligible under normal conditions. Also, the accuracy of the Image Method for adding artificial reverberation to anechoic speech recordings is established. A new algorithm, Pre-emphasised Maximum Likelihood Epoch Detection (PMLED), for Glottal Closure Instant detection is proposed. The algorithm is tested on natural speech and is shown to be both accurate and robust. Two techniques for glottal waveform estimation, Closed Phase Inverse Filtering (CPIF) and Iterative Adaptive Inverse Filtering (IAIF), are compared. In tandem with an LF model fitting procedure, both techniques display a high degree of accuracy. However, IAIF is found to be slightly more robust. Based on these results, a Glottal Excited Linear Predictive (GELP) coding system for voiced speech is proposed and tested. Using a differential LF parameter quantisation scheme, the system achieves speech quality similar to that of U.S. Federal Standard 1016 CELP at a lower mean bit rate while incurring no extra delay.
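
    The core operation behind both CPIF and IAIF is LPC inverse filtering: model the vocal tract as an all-pole filter and remove it from the speech to expose the glottal source. A minimal sketch of that step follows; the LPC order, the autocorrelation method, and the absence of closed-phase windowing are simplifying assumptions, not the thesis' procedure.

```python
# Hedged sketch of LPC inverse filtering, the building block of CPIF/IAIF
# glottal waveform estimation. Order and frame handling are assumptions.
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc(frame, order=12):
    """Autocorrelation-method LPC: returns A(z) = [1, -a1, ..., -ap]."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    return np.concatenate(([1.0], -a))

def inverse_filter(frame, order=12):
    """Estimate the glottal source by cancelling the vocal-tract filter."""
    A = lpc(frame * np.hamming(len(frame)), order)
    # Filtering the speech with A(z) removes the all-pole vocal tract,
    # leaving an estimate of the glottal (derivative) waveform.
    return lfilter(A, [1.0], frame)

residual = inverse_filter(np.random.randn(400))  # one toy speech frame
```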

    A Hybrid Fuzzy Cognitive Map/Support Vector Machine Approach for EEG-Based Emotion Classification Using Compressed Sensing

    Full text link
    © 2018, Taiwan Fuzzy Systems Association and Springer-Verlag GmbH Germany, part of Springer Nature. Due to the high-dimensional, non-stationary, and non-linear properties of the electroencephalogram (EEG), a significant portion of EEG analysis remains an open problem. In this paper, a novel approach to EEG-based human emotion study is presented using Big Data methods with a hybrid classifier. An EEG dataset is first compressed using compressed sensing, then wavelet transform features are extracted, and a hybrid Support Vector Machine (SVM) and Fuzzy Cognitive Map classifier is designed. The compressed data are only one-fourth of the original size, and the hybrid classifier achieves an average accuracy of 73.32%. Compared to a single SVM classifier, the average accuracy is improved by 3.23%. These outcomes show that physiological signals can be compressed without an explicit sparsity identity. The stable, high-accuracy classification system demonstrates that EEG signals can be used to detect human emotion, and the findings further support the existence of inter-relationships between various regions of the brain.
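
    A rough sketch of the pipeline the abstract names (compressed-sensing acquisition to one-fourth size, wavelet features, SVM) is given below on toy data. The measurement matrix, wavelet choice, and feature definition are illustrative assumptions, and the Fuzzy Cognitive Map stage is omitted.

```python
# Hedged sketch: compress EEG epochs with a random measurement matrix,
# extract wavelet sub-band energies, classify with an SVM. Toy data only.
import numpy as np
import pywt
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_epochs, n_samples = 40, 512
X = rng.standard_normal((n_epochs, n_samples))   # toy EEG epochs
y = rng.integers(0, 2, n_epochs)                 # toy emotion labels

# Compressed-sensing acquisition: keep 1/4 of the original dimension.
Phi = rng.standard_normal((n_samples // 4, n_samples))
X_compressed = X @ Phi.T                         # (n_epochs, 128)

def wavelet_features(signal, wavelet="db4", level=4):
    # Energy of each wavelet sub-band as a compact feature vector.
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return np.array([np.sum(c ** 2) for c in coeffs])

features = np.array([wavelet_features(x) for x in X_compressed])
clf = SVC(kernel="rbf").fit(features, y)
print(clf.score(features, y))
```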

    Speech Processing in Computer Vision Applications

    Get PDF
    Deep learning has recently proven to be a viable asset for determining features in the field of speech analysis. Deep learning methods like convolutional neural networks facilitate the extraction of specific feature information from waveforms, allowing networks to create more feature-dense representations of data. Our work addresses the problems of re-creating a face given a speaker's voice and of speaker identification using deep learning methods. In this work, we first review the fundamental background in speech processing and its related applications. Then we introduce novel deep learning-based methods for speech feature analysis. Finally, we present our deep learning approaches to speaker identification and speech-to-face synthesis. The presented method can convert a speaker's audio sample into an image of their predicted face. The framework is composed of several chained networks, each performing an essential step in the conversion process: audio embedding, encoding, and face generation networks, respectively. Our experiments show that certain audio features map to the face, that DNNs can generate a predicted face from a speaker's voice, and that a GUI can be used in conjunction to display a speaker recognition network's data.
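
    The chained structure described above (audio embedding, then encoding, then face generation) could look roughly like the following PyTorch sketch. Every layer size, module name, and the 32x32 output resolution are hypothetical; this is not the authors' trained architecture.

```python
# Hedged sketch of a chained voice-to-face pipeline: embed a mel-spectrogram,
# encode the embedding, then decode it into a small RGB face image.
import torch
import torch.nn as nn

class AudioEmbedder(nn.Module):
    def __init__(self, n_mels=80, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Conv1d(n_mels, dim, 3, padding=1),
                                 nn.ReLU(),
                                 nn.AdaptiveAvgPool1d(1))  # pool over time
    def forward(self, mel):                  # (batch, n_mels, frames)
        return self.net(mel).squeeze(-1)     # (batch, dim) speaker embedding

class FaceGenerator(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.fc = nn.Linear(dim, 64 * 8 * 8)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid())
    def forward(self, z):                    # embedding -> 32x32 RGB face
        x = self.fc(z).view(-1, 64, 8, 8)
        return self.deconv(x)

encoder = nn.Sequential(nn.Linear(256, 256), nn.ReLU())  # encoding stage
mel = torch.randn(2, 80, 100)                # a batch of mel-spectrograms
face = FaceGenerator()(encoder(AudioEmbedder()(mel)))
print(face.shape)                            # torch.Size([2, 3, 32, 32])
```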

    A robust speech enhancement method in noisy environments

    Get PDF
    Speech enhancement aims to eliminate or reduce undesirable noise and distortion while preserving the features of the speech, so as to enhance the quality and intelligibility of degraded speech signals. In this study, we investigated a combined approach using single-frequency filtering (SFF) and a modified spectral subtraction method to enhance single-channel speech. The SFF method divides the speech signal into uniform subband envelopes, and spectral over-subtraction is then performed on each envelope. A smoothing parameter, determined by the a-posteriori signal-to-noise ratio (SNR), is used to estimate and update the noise without explicitly detecting silence. To evaluate the performance of our algorithm, we employed objective measures such as segmental SNR (segSNR), extended short-term objective intelligibility (ESTOI), and perceptual evaluation of speech quality (PESQ). We tested the algorithm with various types of noise at different SNR levels and achieved results ranging from 4.24 to 15.41 for segSNR, 0.57 to 0.97 for ESTOI, and 2.18 to 4.45 for PESQ. Compared to other standard and existing speech enhancement methods, our algorithm produces better results and performs well in reducing undesirable noise.
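
    The spectral over-subtraction step with an SNR-driven noise update can be sketched as below. For brevity the sketch operates on STFT magnitudes rather than the paper's SFF subband envelopes, and the over-subtraction factor, spectral floor, and smoothing rule are illustrative assumptions.

```python
# Hedged sketch of spectral over-subtraction with a noise estimate that is
# updated via the a-posteriori SNR, avoiding explicit silence detection.
import numpy as np
from scipy.signal import stft, istft

def enhance(noisy, fs=16000, alpha=2.0, floor=0.02):
    f, t, Z = stft(noisy, fs=fs, nperseg=512)
    mag, phase = np.abs(Z), np.angle(Z)
    noise = mag[:, :5].mean(axis=1)            # initial noise estimate
    out = np.empty_like(mag)
    for i in range(mag.shape[1]):
        snr_post = mag[:, i] / (noise + 1e-12) # a-posteriori SNR per bin
        # Update the noise more strongly where the SNR is low.
        beta = 1.0 / (1.0 + snr_post)
        noise = (1 - beta) * noise + beta * mag[:, i]
        # Over-subtract the noise estimate and apply a spectral floor.
        clean = mag[:, i] - alpha * noise
        out[:, i] = np.maximum(clean, floor * mag[:, i])
    _, enhanced = istft(out * np.exp(1j * phase), fs=fs, nperseg=512)
    return enhanced

enhanced = enhance(np.random.randn(16000))     # 1 s of toy noisy audio
```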