2,204 research outputs found

    Eye-Tracking Signals Based Affective Classification Employing Deep Gradient Convolutional Neural Networks

    Get PDF
    Utilizing biomedical signals as a basis to estimate human affective states is an essential issue in affective computing (AC). With in-depth research on affective signals, the combination of multi-modal cognition and physiological indicators, the establishment of dynamic and complete databases, and the addition of high-tech innovative products have become recent trends in AC. This research develops a deep gradient convolutional neural network (DGCNN) for classifying affect from eye-tracking signals. General signal processing tools and pre-processing methods were applied first, such as Kalman filtering, Hamming windowing, the short-time Fourier transform (STFT), and the fast Fourier transform (FFT). Secondly, the eye-movement and tracking signals were converted into images. A convolutional neural network-based training structure was subsequently applied; the experimental dataset was acquired with an eye-tracking device by presenting four affective stimuli (nervous, calm, happy, and sad) to 16 participants. Finally, the performance of the DGCNN was compared with a decision tree (DT), a Bayesian Gaussian model (BGM), and k-nearest neighbors (KNN) using the true positive rate (TPR) and false positive rate (FPR) as indices. Customized mini-batch, loss, learning-rate, and gradient definitions for the training structure of the deep neural network were also deployed. The predictive classification matrix showed the effectiveness of the proposed method for eye-movement and tracking signals, achieving more than 87.2% accuracy. This research provides a feasible way to achieve more natural human-computer interaction through eye-movement and tracking signals and has potential applications in the affective product design process.
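
    The abstract does not give implementation details, but the signal-to-image step it names (Hamming window plus STFT, feeding a CNN) can be sketched roughly as below. The sampling rate, window length, and normalisation are illustrative assumptions, not the authors' settings.

```python
# Hedged sketch: turning a 1-D eye-tracking signal into a spectrogram image
# via a Hamming-windowed STFT, as in the preprocessing described above.
# fs and n_fft are assumed values, not taken from the paper.
import numpy as np
from scipy.signal import stft

def signal_to_image(gaze_signal, fs=120, n_fft=64):
    """Convert an eye-tracking signal into a 2-D time-frequency image."""
    f, t, Z = stft(gaze_signal, fs=fs, window="hamming", nperseg=n_fft)
    magnitude = np.abs(Z)
    # Log-scale and normalise to [0, 1] so the result can be fed to a CNN.
    img = np.log1p(magnitude)
    return (img - img.min()) / (img.max() - img.min() + 1e-12)

# Example: 10 s of simulated gaze x-coordinates at 120 Hz.
image = signal_to_image(np.random.randn(1200))
print(image.shape)  # (freq_bins, time_frames)
```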

    An investigation into glottal waveform based speech coding

    Get PDF
    Coding of voiced speech by extraction of the glottal waveform has shown promise in improving the efficiency of speech coding systems. This thesis describes an investigation into the performance of such a system. The effect of reverberation on the radiation impedance at the lips is shown to be negligible under normal conditions. Also, the accuracy of the Image Method for adding artificial reverberation to anechoic speech recordings is established. A new algorithm, Pre-emphasised Maximum Likelihood Epoch Detection (PMLED), for Glottal Closure Instant detection is proposed. The algorithm is tested on natural speech and is shown to be both accurate and robust. Two techniques for glottal waveform estimation, Closed Phase Inverse Filtering (CPIF) and Iterative Adaptive Inverse Filtering (IAIF), are compared. In tandem with an LF model fitting procedure, both techniques display a high degree of accuracy. However, IAIF is found to be slightly more robust. Based on these results, a Glottal Excited Linear Predictive (GELP) coding system for voiced speech is proposed and tested. Using a differential LF parameter quantisation scheme, the system achieves speech quality similar to that of U.S. Federal Standard 1016 CELP at a lower mean bit rate while incurring no extra delay.
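
    The core operation behind both CPIF and IAIF is LPC inverse filtering: model the vocal tract as an all-pole filter and remove it from the speech to expose the glottal source. A minimal sketch of that step follows; the LPC order, the autocorrelation method, and the absence of closed-phase windowing are simplifying assumptions, not the thesis' procedure.

```python
# Hedged sketch of LPC inverse filtering, the building block of CPIF/IAIF
# glottal waveform estimation. Order and frame handling are assumptions.
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc(frame, order=12):
    """Autocorrelation-method LPC: returns A(z) = [1, -a1, ..., -ap]."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    return np.concatenate(([1.0], -a))

def inverse_filter(frame, order=12):
    """Estimate the glottal source by cancelling the vocal-tract filter."""
    A = lpc(frame * np.hamming(len(frame)), order)
    # Filtering the speech with A(z) removes the all-pole vocal tract,
    # leaving an estimate of the glottal (derivative) waveform.
    return lfilter(A, [1.0], frame)

residual = inverse_filter(np.random.randn(400))  # one toy speech frame
```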

    A Hybrid Fuzzy Cognitive Map/Support Vector Machine Approach for EEG-Based Emotion Classification Using Compressed Sensing

    Full text link
    © 2018, Taiwan Fuzzy Systems Association and Springer-Verlag GmbH Germany, part of Springer Nature. Due to the high-dimensional, non-stationary, and non-linear properties of the electroencephalogram (EEG), a significant portion of EEG analysis remains an open problem. In this paper, a novel approach to EEG-based human emotion study is presented using Big Data methods with a hybrid classifier. An EEG dataset is first compressed using compressed sensing, then wavelet transform features are extracted, and a hybrid Support Vector Machine (SVM) and Fuzzy Cognitive Map classifier is designed. The compressed data are only one-fourth of the original size, and the hybrid classifier achieves an average accuracy of 73.32%. Compared to a single SVM classifier, the average accuracy is improved by 3.23%. These outcomes show that physiological signals can be compressed without an explicit sparsity identity. The stable, high-accuracy classification system demonstrates that EEG signals can be used to detect human emotion, and the findings further support the existence of inter-relationships between various regions of the brain.
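
    A rough sketch of the pipeline the abstract names (compressed-sensing acquisition to one-fourth size, wavelet features, SVM) is given below on toy data. The measurement matrix, wavelet choice, and feature definition are illustrative assumptions, and the Fuzzy Cognitive Map stage is omitted.

```python
# Hedged sketch: compress EEG epochs with a random measurement matrix,
# extract wavelet sub-band energies, classify with an SVM. Toy data only.
import numpy as np
import pywt
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_epochs, n_samples = 40, 512
X = rng.standard_normal((n_epochs, n_samples))   # toy EEG epochs
y = rng.integers(0, 2, n_epochs)                 # toy emotion labels

# Compressed-sensing acquisition: keep 1/4 of the original dimension.
Phi = rng.standard_normal((n_samples // 4, n_samples))
X_compressed = X @ Phi.T                         # (n_epochs, 128)

def wavelet_features(signal, wavelet="db4", level=4):
    # Energy of each wavelet sub-band as a compact feature vector.
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return np.array([np.sum(c ** 2) for c in coeffs])

features = np.array([wavelet_features(x) for x in X_compressed])
clf = SVC(kernel="rbf").fit(features, y)
print(clf.score(features, y))
```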

    Speech Processing in Computer Vision Applications

    Get PDF
    Deep learning has recently proven to be a viable asset for determining features in the field of speech analysis. Deep learning methods like convolutional neural networks facilitate the extraction of specific feature information from waveforms, allowing networks to create more feature-dense representations of data. Our work addresses the problems of re-creating a face given a speaker's voice and of speaker identification using deep learning methods. In this work, we first review the fundamental background in speech processing and its related applications. Then we introduce novel deep learning-based methods for speech feature analysis. Finally, we present our deep learning approaches to speaker identification and speech-to-face synthesis. The presented method can convert a speaker's audio sample into an image of their predicted face. The framework is composed of several chained networks, each performing an essential step in the conversion process: audio embedding, encoding, and face generation networks, respectively. Our experiments show that certain audio features map to the face, that DNNs can generate a predicted face from a speaker's voice, and that a GUI can be used in conjunction to display a speaker recognition network's data.
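
    The chained structure described above (audio embedding, then encoding, then face generation) could look roughly like the following PyTorch sketch. Every layer size, module name, and the 32x32 output resolution are hypothetical; this is not the authors' trained architecture.

```python
# Hedged sketch of a chained voice-to-face pipeline: embed a mel-spectrogram,
# encode the embedding, then decode it into a small RGB face image.
import torch
import torch.nn as nn

class AudioEmbedder(nn.Module):
    def __init__(self, n_mels=80, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Conv1d(n_mels, dim, 3, padding=1),
                                 nn.ReLU(),
                                 nn.AdaptiveAvgPool1d(1))  # pool over time
    def forward(self, mel):                  # (batch, n_mels, frames)
        return self.net(mel).squeeze(-1)     # (batch, dim) speaker embedding

class FaceGenerator(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.fc = nn.Linear(dim, 64 * 8 * 8)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid())
    def forward(self, z):                    # embedding -> 32x32 RGB face
        x = self.fc(z).view(-1, 64, 8, 8)
        return self.deconv(x)

encoder = nn.Sequential(nn.Linear(256, 256), nn.ReLU())  # encoding stage
mel = torch.randn(2, 80, 100)                # a batch of mel-spectrograms
face = FaceGenerator()(encoder(AudioEmbedder()(mel)))
print(face.shape)                            # torch.Size([2, 3, 32, 32])
```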

    A robust speech enhancement method in noisy environments

    Get PDF
    Speech enhancement aims to eliminate or reduce undesirable noise and distortion while preserving the features of the speech, so as to enhance the quality and intelligibility of degraded speech signals. In this study, we investigated a combined approach using single-frequency filtering (SFF) and a modified spectral subtraction method to enhance single-channel speech. The SFF method divides the speech signal into uniform subband envelopes, and spectral over-subtraction is then performed on each envelope. A smoothing parameter, determined by the a-posteriori signal-to-noise ratio (SNR), is used to estimate and update the noise without explicitly detecting silence. To evaluate the performance of our algorithm, we employed objective measures such as segmental SNR (segSNR), extended short-term objective intelligibility (ESTOI), and perceptual evaluation of speech quality (PESQ). We tested the algorithm with various types of noise at different SNR levels and achieved results ranging from 4.24 to 15.41 for segSNR, 0.57 to 0.97 for ESTOI, and 2.18 to 4.45 for PESQ. Compared to other standard and existing speech enhancement methods, our algorithm produces better results and performs well in reducing undesirable noise.
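
    The spectral over-subtraction step with an SNR-driven noise update can be sketched as below. For brevity the sketch operates on STFT magnitudes rather than the paper's SFF subband envelopes, and the over-subtraction factor, spectral floor, and smoothing rule are illustrative assumptions.

```python
# Hedged sketch of spectral over-subtraction with a noise estimate that is
# updated via the a-posteriori SNR, avoiding explicit silence detection.
import numpy as np
from scipy.signal import stft, istft

def enhance(noisy, fs=16000, alpha=2.0, floor=0.02):
    f, t, Z = stft(noisy, fs=fs, nperseg=512)
    mag, phase = np.abs(Z), np.angle(Z)
    noise = mag[:, :5].mean(axis=1)            # initial noise estimate
    out = np.empty_like(mag)
    for i in range(mag.shape[1]):
        snr_post = mag[:, i] / (noise + 1e-12) # a-posteriori SNR per bin
        # Update the noise more strongly where the SNR is low.
        beta = 1.0 / (1.0 + snr_post)
        noise = (1 - beta) * noise + beta * mag[:, i]
        # Over-subtract the noise estimate and apply a spectral floor.
        clean = mag[:, i] - alpha * noise
        out[:, i] = np.maximum(clean, floor * mag[:, i])
    _, enhanced = istft(out * np.exp(1j * phase), fs=fs, nperseg=512)
    return enhanced

enhanced = enhance(np.random.randn(16000))     # 1 s of toy noisy audio
```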