Audio-visual speech recognition with background music using single-channel source separation

Author: Erdoğan Hakan
Grais Emad Mounir
Topkaya İbrahim Saygın
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012

In this paper, we consider audio-visual speech recognition with background music. The proposed algorithm integrates audio-visual speech recognition (AVSR) with single-channel source separation (SCSS). We apply the proposed algorithm to recognize speech that is mixed with music signals. First, an SCSS algorithm based on nonnegative matrix factorization (NMF) and spectral masks is used to separate the speech signal from the background music in the magnitude spectral domain. After the speech audio is separated from the music, regular AVSR is employed using multi-stream hidden Markov models. By employing the two approaches together, we try to improve recognition accuracy both by processing the audio signal with SCSS and by supporting the recognition task with visual information. Experimental results show that combining audio-visual speech recognition with source separation gives remarkable improvements in the accuracy of the speech recognition system.
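The separation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Euclidean-cost multiplicative updates, basis matrices trained separately on speech and music magnitude spectrograms, and a soft mask with an exponent `p`. The function names `nmf` and `separate`, the rank, and the iteration count are all hypothetical choices made for illustration.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero


def nmf(V, rank, n_iter=200, W=None, seed=0):
    """Approximate V by W @ H using multiplicative updates (Euclidean cost).

    If W is supplied it is held fixed and only the gains H are estimated;
    in that case `rank` is ignored and taken from W.
    """
    rng = np.random.default_rng(seed)
    fixed_bases = W is not None
    if not fixed_bases:
        W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        if not fixed_bases:
            W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W, H


def separate(V_mix, W_speech, W_music, p=2):
    """Split a mixture magnitude spectrogram using pre-trained bases.

    The exponent p controls how sharp the soft mask is (an assumption of
    this sketch, not a value taken from the abstract).
    """
    W = np.hstack([W_speech, W_music])
    _, H = nmf(V_mix, W.shape[1], W=W)      # bases fixed, gains estimated
    k = W_speech.shape[1]
    S = W_speech @ H[:k]                    # initial speech estimate
    M = W_music @ H[k:]                     # initial music estimate
    mask = S**p / (S**p + M**p + EPS)       # soft spectral mask in [0, 1)
    return mask * V_mix, (1.0 - mask) * V_mix
```

In use, `W_speech` and `W_music` would come from `nmf` applied to training spectrograms of each source, and the first output of `separate` would be passed on to the recognizer's audio front end.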

Acoustic and Device Feature Fusion for Load Recognition

Author: Gluhak Alexander
Imran Muhammad Ali
Nati Michele
Rajasegarar Sutharshan
Zoha Ahmed
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012

Appliance-specific Load Monitoring (LM) provides a possible solution to the problem of energy conservation, which is becoming increasingly challenging due to growing energy demands within offices and residential spaces. It is essential to perform automatic appliance recognition and monitoring for optimal resource utilization. In this paper, we study the use of non-intrusive LM methods that rely on steady-state appliance signatures for classifying the most commonly used office appliances, while demonstrating their limitations in accurately discerning low-power devices due to overlapping load signatures. We propose a multilayer decision architecture that makes use of audio features derived from device sounds and fuses them with load signatures acquired from an energy meter. For the recognition of device sounds, we perform feature set selection by evaluating combinations of time-domain and FFT-based audio features with state-of-the-art machine learning algorithms. The highest recognition performance is achieved by support vector machines in both the device and audio recognition experiments. Further, we demonstrate that our proposed feature set, which is a concatenation of device audio features and load signatures, significantly improves device recognition accuracy in comparison to the use of steady-state load signatures only.
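The feature-level fusion described here (concatenating audio features with load signatures) can be sketched as follows. This is an illustration under assumptions, not the paper's pipeline: the abstract reports support vector machines, whereas this sketch swaps in a nearest-centroid classifier to stay dependency-free, and the helper names `zscore`, `fuse`, `fit_centroids`, and `predict` are hypothetical.

```python
import numpy as np


def zscore(X):
    """Standardize each feature column; constant columns are left at zero."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    return (X - mu) / np.where(sd > 0, sd, 1.0)


def fuse(audio_feats, load_feats):
    """Feature-level fusion: standardize each block, then concatenate."""
    return np.hstack([zscore(audio_feats), zscore(load_feats)])


def fit_centroids(X, y):
    """Stand-in classifier (not an SVM): one mean vector per class."""
    labels = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in labels])
    return labels, centroids


def predict(model, X):
    """Assign each row of X to the class with the closest centroid."""
    labels, centroids = model
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return labels[dists.argmin(axis=1)]
```

Standardizing each block before concatenation keeps a feature measured in watts from dominating audio features on a much smaller scale. When two low-power devices have overlapping load signatures, the audio columns of the fused vector are what separates them.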

Proposing a hybrid approach for emotion classification using audio and video data

Author: Azimi Khojasteh Rezvan
Naji Alobaidi
Rafeh Reza
Publication venue: AIRCC Digital Library
Publication date: 30/11/2019

Emotion recognition has been a research topic in the field of Human-Computer Interaction (HCI) in recent years. Computers have become an inseparable part of human life, and users need human-like interaction to communicate with them more effectively. Many researchers have become interested in emotion recognition and classification using different sources. A hybrid approach combining audio and text has recently been introduced. All such approaches aim to raise the accuracy and appropriateness of emotion classification. In this study, a hybrid approach combining audio and video is applied to emotion recognition. The innovation of this approach lies in selecting the characteristics of audio and video and their features as a unique specification for classification. In this research, the SVM method is used to classify the data in the SAVEE database. The experimental results show that the maximum classification accuracy for audio data alone is 91.63%, while the hybrid approach achieves 99.26%.

Automatic affective dimension recognition from naturalistic facial expressions based on wavelet filtering and PLS regression

Author: Gaus YFBA
Jan A
Meng H
Turabzadeh S
Zhang F
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/05/2015

Continuous automatic recognition of affective dimensions from facial expressions in naturalistic contexts is a very challenging research topic, but a very important one in human-computer interaction. In this paper, an automatic recognition system is proposed to continuously predict affective dimensions such as Arousal, Valence and Dominance in naturalistic facial expression videos. Firstly, visual and vocal features are extracted from image frames and audio segments in facial expression videos. Secondly, a wavelet-transform-based digital filtering method is applied to remove irrelevant noise information in the feature space. Thirdly, Partial Least Squares regression is used to predict the affective dimensions from both the video and audio modalities. Finally, the two modalities are combined in a decision fusion process to boost overall performance. The proposed method is tested on the dataset of the fourth international Audio/Visual Emotion Recognition Challenge (AVEC2014) and compared to other state-of-the-art methods in the affect recognition sub-challenge, achieving good performance.
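The final decision fusion step can be illustrated with a small sketch. The abstract does not say how the two modalities are combined, so this assumes a single convex weight between the per-modality predictions, fitted by least squares on held-out data. The function names `fusion_weight` and `fuse_decisions` are hypothetical.

```python
import numpy as np


def fusion_weight(pred_video, pred_audio, target):
    """Least-squares weight w for w * video + (1 - w) * audio, clipped to [0, 1].

    Minimizing ||audio + w * (video - audio) - target||^2 over w gives
    w = <video - audio, target - audio> / ||video - audio||^2.
    """
    diff = pred_video - pred_audio
    denom = float(diff @ diff)
    if denom == 0.0:
        return 0.5  # predictions are identical, so any weight is equivalent
    w = float(diff @ (target - pred_audio)) / denom
    return min(1.0, max(0.0, w))


def fuse_decisions(pred_video, pred_audio, w):
    """Decision-level fusion: convex combination of per-modality outputs."""
    return w * pred_video + (1.0 - w) * pred_audio
```

Here each modality has its own regressor, and only their output trajectories (for example, per-frame arousal predictions) are mixed. This differs from feature-level fusion, where the inputs are concatenated before a single model is trained.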
