23 research outputs found

    Video Based Emotion Recognition Using CNN and BRNN

    Get PDF
    Video-based emotion recognition is more challenging than still-image vision tasks: it needs to model the spatial information of each image frame as well as the temporal contextual correlations among sequential frames. For this purpose, we propose a hierarchical deep network architecture to extract high-level spatiotemporal features. Two classic neural networks, the convolutional neural network (CNN) and the bi-directional recurrent neural network (BRNN), are employed to capture facial textural characteristics in the spatial domain and dynamic emotion changes in the temporal domain. We endeavor to coordinate the two networks by optimizing each of them, boosting emotion recognition performance and achieving greater accuracy than the baselines.
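
    A minimal sketch of the hierarchical idea described above: a per-frame CNN for spatial features feeding a bi-directional RNN over the frame sequence. Layer sizes and the number of emotion classes are illustrative assumptions, not the architecture reported in the paper.

```python
import torch
import torch.nn as nn

class CnnBrnnClassifier(nn.Module):
    def __init__(self, num_classes=7, feat_dim=128, hidden=64):
        super().__init__()
        # Per-frame spatial feature extractor (CNN).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(), nn.Linear(64, feat_dim),
        )
        # Bi-directional RNN over the sequence of frame features.
        self.brnn = nn.GRU(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, frames):                       # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        seq, _ = self.brnn(feats)
        return self.head(seq.mean(dim=1))            # average over time, then classify

logits = CnnBrnnClassifier()(torch.randn(2, 16, 3, 64, 64))  # -> (2, 7)
```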

    Convolutional Neural Network Array for Sign Language Recognition using Wearable IMUs

    Full text link
    Advancements in gesture recognition algorithms have led to significant growth in sign language translation. By making use of efficient intelligent models, signs can be recognized with precision. The proposed work presents a novel one-dimensional Convolutional Neural Network (CNN) array architecture for recognizing signs from the Indian sign language using signals recorded from a custom-designed wearable IMU device. The IMU device makes use of a tri-axial accelerometer and gyroscope. The signals recorded with the IMU device are segregated on the basis of their context, i.e., whether they correspond to signing a general sentence or an interrogative sentence. The array comprises two individual CNNs, one classifying the general sentences and the other classifying the interrogative sentences. The performance of each individual CNN in the array architecture is compared to that of a conventional CNN classifying the unsegregated dataset. Peak classification accuracies of 94.20% for general sentences and 95.00% for interrogative sentences, achieved with the proposed CNN array, compared with 93.50% for the conventional CNN, assert the suitability of the proposed approach.
    Comment: https://doi.org/10.1109/SPIN.2019.871174
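
    A hedged sketch of the "CNN array" idea: one 1-D CNN per sentence context (general vs. interrogative), each applied to its own segregated IMU stream. Channel counts, window length, and class counts are illustrative assumptions.

```python
import torch
import torch.nn as nn

def make_1d_cnn(num_classes, in_channels=6):         # 3-axis accelerometer + 3-axis gyroscope
    return nn.Sequential(
        nn.Conv1d(in_channels, 32, kernel_size=5, padding=2), nn.ReLU(),
        nn.MaxPool1d(2),
        nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
        nn.AdaptiveAvgPool1d(1), nn.Flatten(),
        nn.Linear(64, num_classes),
    )

# One CNN per context; the class counts here are placeholders.
cnn_array = {
    "general": make_1d_cnn(num_classes=10),
    "interrogative": make_1d_cnn(num_classes=8),
}

window = torch.randn(1, 6, 200)                       # one IMU window: (batch, channels, samples)
logits = cnn_array["general"](window)                 # route by the sentence-context label
```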

    Beyond Short Snippets: Deep Networks for Video Classification

    Full text link
    Convolutional neural networks (CNNs) have been extensively applied to image recognition problems, giving state-of-the-art results on recognition, detection, segmentation and retrieval. In this work we propose and evaluate several deep neural network architectures to combine image information across a video over longer time periods than previously attempted. We propose two methods capable of handling full-length videos. The first method explores various convolutional temporal feature pooling architectures, examining the design choices that need to be made when adapting a CNN for this task. The second method explicitly models the video as an ordered sequence of frames. For this purpose we employ a recurrent neural network with Long Short-Term Memory (LSTM) cells connected to the output of the underlying CNN. Our best networks exhibit significant performance improvements over previously published results on the Sports-1M dataset (73.1% vs. 60.9%) and on the UCF-101 dataset both with (88.6% vs. 88.0%) and without (82.6% vs. 72.8%) additional optical flow information.
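
    The sketch below illustrates the two aggregation strategies described above: temporal feature pooling versus an LSTM over per-frame CNN features. The tiny CNN and the feature/hidden sizes are placeholders, not the networks used in the paper.

```python
import torch
import torch.nn as nn

class VideoClassifier(nn.Module):
    def __init__(self, num_classes=101, feat_dim=256, mode="lstm"):
        super().__init__()
        self.mode = mode
        # Toy per-frame CNN; a real system would use a deep image backbone.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, feat_dim, batch_first=True)
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, frames):                        # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        if self.mode == "pool":                       # max-pool frame features over time
            clip = feats.max(dim=1).values
        else:                                         # LSTM: use the final hidden state
            _, (h, _) = self.lstm(feats)
            clip = h[-1]
        return self.head(clip)

out = VideoClassifier(mode="pool")(torch.randn(2, 8, 3, 32, 32))  # -> (2, 101)
```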

    Adaptation and contextualization of deep neural network models

    Get PDF
    The ability of Deep Neural Networks (DNNs) to provide very high accuracy in classification and recognition problems makes them the major tool for developments in such problems. It is, however, known that DNNs are currently used in a ‘black box’ manner, lacking transparency and interpretability of their decision-making process. Moreover, DNNs should use prior information on data classes, or object categories, so as to provide efficient classification of new data, or objects, without forgetting their previous knowledge. In this paper, we propose a novel class of systems that are able to adapt and contextualize the structure of trained DNNs, providing ways to handle the above-mentioned problems. A hierarchical and distributed system memory is generated and used for this purpose. The main memory is composed of the trained DNN architecture for classification/prediction, i.e., its structure and weights, as well as of an extracted, equivalent Clustered Representation Set (CRS) generated by the DNN during training at its final hidden layer (before the output). The latter includes centroids (‘points of attraction’) which link the extracted representation to a specific area in the existing system memory. Drift detection, occurring, for example, in personalized data analysis, can be accomplished by comparing the distances of new data from the centroids, taking into account the intra-cluster distances. Moreover, using the generated CRS, the system is able to contextualize its decision-making process when new data become available. A new public medical database on Parkinson’s disease is used as a testbed to illustrate the capabilities of the proposed architecture.
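
    A hedged sketch of the centroid-based drift check described above: cluster the final-hidden-layer representations into a CRS, then flag new samples whose distance to the nearest centroid exceeds that cluster's intra-cluster spread. The use of k-means and the scaling factor are assumptions for illustration, not the paper's exact procedure.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_crs(hidden_reps, n_clusters=5):
    """Cluster hidden-layer representations into centroids plus intra-cluster spreads."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(hidden_reps)
    centroids = km.cluster_centers_
    # Intra-cluster spread: mean distance of members to their own centroid.
    spreads = np.array([
        np.linalg.norm(hidden_reps[km.labels_ == k] - centroids[k], axis=1).mean()
        for k in range(n_clusters)
    ])
    return centroids, spreads

def is_drift(new_rep, centroids, spreads, factor=2.0):
    """Flag drift when a new representation is far from its nearest centroid."""
    dists = np.linalg.norm(centroids - new_rep, axis=1)
    nearest = dists.argmin()
    return dists[nearest] > factor * spreads[nearest]

reps = np.random.randn(500, 64)                       # stand-in for DNN final-layer features
centroids, spreads = build_crs(reps)
print(is_drift(np.random.randn(64) * 5, centroids, spreads))
```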

    Multimodal Content Analysis for Effective Advertisements on YouTube

    Full text link
    The rapid advances in e-commerce and Web 2.0 technologies have greatly increased the impact of commercial advertisements on the general public. As a key enabling technology, a multitude of recommender systems exist which analyze user features and browsing patterns to recommend appealing advertisements to users. In this work, we seek to identify the attributes that characterize an effective advertisement and to recommend a useful set of features to aid the design and production of commercial advertisements. We analyze the temporal patterns in the multimedia content of advertisement videos, including auditory, visual and textual components, and study their individual roles and synergies in the success of an advertisement. The objective of this work is to measure the effectiveness of an advertisement and to recommend a useful set of features to advertisement designers to make it more successful and approachable to users. Our proposed framework employs the signal processing technique of cross-modality feature learning, where data streams from different components are used to train separate neural network models and are then fused to learn a shared representation. Subsequently, a neural network model trained on this joint feature embedding is used as a classifier to predict advertisement effectiveness. We validate our approach using subjective ratings from a dedicated user study, the sentiment strength of online viewer comments, and a viewer opinion metric based on the ratio of Likes to Views received by each advertisement on an online platform.
    Comment: 11 pages, 5 figures, ICDM 201
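
    A minimal sketch of the fusion idea above: one small network per modality (audio, visual, text), whose outputs are concatenated into a shared embedding that a classifier maps to an effectiveness label. All feature and embedding sizes are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class MultimodalEffectiveness(nn.Module):
    def __init__(self, dims=None, emb=64):
        super().__init__()
        # Assumed per-modality input feature sizes, for illustration only.
        dims = dims or {"audio": 40, "visual": 512, "text": 300}
        self.branches = nn.ModuleDict({
            m: nn.Sequential(nn.Linear(d, emb), nn.ReLU()) for m, d in dims.items()
        })
        self.classifier = nn.Sequential(
            nn.Linear(emb * len(dims), emb), nn.ReLU(), nn.Linear(emb, 2),
        )

    def forward(self, inputs):                        # dict of per-modality feature batches
        joint = torch.cat([self.branches[m](x) for m, x in inputs.items()], dim=-1)
        return self.classifier(joint)                 # shared representation -> effectiveness logits

model = MultimodalEffectiveness()
logits = model({"audio": torch.randn(4, 40),
                "visual": torch.randn(4, 512),
                "text": torch.randn(4, 300)})         # -> (4, 2): effective vs. not
```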

    Unsupervised Adversarial Domain Adaptation for Cross-Lingual Speech Emotion Recognition

    Full text link
    Cross-lingual speech emotion recognition (SER) is a crucial task for many real-world applications. The performance of SER systems is often degraded by differences between the distributions of training and test data. These differences become more apparent when the training and test data belong to different languages, which causes a significant gap between validation and test scores. It is imperative to build more robust models that fit the practical applications of SER systems. Therefore, in this paper, we propose a Generative Adversarial Network (GAN)-based model for multilingual SER. Our choice of GANs is motivated by their great success in learning the underlying data distribution. The proposed model is designed so that it can learn language-invariant representations without requiring target-language data labels. We evaluate the proposed model on four emotional datasets in different languages, including an Urdu-language dataset, so as to also cover languages for which labelled data is difficult to find and which have not been studied much by the mainstream community. Our results show that the proposed model significantly improves baseline cross-lingual SER performance on all the considered datasets, including the non-mainstream Urdu-language data, without requiring any labels.
    Comment: Accepted in Affective Computing & Intelligent Interaction (ACII 2019)
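
    A hedged sketch of adversarial language adaptation in the spirit described above: a shared encoder feeds an emotion classifier and a language discriminator, and a gradient-reversal layer pushes the encoder toward language-invariant features. This is a generic adversarial domain-adaptation setup, not the paper's exact GAN model; all dimensions are placeholders.

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.clone()
    @staticmethod
    def backward(ctx, grad):
        return -grad                                  # flip the gradient sign

class AdversarialSER(nn.Module):
    def __init__(self, feat_dim=40, hidden=64, n_emotions=4, n_langs=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        self.emotion_head = nn.Linear(hidden, n_emotions)
        self.lang_head = nn.Linear(hidden, n_langs)

    def forward(self, x):
        z = self.encoder(x)
        # Emotion prediction plus adversarial language prediction on reversed gradients.
        return self.emotion_head(z), self.lang_head(GradReverse.apply(z))

model = AdversarialSER()
emo_logits, lang_logits = model(torch.randn(8, 40))
# Training would minimise the emotion loss on labelled source-language data and the
# language-classification loss on all data; the reversed gradient makes the encoder's
# features harder to separate by language.
```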