3D Human Activity Recognition with Reconfigurable Convolutional Neural Networks

Author: Lin Liang
Wang Keze
Wang Meng
Wang Xiaolong
Zuo Wangmeng
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/02/2015

Human activity understanding with 3D/depth sensors has received increasing attention in multimedia processing and interactions. This work aims to develop a novel deep model for automatic activity recognition from RGB-D videos. We represent each human activity as an ensemble of cubic-like video segments, and learn to discover the temporal structures for a category of activities, i.e. how the activities are to be decomposed for classification. Our model can be regarded as a structured deep architecture, as it extends convolutional neural networks (CNNs) by incorporating structure alternatives. Specifically, we build a network consisting of 3D convolutions and max-pooling operators over the video segments, and introduce latent variables in each convolutional layer that manipulate the activation of neurons. Our model thus advances existing approaches in two aspects: (i) it acts directly on the raw inputs (grayscale-depth data) to conduct recognition instead of relying on hand-crafted features, and (ii) the model structure can be dynamically adjusted to account for the temporal variations of human activities, i.e. the network configuration is allowed to be partially activated during inference. For model training, we propose an EM-type optimization method that iteratively (i) discovers the latent structure by determining the decomposed actions for each training example, and (ii) learns the network parameters by using the back-propagation algorithm. Our approach is validated in challenging scenarios, and outperforms state-of-the-art methods. A large human activity database of RGB-D videos is also presented. Comment: This manuscript has 10 pages with 9 figures, and a preliminary version was published in ACM MM'14 conference.

arXiv.org e-Print Archive

Crossref

ModDrop: adaptive multi-modal gesture recognition

Author: Nebout Florian
Neverova Natalia
Taylor Graham W.
Wolf Christian
Publication date: 06/06/2015

We present a method for gesture detection and localisation based on multi-scale and multi-modal deep learning. Each visual modality captures spatial information at a particular spatial scale (such as motion of the upper body or a hand), and the whole system operates at three temporal scales. Key to our technique is a training strategy which exploits: i) careful initialization of individual modalities; and ii) gradual fusion involving random dropping of separate channels (dubbed ModDrop) for learning cross-modality correlations while preserving uniqueness of each modality-specific representation. We present experiments on the ChaLearn 2014 Looking at People Challenge gesture recognition track, in which we placed first out of 17 teams. Fusing multiple modalities at several spatial and temporal scales leads to a significant increase in recognition rates, allowing the model to compensate for errors of the individual classifiers as well as noise in the separate channels. Furthermore, the proposed ModDrop training technique makes the classifier robust to missing signals in one or several channels, so that it produces meaningful predictions from any number of available modalities. In addition, we demonstrate the applicability of the proposed fusion scheme to modalities of arbitrary nature by experiments on the same dataset augmented with audio. Comment: 14 pages, 7 figures

arXiv.org e-Print Archive

HAL

Hal-Diderot
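
The "random dropping of separate channels" described in the ModDrop abstract above can be illustrated with a small sketch. This is not the authors' implementation: the function names `moddrop` and `fuse`, the per-modality drop probability, and the choice to drop each modality independently are all assumptions made for illustration, and a real system would apply this to network activations during training only.

```python
import random

def moddrop(features, drop_prob=0.5, rng=None):
    """Zero out whole modalities at random (hypothetical training-time helper).

    features: dict mapping a modality name to its feature vector (list of floats).
    Each modality is dropped independently with probability drop_prob.
    """
    if rng is None:
        rng = random.Random()
    result = {}
    for name, vector in features.items():
        if rng.random() < drop_prob:
            result[name] = [0.0] * len(vector)  # the whole channel is silenced
        else:
            result[name] = list(vector)         # the channel passes through unchanged
    return result

def fuse(features):
    """Concatenate modality vectors in a fixed (sorted-by-name) order."""
    fused = []
    for name in sorted(features):
        fused.extend(features[name])
    return fused

# Illustrative values only; the modality names are assumptions.
sample = {"depth": [0.2, 0.7], "rgb": [0.9, 0.1], "audio": [0.4]}
fused = fuse(moddrop(sample, drop_prob=0.5, rng=random.Random(0)))
```

Because the fused vector keeps a fixed length whichever modalities are dropped, a classifier trained on such inputs sees examples with missing channels, which is the intuition behind the robustness claim in the abstract.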

Improving a 3-D Convolutional Neural Network Model Reinvented from VGG16 with Batch Normalization

Author: Malekmohamadi Hossein
Pattanajak Nontawat
Publication date: 26/04/2019

It is challenging to build and train a Convolutional Neural Network model that achieves a high accuracy rate on the first attempt. There are many variables to consider, such as initial parameters, learning rate, and batch size. Failing to train a model successfully is an almost inevitable problem. In some cases, the model struggles to find a lower Loss Function value, which results in poor performance. Batch Normalization is considered a remedy for this problem. In this paper, two models reinvented from VGG16 are created, with and without Batch Normalization, to evaluate their performance. The model using Batch Normalization clearly provides a better result in terms of Loss Function value and model accuracy, and achieves a very high accuracy rate. It also reaches the saturation point of the highest model accuracy faster than the model without Batch Normalization. This paper also finds that the accuracy of the 3D Convolutional Neural Network model reinvented from VGG16 with Batch Normalization is 91.2%, which beats many benchmark results on UCF101 such as IDT [5], Two-Stream [10], and Dynamic Image Networks IDT [4]. The technique introduced in this paper provides a fast, reliable and accurate estimation of human activity type and could be used in smart environments.

Crossref

De Montfort University Open Research Archive
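
For readers unfamiliar with the Batch Normalization step that the VGG16 abstract above relies on, a minimal sketch of the standard training-time transform follows. It normalizes a batch of scalar activations to zero mean and unit variance, then applies a scale and shift. The function name `batch_norm` and the example values are illustrative assumptions; this is not the paper's model and it omits the running statistics used at inference time.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a batch, then scale by gamma and shift by beta."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n  # biased (population) variance
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

# Illustrative activations for a single feature over a batch of four examples.
activations = [2.0, 4.0, 6.0, 8.0]
normalized = batch_norm(activations)
```

Keeping activations in a consistent range across layers is the usual explanation for why training tends to converge faster with this step, which matches the abstract's observation about reaching peak accuracy sooner.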

Deep Neural Network Architectures for Modulation Classification

Author: Gamal Aly El
Liu Xiaoyu
Yang Diyu
Publication date: 05/01/2018

In this work, we investigate the value of employing deep learning for the task of wireless signal modulation recognition. Recently in [1], a framework was introduced that generates a dataset using GNU Radio, mimicking the imperfections of a real wireless channel and using 10 different modulation types. Further, a convolutional neural network (CNN) architecture was developed and shown to deliver performance that exceeds that of expert-based approaches. Here, we follow the framework of [1] and find deep neural network architectures that deliver higher accuracy than the state of the art. We tested the architecture of [1] and found it to achieve an accuracy of approximately 75% in recognizing the modulation type. We first tune the CNN architecture of [1] and find a design with four convolutional layers and two dense layers that gives an accuracy of approximately 83.8% at high SNR. We then develop architectures based on the recently introduced ideas of Residual Networks (ResNet [2]) and Densely Connected Networks (DenseNet [3]) to achieve high SNR accuracies of approximately 83.5% and 86.6%, respectively. Finally, we introduce a Convolutional Long Short-term Deep Neural Network (CLDNN [4]) to achieve an accuracy of approximately 88.5% at high SNR. Comment: 5 pages, 10 figures, In proc. Asilomar Conference on Signals, Systems, and Computers, Nov. 201

arXiv.org e-Print Archive

Crossref

Large-scale Isolated Gesture Recognition Using Convolutional Neural Networks

Author: Gao Zhimin
Li Wanqing
Liu Song
Ogunbona Philip
Tang Chang
Wang Pichao
Publication date: 01/01/2016

This paper proposes three simple, compact yet effective representations of depth sequences, referred to respectively as Dynamic Depth Images (DDI), Dynamic Depth Normal Images (DDNI) and Dynamic Depth Motion Normal Images (DDMNI). These dynamic images are constructed from a sequence of depth maps using bidirectional rank pooling to effectively capture the spatial-temporal information. Such image-based representations enable us to fine-tune the existing ConvNets models trained on image data for classification of depth sequences, without introducing large parameters to learn. Upon the proposed representations, a convolutional neural networks (ConvNets) based method is developed for gesture recognition and evaluated on the Large-scale Isolated Gesture Recognition at the ChaLearn Looking at People (LAP) challenge 2016. The method achieved 55.57% classification accuracy and ranked 2nd place in this challenge but was very close to the best performance even though we only used depth data. Comment: arXiv admin note: text overlap with arXiv:1608.0633

arXiv.org e-Print Archive

Crossref

Research Online