12 research outputs found

    Large-scale Isolated Gesture Recognition Using Convolutional Neural Networks

    This paper proposes three simple, compact yet effective representations of depth sequences, referred to respectively as Dynamic Depth Images (DDI), Dynamic Depth Normal Images (DDNI) and Dynamic Depth Motion Normal Images (DDMNI). These dynamic images are constructed from a sequence of depth maps using bidirectional rank pooling to effectively capture the spatial-temporal information. Such image-based representations enable us to fine-tune existing ConvNet models trained on image data for classification of depth sequences, without introducing a large number of parameters to learn. Upon the proposed representations, a convolutional neural network (ConvNet) based method is developed for gesture recognition and evaluated on the Large-scale Isolated Gesture Recognition task of the ChaLearn Looking at People (LAP) challenge 2016. The method achieved 55.57% classification accuracy and ranked 2nd in this challenge, very close to the best performance even though we only used depth data. Comment: arXiv admin note: text overlap with arXiv:1608.0633
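    As a rough illustration of how a depth sequence can be collapsed into a single image, the sketch below uses a closed-form linear weighting of frames (weights 2t - T - 1) as a stand-in for rank pooling. This is an assumption made for illustration: the abstract's bidirectional rank pooling solves a ranking objective, and the function names here are hypothetical.

```python
import numpy as np

def approximate_dynamic_image(frames):
    """Collapse a (T, H, W) depth sequence into one (H, W) image.

    Assumption: uses closed-form weights alpha_t = 2t - T - 1 (t = 1..T)
    instead of solving the ranking objective used by rank pooling.
    """
    frames = np.asarray(frames, dtype=np.float64)
    num_frames = frames.shape[0]
    t = np.arange(1, num_frames + 1)
    weights = 2 * t - num_frames - 1  # later frames receive larger weights
    # Weighted sum over the time axis.
    return np.tensordot(weights, frames, axes=(0, 0))

def bidirectional_dynamic_images(frames):
    """Return one image for forward order and one for reversed order."""
    frames = np.asarray(frames, dtype=np.float64)
    forward = approximate_dynamic_image(frames)
    backward = approximate_dynamic_image(frames[::-1])
    return forward, backward
```

    With these fixed linear weights the backward image is simply the negative of the forward one; a learned ranking function can produce two genuinely different images, which is presumably why both directions are used.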

    Large-scale Continuous Gesture Recognition Using Convolutional Neural Networks

    This paper addresses the problem of continuous gesture recognition from sequences of depth maps using convolutional neural networks (ConvNets). The proposed method first segments individual gestures from a depth sequence based on quantity of movement (QOM). For each segmented gesture, an Improved Depth Motion Map (IDMM), which converts the depth sequence into one image, is constructed and fed to a ConvNet for recognition. The IDMM effectively encodes both spatial and temporal information and allows fine-tuning of existing ConvNet models for classification without introducing millions of parameters to learn. The proposed method is evaluated on the Large-scale Continuous Gesture Recognition task of the ChaLearn Looking at People (LAP) challenge 2016. It achieved a Mean Jaccard Index of 0.2655 and ranked 3rd in this challenge.
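    A minimal sketch of QOM-style segmentation follows, assuming QOM is measured as the number of pixels whose depth differs from a reference frame by more than a threshold, and that a gesture is any run of frames whose QOM exceeds an activity threshold. Both thresholds and the function names are hypothetical; the abstract does not give the exact rule.

```python
import numpy as np

def quantity_of_movement(frames, reference, diff_threshold=10.0):
    """Count, per frame, the pixels that differ from the reference frame."""
    frames = np.asarray(frames, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    moved = np.abs(frames - reference) > diff_threshold
    return moved.reshape(len(frames), -1).sum(axis=1)

def segment_by_qom(qom, activity_threshold):
    """Return inclusive (start, end) frame spans where QOM is above threshold."""
    segments = []
    start = None
    for t, value in enumerate(qom):
        if value > activity_threshold and start is None:
            start = t  # a gesture begins
        elif value <= activity_threshold and start is not None:
            segments.append((start, t - 1))  # the gesture ended on the previous frame
            start = None
    if start is not None:
        segments.append((start, len(qom) - 1))
    return segments
```

    Each returned span could then be converted into a single motion image and passed to a classifier, as the abstract describes.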

    Facial image noise classification and denoising using neural network

    Image denoising is an important aspect of image processing. Noisy images are produced by technical and environmental flaws. Image denoising is therefore a worthwhile research topic, as it also aids in the resolution of other image processing issues. The challenge, however, is that the traditional techniques are time-consuming and inflexible. This article proposes a system for classifying and denoising noisy images. A CNN and UNET based model architecture is designed, implemented, and evaluated. The facial image dataset is processed and then used to train, validate, and test the models. During preprocessing, the images are resized to 48×48 and normalized, and various types of noise are added to them. The preprocessing differs slightly for each model. The training and validation accuracy for the CNN model is 99.87% and 99.92% respectively. The UNET model also achieves optimal PSNR and SSIM values for the different noise types.
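    For readers unfamiliar with the PSNR metric mentioned above, here is a minimal sketch of adding Gaussian noise to a normalized image and scoring a result with PSNR. The noise level, the [0, 1] value range, and the function names are assumptions; the abstract does not specify which noise types or parameters were used.

```python
import numpy as np

def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise and clip back to the [0, 1] range."""
    noisy = image + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)

def psnr(clean, restored, max_value=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    clean = np.asarray(clean, dtype=np.float64)
    restored = np.asarray(restored, dtype=np.float64)
    mse = np.mean((clean - restored) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)
```

    A higher PSNR means the restored image is closer to the clean one; a uniform error of 0.1 on a [0, 1] image corresponds to 20 dB.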

    Non-verbal Facial Action Units-based Automatic Depression Classification

    Depression is a common mental disorder that causes people to experience depressed mood, loss of interest or pleasure, and feelings of guilt or low self-worth. Traditional clinical depression diagnosis methods are subjective and time-consuming. Since depression can be reflected by human facial expressions, we propose a non-verbal facial behavior-based automatic depression classification approach. In this paper, both short-term behavior-based and clip-based depression classification are constructed. The final clip-level decision of short-term behavior-based depression detection is obtained by averaging the predictions of all short-term behaviors, while the clip-based approach models the behaviors contained in all frames using two Gaussian Mixture Models (GMMs). To evaluate the proposed approaches, we select a gender-balanced subset of 30 participants from the AVEC 2019 depression corpus. The experimental results show that our method achieved more than 75% depression classification accuracy, where both GMM-based clip-level depression modelling and rank pooling-based short-term depression behavior modelling achieved at least 70% classification accuracy. The result indicates that our approach can leverage complementary information from both systems to achieve promising depression predictions from facial behaviors.

    A robust method for VR-based hand gesture recognition using density-based CNN

    Many VR-based medical applications have been developed to help patients with reduced mobility caused by accidents, diseases, or other injuries to carry out physical treatment efficiently. VR-based applications were considered a more effective aid for individual physical treatment because of their low-cost equipment, flexibility in time and space, and reduced need for assistance from a physical therapist. A challenge in developing VR-based physical treatment was understanding body part movement accurately and quickly. We proposed a robust pipeline for understanding hand motion accurately. We retrieved our data from movement sensors such as the HTC Vive and Leap Motion. Given a sequence of palm positions, we represented our data as binary 2D images of the gesture shape. Our dataset consisted of 14 kinds of hand gestures recommended by a physiotherapist. Given 33 3D points mapped into binary images as input, we trained our proposed density-based CNN. Our CNN model was designed around the characteristics of the input: many 'blank block pixels', shapes of 'single-pixel thickness', and binary pixel values. Applying a pyramid kernel size in the feature extraction part and a classification layer with softmax as the loss function gave 97.7% accuracy.
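    One way to picture the mapping from a palm trajectory to a binary gesture image is sketched below. It assumes the depth coordinate is simply dropped and the remaining x/y positions are rescaled onto a square pixel grid; the abstract does not describe the actual projection, and the function name and default image size are hypothetical.

```python
import numpy as np

def points_to_binary_image(points, size=32):
    """Rasterize a sequence of 3D points into a size x size binary image.

    Assumption: only x and y are used; z is ignored.
    """
    xy = np.asarray(points, dtype=np.float64)[:, :2]
    mins = xy.min(axis=0)
    span = xy.max(axis=0) - mins
    span[span == 0] = 1.0  # avoid division by zero for flat trajectories
    scaled = (xy - mins) / span * (size - 1)
    cols = np.rint(scaled[:, 0]).astype(int)
    rows = np.rint(scaled[:, 1]).astype(int)
    image = np.zeros((size, size), dtype=np.uint8)
    image[rows, cols] = 1  # mark each visited pixel
    return image
```

    This sketch only marks the sampled positions and does not connect consecutive points, so the resulting images are sparse, consistent with the 'blank block pixels' and 'single-pixel thickness' properties mentioned in the abstract.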

    Regional Attention with Architecture-Rebuilt 3D Network for RGB-D Gesture Recognition

    Human gesture recognition has drawn much attention in the area of computer vision. However, the performance of gesture recognition is always influenced by gesture-irrelevant factors such as the background and the clothes of performers. Therefore, focusing on the hand/arm regions is important for gesture recognition. Meanwhile, a more adaptive architecture-searched network structure can also perform better than block-fixed ones like ResNet, since it better increases the diversity of features in different stages of the network. In this paper, we propose a regional attention with architecture-rebuilt 3D network (RAAR3DNet) for gesture recognition. We replace the fixed Inception modules with a structure rebuilt automatically throughout the network via Neural Architecture Search (NAS), owing to the different shape and representation ability of features in the early, middle, and late stages of the network. This enables the network to capture different levels of feature representations at different layers more adaptively. Meanwhile, we also design a stackable regional attention module called Dynamic-Static Attention (DSA), which derives a Gaussian guidance heatmap and a dynamic motion map to highlight the hand/arm regions and the motion information in the spatial and temporal domains, respectively. Extensive experiments on two recent large-scale RGB-D gesture datasets validate the effectiveness of the proposed method and show that it outperforms state-of-the-art methods. The code of our method is available at: https://github.com/zhoubenjia/RAAR3DNet. Comment: Accepted by AAAI 202
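    To illustrate what a Gaussian guidance heatmap looks like, the sketch below builds a 2D Gaussian centered on a single keypoint. It assumes the hand/arm location is already known as a pixel coordinate; how the DSA module obtains and uses such locations is not stated in the abstract, and the function name and sigma value are hypothetical.

```python
import numpy as np

def gaussian_heatmap(height, width, center_row, center_col, sigma=2.0):
    """Return a (height, width) map that peaks at 1.0 on the given center."""
    rows = np.arange(height)[:, None]
    cols = np.arange(width)[None, :]
    squared_distance = (rows - center_row) ** 2 + (cols - center_col) ** 2
    return np.exp(-squared_distance / (2.0 * sigma ** 2))
```

    Multiplying a feature map element-wise by such a heatmap would emphasize responses near the keypoint and suppress the background.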

    Depth Pooling Based Large-scale 3D Action Recognition with Convolutional Neural Networks

    This paper proposes three simple, compact yet effective representations of depth sequences, referred to respectively as Dynamic Depth Images (DDI), Dynamic Depth Normal Images (DDNI) and Dynamic Depth Motion Normal Images (DDMNI), for both isolated and continuous action recognition. These dynamic images are constructed from a segmented sequence of depth maps using hierarchical bidirectional rank pooling to effectively capture the spatial-temporal information. Specifically, DDI exploits the dynamics of postures over time, while DDNI and DDMNI exploit the 3D structural information captured by depth maps. Upon the proposed representations, a ConvNet based method is developed for action recognition. The image-based representations enable us to fine-tune existing Convolutional Neural Network (ConvNet) models trained on image data without training a large number of parameters from scratch. The proposed method achieved state-of-the-art results on three large datasets, namely the Large-scale Continuous Gesture Recognition Dataset (mean Jaccard index 0.4109), the Large-scale Isolated Gesture Recognition Dataset (59.21%), and the NTU RGB+D Dataset (87.08% cross-subject and 84.22% cross-view), even though only the depth modality was used. Comment: arXiv admin note: text overlap with arXiv:1701.01814, arXiv:1608.0633
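    The Jaccard index reported for continuous recognition measures frame-level overlap between predicted and ground-truth gesture spans. The sketch below shows the core computation for one gesture, assuming spans are inclusive (start, end) frame indices; the challenge's exact averaging over classes and sequences is not given in the abstract, so the simple mean here is an assumption.

```python
def frame_jaccard(true_span, predicted_span):
    """Intersection over union of two inclusive (start, end) frame spans."""
    true_frames = set(range(true_span[0], true_span[1] + 1))
    predicted_frames = set(range(predicted_span[0], predicted_span[1] + 1))
    union = true_frames | predicted_frames
    if not union:
        return 0.0
    return len(true_frames & predicted_frames) / len(union)

def mean_jaccard(scores):
    """Average a list of per-gesture Jaccard scores (0.0 for an empty list)."""
    return sum(scores) / len(scores) if scores else 0.0
```

    For example, a ground-truth span of frames 0 to 9 and a prediction of frames 5 to 14 share 5 of 15 distinct frames, giving a score of one third.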