Large-scale Isolated Gesture Recognition Using Convolutional Neural Networks
This paper proposes three simple, compact yet effective representations of
depth sequences, referred to respectively as Dynamic Depth Images (DDI),
Dynamic Depth Normal Images (DDNI) and Dynamic Depth Motion Normal Images
(DDMNI). These dynamic images are constructed from a sequence of depth maps
using bidirectional rank pooling to effectively capture the spatial-temporal
information. Such image-based representations enable us to fine-tune
existing ConvNet models trained on image data for classification of depth
sequences, without introducing a large number of parameters to learn. Building on the proposed
representations, a convolutional neural network (ConvNet) based method is
developed for gesture recognition and evaluated on the Large-scale Isolated
Gesture Recognition at the ChaLearn Looking at People (LAP) challenge 2016. The
method achieved 55.57% classification accuracy and ranked place in
this challenge but was very close to the best performance even though we only
used depth data.
Comment: arXiv admin note: text overlap with arXiv:1608.0633
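The dynamic images described in the abstract above collapse a depth sequence into a single image. As a rough illustration only, the sketch below uses a simple linear frame weighting, alpha_t = 2t - T - 1, which is a common approximation of rank pooling and not necessarily the procedure used in the paper. The function names and the nested-list frame format are assumptions made for this example.

```python
def dynamic_image(frames):
    """Collapse a list of 2D frames into one image by weighted summation.

    Assumption: frame t (1-based) gets the linear weight 2t - T - 1, a
    simple stand-in for rank pooling; later frames get larger weights.
    """
    total_frames = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    image = [[0.0] * width for _ in range(height)]
    for t, frame in enumerate(frames, start=1):
        alpha = 2 * t - total_frames - 1
        for y in range(height):
            for x in range(width):
                image[y][x] += alpha * frame[y][x]
    return image


def bidirectional_dynamic_images(frames):
    """Return the forward and the time-reversed dynamic image."""
    return dynamic_image(frames), dynamic_image(frames[::-1])


# Three 1x1 "depth maps" whose value grows over time.
forward, backward = bidirectional_dynamic_images([[[0]], [[1]], [[2]]])
print(forward, backward)
```

With this weighting the backward image is the negation of the forward one, so a real pipeline would rely on the actual rank pooling step to make the two directions carry different information.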
Large-scale Continuous Gesture Recognition Using Convolutional Neural Networks
This paper addresses the problem of continuous gesture recognition from
sequences of depth maps using convolutional neural networks (ConvNets). The
proposed method first segments individual gestures from a depth sequence based
on quantity of movement (QOM). For each segmented gesture, an Improved Depth
Motion Map (IDMM), which converts the depth sequence into one image, is
constructed and fed to a ConvNet for recognition. The IDMM effectively encodes
both spatial and temporal information and allows fine-tuning of existing
ConvNet models for classification without introducing millions of parameters to
learn. The proposed method is evaluated on the Large-scale Continuous Gesture
Recognition task of the ChaLearn Looking at People (LAP) challenge 2016. It achieved
a Mean Jaccard Index of 0.2655 and ranked place in
this challenge.
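The abstract above says gestures are segmented by quantity of movement (QOM) but gives no formula. The sketch below is one plausible reading under stated assumptions: QOM is taken as the number of pixels whose depth differs from the first (resting) frame by more than a threshold, and a gesture is a maximal run of frames whose QOM reaches a minimum value. Both thresholds and all function names are hypothetical.

```python
def quantity_of_movement(frame, reference, depth_threshold):
    """Count pixels that differ from the reference frame by more than the threshold."""
    return sum(
        1
        for frame_row, reference_row in zip(frame, reference)
        for value, ref_value in zip(frame_row, reference_row)
        if abs(value - ref_value) > depth_threshold
    )


def segment_gestures(frames, depth_threshold=10, qom_threshold=1):
    """Return (start, end) frame index pairs for runs of frames with high QOM.

    Assumption: the first frame shows the resting pose and serves as reference.
    """
    reference = frames[0]
    segments = []
    start = None
    for index, frame in enumerate(frames):
        moving = quantity_of_movement(frame, reference, depth_threshold) >= qom_threshold
        if moving and start is None:
            start = index
        elif not moving and start is not None:
            segments.append((start, index - 1))
            start = None
    if start is not None:  # sequence ended while a gesture was in progress
        segments.append((start, len(frames) - 1))
    return segments


rest, move = [[0, 0]], [[50, 0]]
print(segment_gestures([rest, move, move, rest, move, rest]))
```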
Facial image noise classification and denoising using neural network
Image denoising is an important aspect of image processing. Noisy images are produced as a result of technical and environmental flaws. It is therefore reasonable to consider image denoising an important research topic, as it also aids in resolving other image processing issues. The challenge, however, is that the traditional techniques are time-consuming and inflexible. This article proposes a system for classifying and denoising noisy images. A CNN- and UNET-based model architecture is designed, implemented, and evaluated. The facial image dataset is preprocessed and then used to train, validate, and test the models. During preprocessing, the images are resized to 48×48 and normalized, and various types of noise are added. The preprocessing differs slightly for each model. The training and validation accuracy of the CNN model are 99.87% and 99.92%, respectively. The UNET model is also able to achieve optimal PSNR and SSIM values for the different noise types.
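The abstract above reports PSNR as a quality measure. For reference, PSNR follows the standard definition 10 * log10(MAX^2 / MSE). The sketch below computes it for small nested-list images; the function names and the 8-bit peak value of 255 are assumptions for the example, not details taken from the article.

```python
import math


def mean_squared_error(clean, restored):
    """Average squared pixel difference between two equally sized images."""
    total = 0.0
    count = 0
    for clean_row, restored_row in zip(clean, restored):
        for a, b in zip(clean_row, restored_row):
            total += (a - b) ** 2
            count += 1
    return total / count


def psnr(clean, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    mse = mean_squared_error(clean, restored)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(max_value ** 2 / mse)


clean = [[0, 0], [0, 0]]
noisy = [[0, 0], [0, 255]]
print(round(psnr(clean, noisy), 2))
```

A higher PSNR means the restored image is closer to the clean one.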
Non-verbal Facial Action Units-based Automatic Depression Classification
Depression is a common mental disorder that causes people to experience
depressed mood, loss of interest or pleasure, feelings of guilt or low
self-worth. Traditional clinical depression diagnosis methods are subjective
and time-consuming. Since depression can be reflected in human facial
expressions, we propose a non-verbal facial behavior-based automatic depression
classification approach. In this paper, both short-term behavior-based and
clip-based depression classification are constructed. The final clip-level
decision of short-term behavior-based depression detection is obtained by
averaging the predictions of all short-term behaviors, while the clip-based
approach models the behaviors contained in all frames using two Gaussian Mixture Models. To
evaluate the proposed approaches, we select a gender-balanced subset from the AVEC
2019 depression corpus containing 30 participants. The experimental results
show that our method achieved more than 75% depression classification accuracy,
where both GMM-based clip-level depression modelling and rank pooling-based
short-term depression behavior modelling achieved at least 70% classification
accuracy. The results indicate that our approach can leverage complementary
information from both systems to achieve promising depression predictions from
facial behaviors.
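The clip-level decision described in the abstract above, which averages the predictions of all short-term behaviors, can be sketched in a few lines. This is a minimal illustration only: the per-segment scores are assumed to be probabilities of the "depressed" class, and the 0.5 decision threshold and the function name are hypothetical.

```python
def clip_level_decision(segment_scores, threshold=0.5):
    """Average per-segment scores and threshold the mean.

    Assumption: each score is a probability of the "depressed" class.
    Returns the mean score and the resulting boolean decision.
    """
    if not segment_scores:
        raise ValueError("at least one short-term prediction is required")
    mean_score = sum(segment_scores) / len(segment_scores)
    return mean_score, mean_score >= threshold


score, is_depressed = clip_level_decision([0.9, 0.6, 0.2])
print(round(score, 3), is_depressed)
```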
A robust method for VR-based hand gesture recognition using density-based CNN
Many VR-based medical applications have been developed to help patients whose mobility has been reduced by accidents, diseases, or other injuries carry out physical treatment efficiently. VR-based applications were considered a more effective aid for individual physical treatment because of their low-cost equipment, their flexibility in time and space, and the reduced need for assistance from a physical therapist. A challenge in developing VR-based physical treatment was understanding body part movement accurately and quickly. We proposed a robust pipeline for understanding hand motion accurately. We retrieved our data from movement sensors such as the HTC Vive and Leap Motion. Given a sequence of palm positions, we represented our data as binary 2D images of the gesture shape. Our dataset consisted of 14 kinds of hand gestures recommended by a physiotherapist. Given 33 3D points that were mapped into binary images as input, we trained our proposed density-based CNN. Our CNN model took into account the characteristics of our input, which has many 'blank block pixels', has a 'single-pixel thickness' shape, and is generated as a binary image. Pyramid kernel sizes applied in the feature extraction part, together with a classification layer using softmax as the loss function, gave 97.7% accuracy.
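The abstract above maps a sequence of 3D palm positions to a binary 2D image but does not describe the projection. The sketch below shows one simple possibility under stated assumptions: the z coordinate is dropped, x and y are rescaled to the image grid, and each visited cell is set to 1. The function name, the grid size, and the projection choice are all hypothetical.

```python
def points_to_binary_image(points, size=8):
    """Rasterize 3D points onto a size x size binary grid.

    Assumption: the gesture shape is read from the x/y plane only, so z is ignored.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    # Avoid division by zero when all points share a coordinate.
    span_x = (max(xs) - min_x) or 1.0
    span_y = (max(ys) - min_y) or 1.0
    image = [[0] * size for _ in range(size)]
    for x, y, _z in points:
        col = round((x - min_x) / span_x * (size - 1))
        row = round((y - min_y) / span_y * (size - 1))
        image[row][col] = 1
    return image


# A diagonal stroke sampled at three palm positions.
for row in points_to_binary_image([(0, 0, 5), (1, 1, 5), (2, 2, 5)], size=3):
    print(row)
```

Sparse samples like these leave most cells empty, which matches the 'blank block pixels' and 'single-pixel thickness' input characteristics the abstract mentions.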
Regional Attention with Architecture-Rebuilt 3D Network for RGB-D Gesture Recognition
Human gesture recognition has drawn much attention in the area of computer
vision. However, the performance of gesture recognition is always influenced by
some gesture-irrelevant factors like the background and the clothes of
performers. Therefore, focusing on the hand/arm regions is important for
gesture recognition. Meanwhile, a more adaptive architecture-searched network
structure can also perform better than block-fixed ones like ResNet, since
it better increases the diversity of features in different stages of the
network. In this paper, we propose a regional attention with
architecture-rebuilt 3D network (RAAR3DNet) for gesture recognition. We replace
the fixed Inception modules with the automatically rebuilt structure through
the network via Neural Architecture Search (NAS), owing to the different shape
and representation ability of features in the early, middle, and late stages of
the network. It enables the network to capture different levels of feature
representations at different layers more adaptively. Meanwhile, we also design
a stackable regional attention module called Dynamic-Static Attention (DSA),
which derives a Gaussian guidance heatmap and dynamic motion map to highlight
the hand/arm regions and the motion information in the spatial and temporal
domains, respectively. Extensive experiments on two recent large-scale RGB-D
gesture datasets validate the effectiveness of the proposed method and show it
outperforms state-of-the-art methods. The code for our method is available at:
https://github.com/zhoubenjia/RAAR3DNet
Comment: Accepted by AAAI 202
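The abstract above does not say how the Gaussian guidance heatmap is derived. As one plausible reading, the sketch below centres an isotropic 2D Gaussian on an assumed hand keypoint, so that pixels near the hand receive weights close to 1. The keypoint, the sigma value, and the function name are hypothetical and are not taken from the RAAR3DNet code.

```python
import math


def gaussian_heatmap(height, width, center, sigma):
    """Build a height x width map with value exp(-d^2 / (2 sigma^2)).

    d is the distance from each pixel to `center`, given as (row, col).
    """
    center_row, center_col = center
    heatmap = []
    for row in range(height):
        line = []
        for col in range(width):
            squared_distance = (row - center_row) ** 2 + (col - center_col) ** 2
            line.append(math.exp(-squared_distance / (2.0 * sigma ** 2)))
        heatmap.append(line)
    return heatmap


# Hypothetical hand keypoint in the middle of a 5x5 feature map.
for line in gaussian_heatmap(5, 5, center=(2, 2), sigma=1.0):
    print([round(value, 2) for value in line])
```

Multiplying a feature map element-wise by such a heatmap is one simple way to emphasise a region, although the attention module in the paper may combine the two differently.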
Depth Pooling Based Large-scale 3D Action Recognition with Convolutional Neural Networks
This paper proposes three simple, compact yet effective representations of
depth sequences, referred to respectively as Dynamic Depth Images (DDI),
Dynamic Depth Normal Images (DDNI) and Dynamic Depth Motion Normal Images
(DDMNI), for both isolated and continuous action recognition. These dynamic
images are constructed from a segmented sequence of depth maps using
hierarchical bidirectional rank pooling to effectively capture the
spatial-temporal information. Specifically, DDI exploits the dynamics of
postures over time and DDNI and DDMNI exploit the 3D structural information
captured by depth maps. Upon the proposed representations, a ConvNet-based
method is developed for action recognition. The image-based representations
enable us to fine-tune the existing Convolutional Neural Network (ConvNet)
models trained on image data without training a large number of parameters from
scratch. The proposed method achieved state-of-the-art results on three large
datasets, namely, the Large-scale Continuous Gesture Recognition Dataset (mean
Jaccard index 0.4109), the Large-scale Isolated Gesture Recognition Dataset
(59.21%), and the NTU RGB+D Dataset (87.08% cross-subject and 84.22%
cross-view) even though only the depth modality was used.
Comment: arXiv admin note: text overlap with arXiv:1701.01814,
arXiv:1608.0633
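The continuous recognition results above are reported as a mean Jaccard index. The sketch below is a simplified version, assuming per-frame integer labels with 0 meaning "no gesture": for each gesture label in a sequence it divides the number of frames where truth and prediction agree on that label by the number of frames where either has it, then averages over labels and over sequences. The challenge's exact normalization may differ, and the function names are hypothetical.

```python
def sequence_jaccard(truth, predicted, background=0):
    """Average frame-level Jaccard overlap over the gesture labels of one sequence."""
    labels = (set(truth) | set(predicted)) - {background}
    if not labels:
        return 1.0  # nothing to detect and nothing predicted
    scores = []
    for label in labels:
        true_frames = {i for i, value in enumerate(truth) if value == label}
        predicted_frames = {i for i, value in enumerate(predicted) if value == label}
        scores.append(len(true_frames & predicted_frames) / len(true_frames | predicted_frames))
    return sum(scores) / len(scores)


def mean_jaccard_index(pairs):
    """Average the per-sequence score over (truth, predicted) pairs."""
    return sum(sequence_jaccard(t, p) for t, p in pairs) / len(pairs)


truth = [1, 1, 1, 1, 0, 0]
predicted = [0, 1, 1, 1, 1, 0]
print(sequence_jaccard(truth, predicted))
```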