Deep Learning on Lie Groups for Skeleton-based Action Recognition
In recent years, skeleton-based action recognition has become a popular 3D
classification problem. State-of-the-art methods typically first represent each
motion sequence as a high-dimensional trajectory on a Lie group, align it with
dynamic time warping, and then shallowly learn favorable Lie group features.
In this paper, we incorporate the Lie group structure into a deep
network architecture to learn more appropriate Lie group features for 3D action
recognition. Within the network structure, we design rotation mapping layers to
transform the input Lie group features into desirable ones, which are aligned
better in the temporal domain. To reduce the high feature dimensionality, the
architecture is equipped with rotation pooling layers for the elements on the
Lie group. Furthermore, we propose a logarithm mapping layer to map the
resulting manifold data into a tangent space that facilitates the application
of regular output layers for the final classification. Evaluations of the
proposed network on standard 3D human action recognition datasets clearly
demonstrate its superiority over existing shallow Lie group feature learning
methods as well as most conventional deep learning methods.
Comment: Accepted to CVPR 201
Complex Human Action Recognition in Live Videos Using Hybrid FR-DL Method
Automated human action recognition is one of the most attractive and
practical research fields in computer vision, in spite of its high
computational costs. In such systems, human action labelling is based on
the appearance and motion patterns in the video sequences; however,
conventional methodologies and classic neural networks cannot use temporal
information to predict actions in the upcoming frames of a video sequence.
In addition, the computational cost of the preprocessing stage is high. In
this paper, we address the challenges of the preprocessing phase by
automatically selecting representative frames from the input sequences.
Furthermore, we extract only the key features of each representative frame
rather than the entire feature set. We propose a hybrid technique using
background subtraction and HOG, followed by the application of a deep neural
network and a skeletal modelling method. The combination of a CNN and an LSTM
recurrent network is used for feature selection and for retaining temporal
information, and finally, a Softmax-KNN classifier is used for labelling human
activities. We name our model the Feature Reduction & Deep Learning based
action recognition method, or FR-DL for short. To evaluate the proposed
method, we use the UCF dataset, a benchmark widely used among researchers in
action recognition research. The dataset includes 101 complicated activities
in the wild. Experimental results show a significant improvement in terms of
accuracy and speed in comparison with six state-of-the-art methods.
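The background-subtraction and HOG stage of a pipeline like this can be sketched in pure NumPy. This is a hedged, toy illustration, not the authors' pipeline: frame differencing stands in for a learned background model, and a single global orientation histogram stands in for a full block-normalized HOG descriptor.

```python
# Toy preprocessing sketch (hypothetical, not the FR-DL implementation):
# frame differencing as background subtraction, then a coarse histogram
# of oriented gradients over the foreground.
import numpy as np

def foreground_mask(frame, background, thresh=25):
    """Pixels whose absolute difference from the background exceeds thresh."""
    return np.abs(frame.astype(int) - background.astype(int)) > thresh

def hog_histogram(image, n_bins=9):
    """Global histogram of gradient orientations, weighted by magnitude."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)           # unsigned orientation
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    return hist / (hist.sum() + 1e-8)                 # L1-normalize

# Toy example: a bright square appearing on a dark background.
bg = np.zeros((32, 32), dtype=np.uint8)
frame = bg.copy()
frame[8:16, 8:16] = 200
mask = foreground_mask(frame, bg)        # 8x8 = 64 foreground pixels
feat = hog_histogram(frame * mask)       # orientation histogram feature
```

In a real system, such per-frame features would then be fed to the CNN-LSTM stage described above.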
Take an Emotion Walk: Perceiving Emotions from Gaits Using Hierarchical Attention Pooling and Affective Mapping
We present an autoencoder-based semi-supervised approach to classify
perceived human emotions from walking styles obtained from videos or
motion-captured data and represented as sequences of 3D poses. Given the motion
of each joint in the pose at each time step, extracted from the 3D pose sequences,
we hierarchically pool these joint motions in a bottom-up manner in the
encoder, following the kinematic chains in the human body. We also constrain
the latent embeddings of the encoder to contain the space of
psychologically-motivated affective features underlying the gaits. We train the
decoder to reconstruct the motions per joint per time step in a top-down manner
from the latent embeddings. For the annotated data, we also train a classifier
to map the latent embeddings to emotion labels. Our semi-supervised approach
achieves a mean average precision of 0.84 on the Emotion-Gait benchmark
dataset, which contains both labeled and unlabeled gaits collected from
multiple sources. We outperform current state-of-the-art algorithms for both
emotion recognition and action recognition from 3D gaits by an absolute
7%--23%. More importantly, we improve the average precision by an absolute
10%--50% on classes that each make up less than 25% of the labeled part of the
Emotion-Gait benchmark dataset.
Comment: In proceedings of the 16th European Conference on Computer Vision,
2020. Total pages 18. Total figures 5. Total tables
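The bottom-up pooling over kinematic chains can be pictured with a small sketch. Everything here is an assumption for illustration (the toy skeleton, the parent array, and average pooling), not the paper's encoder, which uses learned hierarchical attention.

```python
# Hedged sketch of bottom-up pooling over a kinematic tree (illustrative
# only; the skeleton and the averaging rule are assumptions, not the
# paper's attention-based encoder).
import numpy as np

# A tiny 5-joint skeleton: joint 0 is the root; each entry maps
# a child joint index to its parent joint index.
parents = {1: 0, 2: 1, 3: 0, 4: 0}

def pool_up(joint_feats, parents):
    """Average each child's feature into its parent, deepest joints first."""
    feats = {j: f.copy() for j, f in joint_feats.items()}
    # Higher indices are deeper in this toy tree, so process them first.
    for child in sorted(parents, reverse=True):
        p = parents[child]
        feats[p] = (feats[p] + feats[child]) / 2.0
    return feats[0]  # the root carries the pooled body-level feature

# Per-joint 3D motion features (here: constant vectors for clarity).
joint_feats = {j: np.full(3, float(j)) for j in range(5)}
body_feat = pool_up(joint_feats, parents)
```

In the paper's setting, the pooled body-level embedding is what gets constrained toward the space of affective gait features and decoded back top-down.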
A review on Video Classification with Methods, Findings, Performance, Challenges, Limitations and Future Work
In recent years, the number of web users has grown rapidly and sufficient bandwidth has become available. Low-cost internet connectivity makes sharing information (text, audio, and video) more common and faster. This video content needs to be analyzed to predict its class for various user purposes. Many machine learning approaches have been developed for video classification to save people time and energy. There are many existing review papers on video classification, but they have limitations: narrow analysis, poor structure, and failure to mention research gaps or findings or to clearly describe advantages, disadvantages, and future work. Our review largely overcomes these limitations. This study reviews existing video-classification procedures, examines the existing methods comparatively and critically, and recommends the most effective and productive process. First, our analysis examines the classification of videos with taxonomical details, the latest applications, processes, and dataset information. Second, it covers the overall inconveniences, difficulties, shortcomings, and potential future work, along with data and performance measurements for recent machine learning and deep learning models. Studying video classification systems in terms of their tools, benefits, and drawbacks, as well as other features, in order to compare the techniques they use also constitutes a key task of this review. Lastly, we present a quick summary table based on selected features. In terms of precision and independent feature extraction, RNN (Recurrent Neural Network), CNN (Convolutional Neural Network), and combined approaches perform better than CNN-dependent methods.
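The combined CNN+RNN pattern the review favors can be reduced to a minimal NumPy sketch: a fixed projection stands in for a CNN backbone extracting per-frame features, and a vanilla RNN summarizes them over time before a softmax. All weights here are random placeholders for illustration; no training is shown.

```python
# Hedged illustration (not from the review): the CNN+RNN combination for
# video classification, reduced to NumPy with untrained random weights.
import numpy as np

rng = np.random.default_rng(0)
T, H, W, D, C = 8, 16, 16, 32, 5   # frames, height, width, feature dim, classes

def cnn_features(frame, W_proj):
    """Stand-in for a CNN backbone: flatten the frame and project it."""
    return np.tanh(frame.ravel() @ W_proj)

def rnn_classify(video, W_proj, W_xh, W_hh, W_hy):
    """Run a vanilla RNN over per-frame features, classify the last state."""
    h = np.zeros(D)
    for frame in video:
        x = cnn_features(frame, W_proj)
        h = np.tanh(x @ W_xh + h @ W_hh)   # recurrent temporal update
    logits = h @ W_hy
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()                  # softmax over action classes

video = rng.standard_normal((T, H, W))
W_proj = rng.standard_normal((H * W, D)) * 0.1
W_xh = rng.standard_normal((D, D)) * 0.1
W_hh = rng.standard_normal((D, D)) * 0.1
W_hy = rng.standard_normal((D, C)) * 0.1
probs = rnn_classify(video, W_proj, W_xh, W_hh, W_hy)
```

The recurrent state is what lets the model use temporal information across frames, which is the advantage the review attributes to combined approaches over purely frame-wise CNN methods.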