Graph Distillation for Action Detection with Privileged Modalities
We propose a technique that tackles action detection in multimodal videos
under a realistic and challenging condition in which only limited training data
and partially observed modalities are available. Common methods in transfer
learning do not take advantage of the extra modalities potentially available in
the source domain. On the other hand, previous work on multimodal learning only
focuses on a single domain or task and does not handle the modality discrepancy
between training and testing. In this work, we propose a method termed graph
distillation that incorporates rich privileged information from a large-scale
multimodal dataset in the source domain, and improves the learning in the
target domain where training data and modalities are scarce. We evaluate our
approach on action classification and detection tasks in multimodal videos, and
show that our model outperforms the state-of-the-art by a large margin on the
NTU RGB+D and PKU-MMD benchmarks. The code is released at
http://alan.vision/eccv18_graph/.
Comment: ECCV 2018
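The cross-modal distillation idea above can be sketched as a loss over a graph whose nodes are modalities. This is a minimal toy stand-in, not the paper's implementation: the dictionary keys, the use of a squared-error proxy in place of a learned distillation divergence, and fixed (rather than learned) edge weights are all assumptions for illustration.

```python
import numpy as np

def graph_distillation_loss(logits_by_modality, edge_weights):
    """Toy sketch of graph distillation: each modality is a graph node,
    and edge weight (i, j) controls how strongly modality j's predictions
    guide modality i. Keys and weights are hypothetical."""
    loss = 0.0
    for i, zi in logits_by_modality.items():
        for j, zj in logits_by_modality.items():
            if i == j:
                continue
            w = edge_weights.get((i, j), 0.0)
            # squared error as a simple proxy for a distillation divergence
            loss += w * float(np.mean((zi - zj) ** 2))
    return loss
```

With all edge weights zero, no privileged information flows and the loss vanishes; raising the weight of an edge from a privileged modality increases the pressure on the target modality to match it.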
Multimodal Multipart Learning for Action Recognition in Depth Videos
The articulated and complex nature of human actions makes the task of action
recognition difficult. One approach to handling this complexity is dividing it
into the kinetics of body parts and analyzing the actions based on these partial
descriptors. We propose a joint sparse regression based learning method which
utilizes the structured sparsity to model each action as a combination of
multimodal features from a sparse set of body parts. To represent dynamics and
appearance of parts, we employ a heterogeneous set of depth and skeleton based
features. The proper structure of multimodal multipart features are formulated
into the learning framework via the proposed hierarchical mixed norm, to
regularize the structured features of each part and to apply sparsity between
them, in favor of a group feature selection. Our experimental results expose
the effectiveness of the proposed learning method in which it outperforms other
methods in all three tested datasets while saturating one of them by achieving
perfect accuracy
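A mixed norm of the kind described above can be sketched as an inner l2 norm over each body part's feature block and an outer l1 sum across parts, so penalizing it drives whole parts to zero. This is a simplified two-level illustration, not the paper's exact hierarchical norm; the slice-per-part layout is an assumption.

```python
import numpy as np

def mixed_norm(w, part_slices):
    """Sketch of an l1/l2 mixed norm: inner l2 over each part's
    multimodal feature block, outer l1 (plain sum) across parts.
    `part_slices` (one slice of w per body part) is hypothetical."""
    # shrinking this value pushes entire part blocks to exactly zero,
    # which is what yields group (part-level) feature selection
    return sum(float(np.linalg.norm(w[s])) for s in part_slices)
```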
Simultaneous Feature and Body-Part Learning for Real-Time Robot Awareness of Human Behaviors
Robot awareness of human actions is an essential research problem in robotics
with many important real-world applications, including human-robot
collaboration and teaming. Over the past few years, depth sensors have become a
standard device widely used by intelligent robots for 3D perception, which can
also offer human skeletal data in 3D space. Several methods based on skeletal
data were designed to enable robot awareness of human actions with satisfactory
accuracy. However, previous methods treated all body parts and features as
equally important, without the capability to identify discriminative body
parts and features. In this paper, we propose a novel simultaneous Feature And Body-part
Learning (FABL) approach that simultaneously identifies discriminative body
parts and features, and efficiently integrates all available information
together to enable real-time robot awareness of human behaviors. We formulate
FABL as a regression-like optimization problem with structured
sparsity-inducing norms to model interrelationships of body parts and features.
We also develop an optimization algorithm to solve the formulated problem,
which possesses a theoretical guarantee to find the optimal solution. To
evaluate FABL, three experiments were performed using public benchmark
datasets, including the MSR Action3D and CAD-60 datasets, as well as a Baxter
robot in practical assistive living applications. Experimental results show
that our FABL approach obtains high recognition accuracy with a processing
speed on the order of 10^4 Hz, which makes FABL a promising method
to enable real-time robot awareness of human behaviors in practical robotics
applications.
Comment: 8 pages, 6 figures, accepted by ICRA'1
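The mechanism by which a structured sparsity-inducing norm discards uninformative body parts can be illustrated with the group soft-thresholding (proximal) operator used in proximal-gradient solvers for such problems. This is a generic building block, not FABL's specific algorithm, and the function name is hypothetical.

```python
import numpy as np

def prox_group(w, lam):
    """Proximal operator of lam * ||w||_2 for one group (e.g. one body
    part's features): shrinks the whole group toward zero, and zeroes it
    exactly when its norm falls below lam -- how an entire part can be
    dropped as non-discriminative."""
    n = float(np.linalg.norm(w))
    if n <= lam:
        return np.zeros_like(w)
    return (1.0 - lam / n) * w
```

Within a proximal-gradient loop, applying this operator group by group after each gradient step is what makes whole groups vanish rather than individual coordinates.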
Skeleton-Based Human Action Recognition with Global Context-Aware Attention LSTM Networks
Human action recognition in 3D skeleton sequences has attracted a lot of
research attention. Recently, Long Short-Term Memory (LSTM) networks have shown
promising performance in this task due to their strengths in modeling the
dependencies and dynamics in sequential data. As not all skeletal joints are
informative for action recognition, and the irrelevant joints often bring noise
which can degrade the performance, we need to pay more attention to the
informative ones. However, the original LSTM network does not have explicit
attention ability. In this paper, we propose a new class of LSTM network,
the Global Context-Aware Attention LSTM (GCA-LSTM), for skeleton-based action
recognition. This network is capable of selectively focusing on the informative
joints in each frame of each skeleton sequence by using a global context memory
cell. To further improve the attention capability of our network, we also
introduce a recurrent attention mechanism, with which the attention performance
of the network can be enhanced progressively. Moreover, we propose a stepwise
training scheme in order to train our network effectively. Our approach
achieves state-of-the-art performance on five challenging benchmark datasets
for skeleton-based action recognition.
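The core of the attention described above, selecting informative joints using a global context representation, can be sketched as a softmax over per-joint compatibility scores. This omits the LSTM and the recurrent refinement of the context memory; the feature shapes and the dot-product scoring function are assumptions for illustration.

```python
import numpy as np

def joint_attention(joint_feats, global_context):
    """joint_feats: (num_joints, dim) features for one frame;
    global_context: (dim,) memory summarizing the whole sequence.
    Joints more compatible with the global context get higher weight,
    so noisy, irrelevant joints are down-weighted."""
    scores = joint_feats @ global_context        # (num_joints,)
    e = np.exp(scores - scores.max())            # stable softmax
    weights = e / e.sum()                        # attention over joints
    attended = weights @ joint_feats             # (dim,) weighted sum
    return attended, weights
```

In the full model, the attended representation would update the global context memory, and repeating this step is what progressively sharpens the attention.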