An Expressive Deep Model for Human Action Parsing from A Single Image
This paper addresses an emerging task in vision and multimedia research:
recognizing human actions from still images. Its main challenges lie in the
large variations in human poses and appearances, as well as the lack of
temporal motion information. Addressing these problems, we propose to develop
an expressive deep model to naturally integrate human layout and surrounding
contexts for higher level action understanding from still images. In
particular, a Deep Belief Net is trained to fuse information from different
noisy sources such as body part detection and object detection. To bridge the
semantic gap, we use manually labeled data to greatly improve the
effectiveness and efficiency of the pre-training and fine-tuning stages of the
DBN training. The resulting framework is shown to be robust to sometimes
unreliable inputs (e.g., imprecise detections of human parts and objects), and
outperforms the state-of-the-art approaches.
Comment: 6 pages, 8 figures, ICME 201
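As a rough, hypothetical sketch of the fusion idea (not the authors' implementation), the snippet below uses a single Bernoulli RBM from scikit-learn as a stand-in for the Deep Belief Net: it is pre-trained unsupervised on concatenated body-part and object detection scores, and a logistic-regression read-out plays the role of the fine-tuned classifier. All feature dimensions, the class count, and the random data are invented for illustration.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import BernoulliRBM
    from sklearn.pipeline import Pipeline

    rng = np.random.default_rng(0)
    # Hypothetical noisy detection scores, scaled to [0, 1]
    X_parts = rng.random((500, 60))    # body-part detector outputs
    X_objects = rng.random((500, 40))  # object detector outputs
    X = np.hstack([X_parts, X_objects])
    y = rng.integers(0, 7, size=500)   # 7 made-up action classes

    model = Pipeline([
        # Unsupervised pre-training of one hidden layer
        # (a real DBN stacks several such layers)
        ("rbm", BernoulliRBM(n_components=128, learning_rate=0.05,
                             n_iter=20, random_state=0)),
        # Supervised read-out standing in for the fine-tuning stage
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    model.fit(X, y)
    print(model.predict(X[:5]))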
Anticipation in Human-Robot Cooperation: A Recurrent Neural Network Approach for Multiple Action Sequences Prediction
Close human-robot cooperation is a key enabler for new developments in
advanced manufacturing and assistive applications. Close cooperation requires
robots that can predict human actions and intent, and understand human
non-verbal cues. Recent approaches based on neural networks have led to
encouraging results in the human action prediction problem both in continuous
and discrete spaces. Our approach extends the research in this direction. Our
contributions are three-fold. First, we validate the use of gaze and body pose
cues as a means of predicting human action through a feature selection method.
Next, we address two shortcomings of existing literature: predicting multiple
and variable-length action sequences. This is achieved by introducing an
encoder-decoder recurrent neural network topology to the discrete action
prediction problem. In addition, we theoretically demonstrate the importance of
predicting multiple action sequences as a means of estimating the stochastic
reward in a human-robot cooperation scenario. Finally, we show the ability to
effectively train the prediction model on an action prediction dataset
involving human motion data, and explore the influence of the model's
parameters on its performance. Source code repository:
https://github.com/pschydlo/ActionAnticipation
Comment: IEEE International Conference on Robotics and Automation (ICRA) 2018,
Accepted
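The encoder-decoder topology described above can be sketched in a few lines of PyTorch. This is an illustrative toy, not the released code (see the repository linked above): the cue dimensionality, hidden size, action vocabulary, and teacher-forced decode length are all assumptions.

    import torch
    import torch.nn as nn

    class Seq2SeqActionPredictor(nn.Module):
        def __init__(self, feat_dim, hidden_dim, n_actions):
            super().__init__()
            self.encoder = nn.GRU(feat_dim, hidden_dim, batch_first=True)
            self.embed = nn.Embedding(n_actions, hidden_dim)
            self.decoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, n_actions)

        def forward(self, cues, prev_actions):
            # cues: (batch, T_obs, feat_dim) gaze + body-pose features
            # prev_actions: (batch, T_dec) shifted action tokens (teacher forcing)
            _, h = self.encoder(cues)             # summarize observed cues
            dec_in = self.embed(prev_actions)     # embed discrete action tokens
            dec_out, _ = self.decoder(dec_in, h)  # condition on encoder state
            return self.out(dec_out)              # (batch, T_dec, n_actions) logits

    # Hypothetical sizes: 16-D gaze+pose cue vector, 10 discrete actions
    model = Seq2SeqActionPredictor(feat_dim=16, hidden_dim=64, n_actions=10)
    cues = torch.randn(4, 30, 16)        # 30 observed frames
    prev = torch.randint(0, 10, (4, 5))  # decode 5 future action tokens
    print(model(cues, prev).shape)       # torch.Size([4, 5, 10])

At inference time, decoding step by step while keeping several hypotheses (e.g., via beam search or sampling) yields the multiple candidate action sequences whose importance for estimating the stochastic reward the paper demonstrates.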
Mining Mid-level Features for Action Recognition Based on Effective Skeleton Representation
Recently, mid-level features have shown promising performance in computer
vision. Mid-level features learned by incorporating class-level information are
potentially more discriminative than traditional low-level local features. In
this paper, an effective method is proposed to extract mid-level features from
Kinect skeletons for 3D human action recognition. Firstly, the orientations of
limbs connecting two skeleton joints are computed and each orientation is
encoded into one of the 27 states indicating the spatial relationship of the
joints. Secondly, limbs are combined into parts and the limbs' states are
mapped into part states. Finally, frequent pattern mining is employed to mine
the most frequent and relevant (discriminative, representative and
non-redundant) states of parts over several consecutive frames. These parts are
referred to as Frequent Local Parts or FLPs. The FLPs allow us to build a
powerful bag-of-FLP-based action representation. This new representation yields
state-of-the-art results on the MSR DailyActivity3D and MSR ActionPairs3D
datasets.
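One plausible reading of the 27-state limb encoding, assumed here purely for illustration, is that the sign of each coordinate difference between a limb's two joints is quantized into {-1, 0, +1}, giving 3^3 = 27 combinations; the threshold eps below is likewise invented.

    import numpy as np

    def limb_state(joint_a, joint_b, eps=0.05):
        """Encode a limb's orientation as one of 27 discrete states.

        Assumed scheme: quantize the sign of each coordinate difference
        (x, y, z) into {-1, 0, +1}, treating |d| < eps as 0, which yields
        3**3 = 27 spatial-relationship states for the two joints.
        """
        d = np.asarray(joint_b, dtype=float) - np.asarray(joint_a, dtype=float)
        signs = np.where(np.abs(d) < eps, 0, np.sign(d)).astype(int)
        # Map the (sx, sy, sz) triple to an integer in [0, 26]
        return int((signs[0] + 1) * 9 + (signs[1] + 1) * 3 + (signs[2] + 1))

    # Example: an elbow-to-wrist limb pointing mostly along +x and -z
    print(limb_state([0.0, 0.0, 0.0], [0.3, 0.01, -0.2]))  # -> 21

Per-frame limb states quantized this way could then be pooled into part states and fed to a frequent-pattern miner to obtain the FLPs described above.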