An Expressive Deep Model for Human Action Parsing from A Single Image
This paper addresses a newly emerging task in vision and multimedia research:
recognizing human actions from still images. Its main challenges lie in the
large variations in human poses and appearances, as well as the lack of
temporal motion information. To address these problems, we propose
an expressive deep model that naturally integrates human layout and surrounding
contexts for higher-level action understanding from still images. In
particular, a Deep Belief Net is trained to fuse information from different
noisy sources such as body part detection and object detection. To bridge the
semantic gap, we use manually labeled data to greatly improve the
effectiveness and efficiency of the pre-training and fine-tuning stages of the
DBN training. The resulting framework is shown to be robust to sometimes
unreliable inputs (e.g., imprecise detections of human parts and objects), and
outperforms the state-of-the-art approaches.
Comment: 6 pages, 8 figures, ICME 201
Contextual Attention for Hand Detection in the Wild
We present Hand-CNN, a novel convolutional network architecture for detecting hand masks and predicting hand orientations in unconstrained images. Hand-CNN extends MaskRCNN with a novel attention mechanism to incorporate contextual cues in the detection process. This attention mechanism can be implemented as an efficient network module that captures non-local dependencies between features. This network module can be inserted at different stages of an object detection network, and the entire detector can be trained end-to-end. We also introduce large-scale annotated hand datasets containing hands in unconstrained images for training and evaluation. We show that Hand-CNN outperforms existing methods on the newly collected datasets and the publicly available PASCAL VOC human layout dataset. Data and code: https://www3.cs.stonybrook.edu/~cvl/projects/hand_det_attention
Contextual Attention for Hand Detection in the Wild
We present Hand-CNN, a novel convolutional network architecture for detecting
hand masks and predicting hand orientations in unconstrained images. Hand-CNN
extends MaskRCNN with a novel attention mechanism to incorporate contextual
cues in the detection process. This attention mechanism can be implemented as
an efficient network module that captures non-local dependencies between
features. This network module can be inserted at different stages of an object
detection network, and the entire detector can be trained end-to-end.
We also introduce a large-scale annotated hand dataset containing hands in
unconstrained images for training and evaluation. We show that Hand-CNN
outperforms existing methods on several datasets, including our hand detection
benchmark and the publicly available PASCAL VOC human layout challenge. We also
conduct ablation studies on hand detection to show the effectiveness of the
proposed contextual attention module.
Comment: 9 pages, 9 figure