Combining Local Appearance and Holistic View: Dual-Source Deep Neural Networks for Human Pose Estimation
We propose a new learning-based method for estimating 2D human pose from a
single image, using Dual-Source Deep Convolutional Neural Networks (DS-CNN).
Recently, many methods have been developed to estimate human pose by using pose
priors that are estimated from physiologically inspired graphical models or
learned from a holistic perspective. In this paper, we propose to integrate
both the local (body) part appearance and the holistic view of each local part
for more accurate human pose estimation. Specifically, the proposed DS-CNN
takes a set of image patches (category-independent object proposals for
training and multi-scale sliding windows for testing) as the input and then
learns the appearance of each local part by considering its holistic view in
the full body. Using DS-CNN, we achieve both joint detection, which determines
whether an image patch contains a body joint, and joint localization, which
finds the exact location of the joint in the image patch. Finally, we develop
an algorithm to combine these joint detection/localization results from all the
image patches for estimating the human pose. Experimental results show that
the proposed method is effective in comparison with state-of-the-art
human-pose estimation methods based on pose priors that are estimated from
physiologically inspired graphical models or learned from a holistic
perspective.
Comment: CVPR 201
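The multi-scale sliding windows mentioned above as test-time input can be illustrated with a minimal sketch. This is not the authors' code: the function name `sliding_windows`, the square window sizes, and the half-window stride are all assumptions chosen for illustration, and the abstract does not state the scales or stride actually used.

```python
from typing import Iterator, Sequence, Tuple

Window = Tuple[int, int, int, int]  # (x, y, width, height)


def sliding_windows(
    image_width: int,
    image_height: int,
    sizes: Sequence[int] = (32, 64),   # hypothetical window sizes in pixels
    stride_frac: float = 0.5,          # hypothetical stride as a fraction of the size
) -> Iterator[Window]:
    """Yield square windows at several scales that lie fully inside the image."""
    for size in sizes:
        if size > image_width or size > image_height:
            continue  # a window larger than the image cannot fit
        stride = max(1, int(size * stride_frac))
        for y in range(0, image_height - size + 1, stride):
            for x in range(0, image_width - size + 1, stride):
                yield (x, y, size, size)
```

Each yielded window would then be cropped from the image and passed to the patch classifier; that step is omitted here because the abstract does not describe the network interface.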
Co-interest Person Detection from Multiple Wearable Camera Videos
Wearable cameras, such as Google Glass and GoPro, enable video data
collection over larger areas and from different views. In this paper, we tackle
a new problem of locating the co-interest person (CIP), i.e., the one who draws
attention from most camera wearers, from temporally synchronized videos taken
by multiple wearable cameras. Our basic idea is to exploit the motion patterns
of people and use them to correlate the persons across different videos,
instead of performing appearance-based matching as in traditional video
co-segmentation/localization. This way, we can identify the CIP even if a group of
people with similar appearance are present in the view. More specifically, we
detect a set of persons on each frame as the candidates of the CIP and then
build a Conditional Random Field (CRF) model to select the one with consistent
motion patterns in different videos and high spatio-temporal consistency in
each video. We collect three sets of wearable-camera videos for testing the
proposed algorithm. All the involved people have similar appearances in the
collected videos and the experiments demonstrate the effectiveness of the
proposed algorithm.
Comment: ICCV 201
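The idea of selecting one candidate per frame under a temporal consistency constraint can be sketched as follows. This is a simplified stand-in, not the CRF from the paper: it assumes a single video, a chain over frames solved exactly by dynamic programming, a hypothetical per-candidate `unary` cost (standing in for cross-video motion inconsistency), and Euclidean displacement as the pairwise term. The name `select_track` and the `smooth_weight` parameter are illustrative only.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def select_track(
    candidates: List[List[Point]],   # candidate positions per frame (non-empty per frame)
    unary: List[List[float]],        # hypothetical per-candidate cost, lower is better
    smooth_weight: float = 1.0,      # hypothetical weight on frame-to-frame displacement
) -> List[int]:
    """Return one candidate index per frame minimizing unary plus smoothness cost."""
    n = len(candidates)
    cost = [list(unary[0])]          # cost[t][j]: best total cost ending at candidate j
    back: List[List[int]] = []       # back[t-1][j]: best predecessor of j in frame t
    for t in range(1, n):
        row_cost, row_back = [], []
        for j, (xj, yj) in enumerate(candidates[t]):
            best_i, best = 0, float("inf")
            for i, (xi, yi) in enumerate(candidates[t - 1]):
                dist = ((xj - xi) ** 2 + (yj - yi) ** 2) ** 0.5
                total = cost[t - 1][i] + smooth_weight * dist
                if total < best:
                    best_i, best = i, total
            row_cost.append(best + unary[t][j])
            row_back.append(best_i)
        cost.append(row_cost)
        back.append(row_back)
    # Backtrack from the cheapest candidate in the last frame.
    last = min(range(len(cost[-1])), key=cost[-1].__getitem__)
    path = [last]
    for t in range(n - 1, 0, -1):
        last = back[t - 1][last]
        path.append(last)
    path.reverse()
    return path
```

With a large `smooth_weight`, the selection favors a spatially smooth track even when another candidate has a slightly lower unary cost in an individual frame.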