734 research outputs found
VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera
We present the first real-time method to capture the full global 3D skeletal
pose of a human in a stable, temporally consistent manner using a single RGB
camera. Our method combines a new convolutional neural network (CNN) based pose
regressor with kinematic skeleton fitting. Our novel fully-convolutional pose
formulation regresses 2D and 3D joint positions jointly in real time and does
not require tightly cropped input frames. A real-time kinematic skeleton
fitting method uses the CNN output to yield temporally stable 3D global pose
reconstructions on the basis of a coherent kinematic skeleton. This makes our
approach the first monocular RGB method usable in real-time applications such
as 3D character control---thus far, the only monocular methods for such
applications employed specialized RGB-D cameras. Our method's accuracy is
quantitatively on par with the best offline 3D monocular RGB pose estimation
methods. Our results are qualitatively comparable to, and sometimes better
than, results from monocular RGB-D approaches, such as the Kinect. However, we
show that our approach is more broadly applicable than RGB-D solutions, i.e. it
works for outdoor scenes, community videos, and low quality commodity RGB
cameras.Comment: Accepted to SIGGRAPH 201
3D human pose estimation from depth maps using a deep combination of poses
Many real-world applications require the estimation of human body joints for
higher-level tasks as, for example, human behaviour understanding. In recent
years, depth sensors have become a popular approach to obtain three-dimensional
information. The depth maps generated by these sensors provide information that
can be employed to disambiguate the poses observed in two-dimensional images.
This work addresses the problem of 3D human pose estimation from depth maps
employing a Deep Learning approach. We propose a model, named Deep Depth Pose
(DDP), which receives a depth map containing a person and a set of predefined
3D prototype poses and returns the 3D position of the body joints of the
person. In particular, DDP is defined as a ConvNet that computes the specific
weights needed to linearly combine the prototypes for the given input. We have
thoroughly evaluated DDP on the challenging 'ITOP' and 'UBC3V' datasets, which
respectively depict realistic and synthetic samples, defining a new
state-of-the-art on them.Comment: Accepted for publication at "Journal of Visual Communication and
Image Representation
Fine-grained sketch-based image retrieval by matching deformable part models
(c) 2014. The copyright of this document resides with its authors.
It may be distributed unchanged freely in print or electronic forms.© 2014. The copyright of this document resides with its authors. An important characteristic of sketches, compared with text, rests with their ability to intrinsically capture object appearance and structure. Nonetheless, akin to traditional text-based image retrieval, conventional sketch-based image retrieval (SBIR) principally focuses on retrieving images of the same category, neglecting the fine-grained characteristics of sketches. In this paper, we advocate the expressiveness of sketches and examine their efficacy under a novel fine-grained SBIR framework. In particular, we study how sketches enable fine-grained retrieval within object categories. Key to this problem is introducing a mid-level sketch representation that not only captures object pose, but also possesses the ability to traverse sketch and image domains. Specifically, we learn deformable part-based model (DPM) as a mid-level representation to discover and encode the various poses in sketch and image domains independently, after which graph matching is performed on DPMs to establish pose correspondences across the two domains. We further propose an SBIR dataset that covers the unique aspects of fine-grained SBIR. Through in-depth experiments, we demonstrate the superior performance of our SBIR framework, and showcase its unique ability in fine-grained retrieval
Single image 3D human pose estimation from noisy observations
Markerless 3D human pose detection from a single image is a severely underconstrained problem because different 3D poses can have similar image projections. In order to handle this ambiguity, current approaches rely on prior shape models that can only be correctly adjusted if 2D image features are accurately detected. Unfortunately, although current 2D part detector algorithms have shown promising results, they are not yet accurate enough to guarantee a complete disambiguation of the 3D inferred shape.
In this paper, we introduce a novel approach for estimating 3D human pose even when observations are noisy. We propose a stochastic sampling strategy to propagate
the noise from the image plane to the shape space. This provides a set of ambiguous 3D shapes, which are virtually undistinguishable from their image projections. Disambiguation is then achieved by imposing kinematic constraints that guarantee the resulting pose resembles a 3D
human shape. We validate the method on a variety of situations in which state-of-the-art 2D detectors yield either inaccurate estimations or partly miss some of the body parts.Preprin
Human Pose Estimation from Monocular Images : a Comprehensive Survey
Human pose estimation refers to the estimation of the location of body parts and how they are connected in an image. Human pose estimation from monocular images has wide applications (e.g., image indexing). Several surveys on human pose estimation can be found in the literature, but they focus on a certain category; for example, model-based approaches or human motion analysis, etc. As far as we know, an overall review of this problem domain has yet to be provided. Furthermore, recent advancements based on deep learning have brought novel algorithms for this problem. In this paper, a comprehensive survey of human pose estimation from monocular images is carried out including milestone works and recent advancements. Based on one standard pipeline for the solution of computer vision problems, this survey splits the problema into several modules: feature extraction and description, human body models, and modelin methods. Problem modeling methods are approached based on two means of categorization in this survey. One way to categorize includes top-down and bottom-up methods, and another way includes generative and discriminative methods. Considering the fact that one direct application of human pose estimation is to provide initialization for automatic video surveillance, there are additional sections for motion-related methods in all modules: motion features, motion models, and motion-based methods. Finally, the paper also collects 26 publicly available data sets for validation and provides error measurement methods that are frequently used
U4D: Unsupervised 4D Dynamic Scene Understanding
We introduce the first approach to solve the challenging problem of
unsupervised 4D visual scene understanding for complex dynamic scenes with
multiple interacting people from multi-view video. Our approach simultaneously
estimates a detailed model that includes a per-pixel semantically and
temporally coherent reconstruction, together with instance-level segmentation
exploiting photo-consistency, semantic and motion information. We further
leverage recent advances in 3D pose estimation to constrain the joint semantic
instance segmentation and 4D temporally coherent reconstruction. This enables
per person semantic instance segmentation of multiple interacting people in
complex dynamic scenes. Extensive evaluation of the joint visual scene
understanding framework against state-of-the-art methods on challenging indoor
and outdoor sequences demonstrates a significant (approx 40%) improvement in
semantic segmentation, reconstruction and scene flow accuracy.Comment: To appear in IEEE International Conference in Computer Vision ICCV
201
- …