Beyond Physical Connections: Tree Models in Human Pose Estimation
Simple tree models for articulated objects have prevailed over the last decade.
However, it is also widely believed that these simple tree models are not capable
of capturing the large variations that arise in many scenarios, such as human pose
estimation.
This paper attempts to address three questions: 1) are simple tree models
sufficient? More specifically, 2) how can tree models be used effectively in human
pose estimation? And 3) how should combined parts be used together with single
parts efficiently?
Assume we have a set of single parts and combined parts, and the goal is to
estimate the joint distribution of their locations. Surprisingly, we find that no
latent variables are introduced on the Leeds Sport Dataset (LSP) when learning a
latent tree for the deformable model, which aims at approximating the joint
distribution of body part locations with a minimal tree structure. This suggests
that one can straightforwardly use a mixed representation of single and combined
parts to approximate their joint distribution in a simple tree model.
As such, one only needs to build Visual Categories of the combined parts, and
then perform inference on the learned latent tree. Our method outperformed the
state of the art on the LSP, both when the training images come from the same
dataset and when they come from the PARSE dataset. Experiments on animal images
from the VOC challenge further support our findings.
Comment: CVPR 201
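To make the idea of exact inference on a simple tree of parts concrete, the following is a minimal sketch of max-sum (Viterbi-style) dynamic programming over candidate part locations. It assumes each part, whether single or combined, has a small discrete set of scored candidate locations and each tree edge has a pairwise compatibility table; the function name, toy part names, and random scores are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def tree_max_sum(children, unary, pairwise, root):
    """Return the best total score and the best candidate index for each part.

    children : dict mapping each part to a list of its child parts (tree rooted at `root`)
    unary    : dict mapping each part to a (K,) array of candidate-location scores
    pairwise : dict mapping (parent, child) to a (K_parent, K_child) compatibility array
    """
    msgs, back = {}, {}

    def collect(p, parent=None):
        # Bottom-up pass: add the part's unary score and the messages from its children.
        score = unary[p].astype(float).copy()
        for c in children.get(p, []):
            collect(c, p)
            score += msgs[c]
        if parent is None:
            msgs[p] = score
        else:
            table = pairwise[(parent, p)] + score[None, :]      # (K_parent, K_child)
            back[(parent, p)] = table.argmax(axis=1)            # best child index per parent state
            msgs[p] = table.max(axis=1)

    def decode(p, idx, out):
        # Top-down pass: follow the stored argmax pointers from the root.
        out[p] = int(idx)
        for c in children.get(p, []):
            decode(c, back[(p, c)][idx], out)

    collect(root)
    best_root = int(np.argmax(msgs[root]))
    labels = {}
    decode(root, best_root, labels)
    return float(msgs[root][best_root]), labels

# Toy usage: a root part (which could itself be a combined part) with two children.
children = {"torso": ["left_arm", "right_arm"], "left_arm": [], "right_arm": []}
rng = np.random.default_rng(0)
unary = {p: rng.standard_normal(5) for p in children}
pairwise = {("torso", c): rng.standard_normal((5, 5)) for c in ("left_arm", "right_arm")}
print(tree_max_sum(children, unary, pairwise, root="torso"))
```

Because the model is a tree, this message passing is exact and runs in time linear in the number of parts, which is what makes the simple tree structure attractive.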
Sketch-based 3D Shape Retrieval using Convolutional Neural Networks
Retrieving 3D models from 2D human sketches has received considerable
attention in the areas of graphics, image retrieval, and computer vision.
Almost all state-of-the-art approaches compute a large number of "best views" for
the 3D models, in the hope that the query sketch matches one of these 2D
projections under predefined features.
We argue that this two-stage approach (view selection followed by matching) is
pragmatic but also problematic, because the "best views" are subjective and
ambiguous, which makes the matching inputs obscure. This imprecise nature of
matching further makes it challenging to choose features manually. Instead of
relying on the elusive concept of "best views" and on hand-crafted features,
we propose to define our views with a minimalist approach and to learn features
for both sketches and views. Specifically, we drastically reduce the number of
views to only two predefined directions for the whole dataset. Then, we learn
two Siamese Convolutional Neural Networks (CNNs), one for the views and one for
the sketches. The loss function is defined on the within-domain as well as the
cross-domain similarities. Our experiments on three benchmark datasets
demonstrate that our method is significantly better than state-of-the-art
approaches and outperforms them on all conventional metrics.
Comment: CVPR 201
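As a rough illustration of the two-branch setup described above, the following PyTorch sketch pairs a sketch CNN with a view CNN and combines contrastive terms over within-domain and cross-domain pairs. The tiny architecture, margin, loss weighting, and function names are assumptions for illustration, not the paper's exact networks or loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def small_cnn(out_dim=64):
    # Tiny illustrative embedding network; the real branches would be much larger.
    return nn.Sequential(
        nn.Conv2d(1, 16, 5, stride=2, padding=2), nn.ReLU(),
        nn.Conv2d(16, 32, 5, stride=2, padding=2), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(32, out_dim),
    )

sketch_net = small_cnn()   # branch for human sketches
view_net = small_cnn()     # branch for renderings from the two predefined views

def contrastive(a, b, same, margin=1.0):
    # Pull embeddings of matching shapes together, push non-matching ones apart.
    d = F.pairwise_distance(a, b)
    return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()

def total_loss(sketch_a, sketch_b, view_a, view_b, same_ab):
    # sketch_a / view_a depict shape a, sketch_b / view_b depict shape b;
    # same_ab[i] is 1.0 when shapes a and b at index i share a class, else 0.0.
    sa, sb = sketch_net(sketch_a), sketch_net(sketch_b)
    va, vb = view_net(view_a), view_net(view_b)
    ones = torch.ones_like(same_ab)
    # Within-domain terms: sketch-sketch and view-view similarities.
    within = contrastive(sa, sb, same_ab) + contrastive(va, vb, same_ab)
    # Cross-domain terms: a sketch should lie close to views of the same shape.
    cross = (contrastive(sa, va, ones) + contrastive(sb, vb, ones)
             + contrastive(sa, vb, same_ab) + contrastive(sb, va, same_ab))
    return within + cross

# Toy usage with random 64x64 grayscale inputs.
sk_a, sk_b = torch.randn(8, 1, 64, 64), torch.randn(8, 1, 64, 64)
vw_a, vw_b = torch.randn(8, 1, 64, 64), torch.randn(8, 1, 64, 64)
same = (torch.rand(8) < 0.5).float()
loss = total_loss(sk_a, sk_b, vw_a, vw_b, same)
loss.backward()
```

Keeping the two branches separate lets each domain learn its own low-level features, while the shared embedding space and the cross-domain terms make sketch-to-view retrieval a simple nearest-neighbour search at test time.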