WarpNet: Weakly Supervised Matching for Single-view Reconstruction
We present an approach to matching images of objects in fine-grained datasets
without using part annotations, with an application to the challenging problem
of weakly supervised single-view reconstruction. This is in contrast to prior
works that require part annotations, since matching objects across class and
pose variations is challenging with appearance features alone. We overcome this
challenge through a novel deep learning architecture, WarpNet, that aligns an
object in one image with a different object in another. We exploit the
structure of the fine-grained dataset to create artificial data for training
this network in an unsupervised-discriminative learning approach. The output of
the network acts as a spatial prior that allows generalization at test time to
match real images across variations in appearance, viewpoint and articulation.
On the CUB-200-2011 dataset of bird categories, we improve the AP over an
appearance-only network by 13.6%. We further demonstrate that our WarpNet
matches, together with the structure of fine-grained datasets, allow
single-view reconstructions with quality comparable to using annotated point
correspondences. Comment: to appear in IEEE Conference on Computer Vision and Pattern
Recognition (CVPR) 201
3D Object Class Detection in the Wild
Object class detection has been a synonym for 2D bounding box localization
for the longest time, fueled by the success of powerful statistical learning
techniques, combined with robust image representations. Only recently, there
has been a growing interest in revisiting the promise of computer vision from
the early days: to precisely delineate the contents of a visual scene, object
by object, in 3D. In this paper, we draw from recent advances in object
detection and 2D-3D object lifting in order to design an object class detector
that is particularly tailored towards 3D object class detection. Our 3D object
class detection method consists of several stages gradually enriching the
object detection output with object viewpoint, keypoints and 3D shape
estimates. Following careful design, each stage consistently improves the
performance, and the method achieves state-of-the-art performance in simultaneous 2D
bounding box and viewpoint estimation on the challenging Pascal3D+ dataset.
Joint Object and Part Segmentation using Deep Learned Potentials
Segmenting semantic objects from images and parsing them into their
respective semantic parts are fundamental steps towards detailed object
understanding in computer vision. In this paper, we propose a joint solution
that tackles semantic object and part segmentation simultaneously, in which
higher object-level context is provided to guide part segmentation, and more
detailed part-level localization is utilized to refine object segmentation.
Specifically, we first introduce the concept of semantic compositional parts
(SCP) in which similar semantic parts are grouped and shared among different
objects. A two-channel fully convolutional network (FCN) is then trained to
provide the SCP and object potentials at each pixel. At the same time, a
compact set of segments can also be obtained from the SCP predictions of the
network. Given the potentials and the generated segments, in order to explore
long-range context, we finally construct an efficient fully connected
conditional random field (FCRF) to jointly predict the final object and part
labels. Extensive evaluation on three different datasets shows that our
approach can mutually enhance the performance of object and part segmentation,
and outperforms the current state-of-the-art on both tasks.