4,862 research outputs found
Deep Exemplar 2D-3D Detection by Adapting from Real to Rendered Views
This paper presents an end-to-end convolutional neural network (CNN) for
2D-3D exemplar detection. We demonstrate that the ability to adapt the features
of natural images to better align with those of CAD rendered views is critical
to the success of our technique. We show that the adaptation can be learned by
compositing rendered views of textured object models on natural images. Our
approach can be naturally incorporated into a CNN detection pipeline and
extends the accuracy and speed benefits from recent advances in deep learning
to 2D-3D exemplar detection. We applied our method to two tasks: instance
detection, where we evaluated on the IKEA dataset, and object category
detection, where we out-perform Aubry et al. for "chair" detection on a subset
of the Pascal VOC dataset.Comment: To appear in CVPR 201
GhostVLAD for set-based face recognition
The objective of this paper is to learn a compact representation of image
sets for template-based face recognition. We make the following contributions:
first, we propose a network architecture which aggregates and embeds the face
descriptors produced by deep convolutional neural networks into a compact
fixed-length representation. This compact representation requires minimal
memory storage and enables efficient similarity computation. Second, we propose
a novel GhostVLAD layer that includes {\em ghost clusters}, that do not
contribute to the aggregation. We show that a quality weighting on the input
faces emerges automatically such that informative images contribute more than
those with low quality, and that the ghost clusters enhance the network's
ability to deal with poor quality images. Third, we explore how input feature
dimension, number of clusters and different training techniques affect the
recognition performance. Given this analysis, we train a network that far
exceeds the state-of-the-art on the IJB-B face recognition dataset. This is
currently one of the most challenging public benchmarks, and we surpass the
state-of-the-art on both the identification and verification protocols.Comment: Accepted by ACCV 201
Unsupervised Object Discovery and Tracking in Video Collections
This paper addresses the problem of automatically localizing dominant objects
as spatio-temporal tubes in a noisy collection of videos with minimal or even
no supervision. We formulate the problem as a combination of two complementary
processes: discovery and tracking. The first one establishes correspondences
between prominent regions across videos, and the second one associates
successive similar object regions within the same video. Interestingly, our
algorithm also discovers the implicit topology of frames associated with
instances of the same object class across different videos, a role normally
left to supervisory information in the form of class labels in conventional
image and video understanding methods. Indeed, as demonstrated by our
experiments, our method can handle video collections featuring multiple object
classes, and substantially outperforms the state of the art in colocalization,
even though it tackles a broader problem with much less supervision
- …