2,194 research outputs found
Generic 3D Representation via Pose Estimation and Matching
Though a large body of computer vision research has investigated developing
generic semantic representations, efforts towards developing a similar
representation for 3D has been limited. In this paper, we learn a generic 3D
representation through solving a set of foundational proxy 3D tasks:
object-centric camera pose estimation and wide baseline feature matching. Our
method is based upon the premise that by providing supervision over a set of
carefully selected foundational tasks, generalization to novel tasks and
abstraction capabilities can be achieved. We empirically show that the internal
representation of a multi-task ConvNet trained to solve the above core problems
generalizes to novel 3D tasks (e.g., scene layout estimation, object pose
estimation, surface normal estimation) without the need for fine-tuning and
shows traits of abstraction abilities (e.g., cross-modality pose estimation).
In the context of the core supervised tasks, we demonstrate our representation
achieves state-of-the-art wide baseline feature matching results without
requiring apriori rectification (unlike SIFT and the majority of learned
features). We also show 6DOF camera pose estimation given a pair local image
patches. The accuracy of both supervised tasks come comparable to humans.
Finally, we contribute a large-scale dataset composed of object-centric street
view scenes along with point correspondences and camera pose information, and
conclude with a discussion on the learned representation and open research
questions.Comment: Published in ECCV16. See the project website
http://3drepresentation.stanford.edu/ and dataset website
https://github.com/amir32002/3D_Street_Vie
Deep Discrete Hashing with Self-supervised Pairwise Labels
Hashing methods have been widely used for applications of large-scale image
retrieval and classification. Non-deep hashing methods using handcrafted
features have been significantly outperformed by deep hashing methods due to
their better feature representation and end-to-end learning framework. However,
the most striking successes in deep hashing have mostly involved discriminative
models, which require labels. In this paper, we propose a novel unsupervised
deep hashing method, named Deep Discrete Hashing (DDH), for large-scale image
retrieval and classification. In the proposed framework, we address two main
problems: 1) how to directly learn discrete binary codes? 2) how to equip the
binary representation with the ability of accurate image retrieval and
classification in an unsupervised way? We resolve these problems by introducing
an intermediate variable and a loss function steering the learning process,
which is based on the neighborhood structure in the original space.
Experimental results on standard datasets (CIFAR-10, NUS-WIDE, and Oxford-17)
demonstrate that our DDH significantly outperforms existing hashing methods by
large margin in terms of~mAP for image retrieval and object recognition. Code
is available at \url{https://github.com/htconquer/ddh}
- …