WarpNet: Weakly Supervised Matching for Single-view Reconstruction
We present an approach to matching images of objects in fine-grained datasets
without using part annotations, with an application to the challenging problem
of weakly supervised single-view reconstruction. This is in contrast to prior
works that require part annotations, since matching objects across class and
pose variations is challenging with appearance features alone. We overcome this
challenge through a novel deep learning architecture, WarpNet, that aligns an
object in one image with a different object in another. We exploit the
structure of the fine-grained dataset to create artificial data for training
this network in an unsupervised-discriminative learning approach. The output of
the network acts as a spatial prior that allows generalization at test time to
match real images across variations in appearance, viewpoint and articulation.
On the CUB-200-2011 dataset of bird categories, we improve the AP over an
appearance-only network by 13.6%. We further demonstrate that our WarpNet
matches, together with the structure of fine-grained datasets, allow
single-view reconstructions with quality comparable to using annotated point
correspondences.
Comment: to appear in IEEE Conference on Computer Vision and Pattern
Recognition (CVPR) 201
Unsupervised Learning of Complex Articulated Kinematic Structures combining Motion and Skeleton Information
In this paper we present a novel framework for unsupervised kinematic structure learning of complex articulated objects from a single-view image sequence. In contrast to prior methods based on motion information, which estimate relatively simple articulations, our method can generate arbitrarily complex kinematic structures with skeletal topology through a successive iterative merge process. The merge process is guided by a skeleton distance function, which is derived from a novel method for generating object boundaries from sparse points. Our main contributions can be summarised as follows: (i) Unsupervised complex articulated kinematic structure learning by combining motion and skeleton information. (ii) Iterative fine-to-coarse merging strategy for adaptive motion segmentation and structure smoothing. (iii) Skeleton estimation from sparse feature points. (iv) A new highly articulated object dataset containing multi-stage complexity with ground truth. Our experiments show that the proposed method outperforms state-of-the-art methods both quantitatively and qualitatively.
Memory Based Online Learning of Deep Representations from Video Streams
We present a novel online unsupervised method for face identity learning from
video streams. The method exploits deep face descriptors together with a memory
based learning mechanism that takes advantage of the temporal coherence of
visual data. Specifically, we introduce a discriminative feature matching
solution based on Reverse Nearest Neighbour and a feature forgetting strategy
that detects redundant features and discards them appropriately as time
progresses. It is shown that the proposed learning procedure is asymptotically
stable and can be effectively used in relevant applications like multiple face
identification and tracking from unconstrained video streams. Experimental
results show that, compared with offline approaches that exploit future
information, the proposed method achieves comparable results in multiple face
tracking and better performance in face identification. Code will be publicly
available.
Comment: arXiv admin note: text overlap with arXiv:1708.0361
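The Reverse Nearest Neighbour idea named in the abstract above can be illustrated with a minimal sketch. This is a generic version of the technique, not the authors' implementation: the function names, the Euclidean distance, and the toy 2-D descriptors are all assumptions made for illustration. Each stored (memory) descriptor votes for the observation closest to it, so an observation's matches are the memory items that "choose" it, rather than the items it would choose itself.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def reverse_nearest_neighbours(observations, memory):
    """Map each observation index to the memory indices whose nearest
    observation it is. Hypothetical helper, not taken from the paper."""
    matches = {i: [] for i in range(len(observations))}
    if not observations:
        return matches
    for j, m in enumerate(memory):
        # Each memory descriptor picks its closest observation.
        nearest = min(
            range(len(observations)),
            key=lambda i: euclidean(observations[i], m),
        )
        matches[nearest].append(j)
    return matches


# Toy 2-D descriptors standing in for deep face descriptors.
observations = [(0.0, 0.0), (10.0, 10.0)]
memory = [(0.5, 0.0), (9.0, 10.0), (0.0, 1.0)]
result = reverse_nearest_neighbours(observations, memory)
print(result)
```

In this toy case, memory items 0 and 2 are assigned to the first observation and memory item 1 to the second. A practical system would likely add a distance or ratio threshold before accepting a match; that step is omitted here because the abstract does not specify it.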