Deep Heterogeneous Hashing for Face Video Retrieval
Retrieving videos of a particular person with face image as a query via
hashing technique has many important applications. While face images are
typically represented as vectors in Euclidean space, characterizing face videos
with some robust set modeling techniques (e.g. covariance matrices as exploited
in this study, which reside on Riemannian manifold), has recently shown
appealing advantages. This hence results in a thorny heterogeneous spaces
matching problem. Moreover, hashing with handcrafted features as done in many
existing works is clearly inadequate to achieve desirable performance for this
task. To address such problems, we present an end-to-end Deep Heterogeneous
Hashing (DHH) method that integrates three stages including image feature
learning, video modeling, and heterogeneous hashing in a single framework, to
learn unified binary codes for both face images and videos. To tackle the key
challenge of hashing on the manifold, a well-studied Riemannian kernel mapping
is employed to project data (i.e. covariance matrices) into Euclidean space and
thus enables embedding the two heterogeneous representations into a common
Hamming space, where both intra-space discriminability and inter-space
compatibility are considered. To perform network optimization, the gradient of
the kernel mapping is innovatively derived via structured matrix
backpropagation in a theoretically principled way. Experiments on three
challenging datasets show that our method achieves quite competitive
performance compared with existing hashing methods.Comment: 14 pages, 17 figures, 4 tables, accepted by IEEE Transactions on
Image Processing (TIP) 201
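The key idea above, projecting covariance matrices from the Riemannian manifold into Euclidean space via a kernel mapping so they can share a Hamming space with image vectors, can be sketched with the widely used log-Euclidean mapping (the matrix logarithm of an SPD matrix followed by vectorization). This is a minimal illustration, not the DHH implementation; the regularizer `eps` and the sqrt(2) off-diagonal scaling are standard conventions assumed here, not details taken from the abstract.

```python
import numpy as np
from scipy.linalg import logm

def log_euclidean_map(cov, eps=1e-6):
    """Map an SPD covariance matrix to a Euclidean vector via the
    log-Euclidean kernel mapping: matrix log, then vectorize."""
    d = cov.shape[0]
    cov = cov + eps * np.eye(d)      # regularize to ensure positive definiteness
    log_cov = logm(cov).real         # matrix log of an SPD matrix is symmetric
    # Off-diagonal entries occur twice in the symmetric matrix, so scale
    # them by sqrt(2) to preserve the Frobenius norm in the vectorization.
    iu = np.triu_indices(d, k=1)
    return np.concatenate([np.diag(log_cov), np.sqrt(2.0) * log_cov[iu]])

# Toy "video": 50 frame feature vectors of dimension 4, modeled by one covariance.
feats = np.random.RandomState(0).randn(50, 4)
cov = np.cov(feats, rowvar=False)
v = log_euclidean_map(cov)
print(v.shape)  # (10,) = d + d*(d-1)/2 entries for d = 4
```

After this mapping, the video representation lives in an ordinary vector space, so the same family of hash functions used for face image vectors can, in principle, be applied to it.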
Face video retrieval with image query via hashing across Euclidean space and Riemannian manifold
Retrieving videos of a specific person given his/her face image as a query is increasingly appealing for applications like smart movie fast-forwarding and suspect searching. It also forms an interesting but challenging computer vision task, as the visual data to match, i.e., a still image and a video clip, are usually represented quite differently. Typically, a face image is represented as a point (i.e., a vector) in Euclidean space, while a video clip is modeled as a point (e.g., a covariance matrix) on some particular Riemannian manifold, in light of the recent promising success of such modeling. This incurs a new hashing-based retrieval problem of matching two heterogeneous representations, respectively in Euclidean space and on a Riemannian manifold. This work makes the first attempt to embed the two heterogeneous spaces into a common discriminant Hamming space. Specifically, we propose Hashing across Euclidean space and Riemannian manifold (HER) by deriving a unified framework to first embed the two spaces into corresponding reproducing kernel Hilbert spaces, and then iteratively optimize the intra- and inter-space Hamming distances in a max-margin framework to learn the hash functions for the two spaces. Extensive experiments demonstrate the impressive superiority of our method over state-of-the-art competitive hash learning methods.
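The cross-space matching the HER abstract describes, learning separate hash functions for the Euclidean (image) and manifold-derived (video) representations so both land in one Hamming space, can be sketched with sign-thresholded linear projections. Everything here is illustrative: the projections `W_img` and `W_vid` are random stand-ins for the hash functions HER would learn by max-margin optimization, and the dimensions are arbitrary.

```python
import numpy as np

rng = np.random.RandomState(1)
d_img, d_vid, n_bits = 64, 10, 16   # arbitrary toy dimensions

# Stand-ins for learned per-space hash projections (random, not trained).
W_img = rng.randn(d_img, n_bits)
W_vid = rng.randn(d_vid, n_bits)

def hash_code(x, W):
    """Sign-threshold a linear projection to produce an n_bits binary code."""
    return (x @ W > 0).astype(np.uint8)

def hamming(a, b):
    """Hamming distance: number of differing bits."""
    return int(np.count_nonzero(a != b))

img_feat = rng.randn(d_img)          # a face-image feature vector
vid_feat = rng.randn(d_vid)          # e.g. a kernel-mapped covariance descriptor
c_img = hash_code(img_feat, W_img)
c_vid = hash_code(vid_feat, W_vid)
print(hamming(c_img, c_vid))         # distance in the shared Hamming space, 0..16
```

In the actual method, the intra-space term would pull codes of the same identity together within each space, while the inter-space term would align image and video codes of the same person, so that this Hamming distance becomes small for matching pairs.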