8 research outputs found

    Hetero-manifold Regularisation for Cross-modal Hashing

    Cross-modal search has recently attracted considerable attention, but it remains a very challenging task because multi-modal data are both heterogeneous and complex to integrate. To address both challenges, we propose a novel method termed hetero-manifold regularisation (HMR) to supervise the learning of hash functions for efficient cross-modal search. A hetero-manifold integrates multiple sub-manifolds defined by homogeneous data with the help of cross-modal supervision information. Taking advantage of the hetero-manifold, the similarity between each pair of heterogeneous data can be naturally measured by three-order random walks on this hetero-manifold. Furthermore, a novel cumulative distance inequality defined on the hetero-manifold is introduced to avoid the computational difficulty induced by the discreteness of hash codes. Using this inequality, cross-modal hashing is transformed into a problem of hetero-manifold regularised support vector learning. The performance of cross-modal search can therefore be significantly improved by seamlessly combining the integrated information of the hetero-manifold with the strong generalisation of the support vector machine. Comprehensive experiments show that the proposed HMR achieves advantageous results over state-of-the-art methods in several challenging cross-modal tasks.
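
The random-walk similarity described in this abstract can be illustrated with a small sketch. It is not the authors' formulation: it assumes that a "three-order" walk means a three-step walk on a joint graph, and the toy affinity matrix `W` and the helper names are hypothetical. Two images and two texts form one graph with intra-modal edges and a single cross-modal supervision edge. The three-step transition matrix then gives a non-zero similarity to an image-text pair that has no direct link.

```python
def row_normalise(w):
    """Turn a non-negative affinity matrix into a row-stochastic transition matrix."""
    out = []
    for row in w:
        total = sum(row)
        out.append([v / total if total > 0 else 0.0 for v in row])
    return out


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def three_step_walk(w):
    """Probability of reaching node j from node i in exactly three steps."""
    p = row_normalise(w)
    return matmul(matmul(p, p), p)


# Hypothetical toy graph: nodes 0-1 are images, nodes 2-3 are texts.
# Intra-modal edges: 0-1 and 2-3. Cross-modal supervision edge: 0-2.
W = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]

P3 = three_step_walk(W)

# Image 1 and text 3 share no direct edge, yet the walk 1 -> 0 -> 2 -> 3 links them.
cross_similarity = P3[1][3]
```

In this toy graph, image 1 reaches text 3 only through the supervised pair (0, 2), which is the sense in which cross-modal supervision ties the sub-manifolds together.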

    Guest editors' introduction to the special section on learning with shared information for computer vision and multimedia analysis

    The twelve papers in this special section focus on learning systems with shared information for computer vision and multimedia communication analysis. A realistic setting for real-world computer vision or multimedia recognition problems is that a few classes have plenty of training data while many classes have only a small amount. How to use frequent classes to help learn rare classes, for which training data is harder to collect, is therefore an open question. Learning with shared information is an emerging topic in machine learning, computer vision and multimedia analysis. Components can be shared at different levels during the concept modeling and machine learning stages, for example generic object parts, attributes, transformations, regularization parameters and training examples. As for specific methods, multi-task learning, transfer learning and deep learning can be seen as different strategies for sharing information. These methods are very effective in solving real-world large-scale problems.

    Cycle-Consistent Deep Generative Hashing for Cross-Modal Retrieval

    In this paper, we propose a novel deep generative approach to cross-modal retrieval that learns hash functions in the absence of paired training samples through a cycle consistency loss. Our proposed approach employs an adversarial training scheme to learn a pair of hash functions that enable translation between modalities while assuming an underlying semantic relationship. To endow the hash codes with the semantics of each input-output pair, a cycle consistency loss is further imposed on top of the adversarial training to strengthen the correlations between inputs and their corresponding outputs. Our approach is generative: it learns hash functions such that the learned hash codes maximally correlate each input-output correspondence and can also regenerate the inputs, so as to minimize information loss. The learning-to-hash embedding is thus performed by jointly optimizing the parameters of the hash functions across modalities together with the associated generative models. Extensive experiments on a variety of large-scale cross-modal data sets demonstrate that our proposed method achieves better retrieval results than the state of the art. Comment: To appear in IEEE Trans. Image Processing. arXiv admin note: text overlap with arXiv:1703.10593 by other author
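
The cycle consistency idea mentioned in this abstract can be sketched in a few lines. This is a generic illustration under stated assumptions, not the paper's model: the mappings `f`, `g_good` and `g_bad` are hypothetical hand-written functions standing in for learned networks, the L1 reconstruction error is an assumed choice of distance, and the adversarial and hashing terms are omitted. The loss measures how well each sample is recovered after being translated to the other modality and back, which is why no paired samples are needed.

```python
def l1(a, b):
    """L1 distance between two equal-length vectors."""
    return sum(abs(p - q) for p, q in zip(a, b))


def cycle_consistency_loss(xs, ys, f, g):
    """Mean L1 reconstruction error over both cycles: x -> f -> g and y -> g -> f."""
    forward = sum(l1(g(f(x)), x) for x in xs) / len(xs)
    backward = sum(l1(f(g(y)), y) for y in ys) / len(ys)
    return forward + backward


# Hypothetical stand-ins for the two learned translation functions.
def f(x):
    return [2.0 * v for v in x]      # modality X -> modality Y


def g_good(y):
    return [0.5 * v for v in y]      # exact inverse of f


def g_bad(y):
    return [0.25 * v for v in y]     # not an inverse of f


# Unpaired samples: xs and ys need not correspond to each other.
xs = [[1.0, 2.0], [3.0, -1.0]]
ys = [[4.0, 0.0]]

loss_good = cycle_consistency_loss(xs, ys, f, g_good)
loss_bad = cycle_consistency_loss(xs, ys, f, g_bad)
```

The loss vanishes when the two mappings invert each other and grows as round-trip reconstructions drift from the inputs.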

    Visual Data Association: Tracking, Re-identification and Retrieval

    With the rapid development of the information society, large amounts of multimedia data are generated, shared and transferred across electronic devices and the Internet every minute. Building intelligent systems capable of associating these visual data at diverse locations and different times is therefore essential, and it will significantly facilitate understanding and identifying where an object came from and where it is going. The estimated traces of motions or changes increasingly make it feasible to apply advanced algorithms to real-world applications, including human-computer interaction, robotic navigation, security in surveillance, biological characteristics association and civil structure vibration detection. However, due to inherent challenges such as ambiguity, heterogeneity, noisy data, large scale and unknown variations, visual data association is still far from well established. This thesis therefore focuses on associating visual data at diverse locations and different times for the tasks of tracking, re-identification and retrieval. More specifically, three situations have been investigated (a single camera, multiple cameras and multiple modalities), and four algorithms have been developed at different levels.
    Chapter 3: The first algorithm explores an ensemble system for robust object tracking, primarily considering the independence of classifier members. An empirical analysis is first given to show that object tracking is a non-i.i.d. sampling, under-sampled and incomplete-dataset problem. Then, a set of independent classifiers trained sequentially on different small datasets is dynamically maintained to overcome this particular machine learning problem. Thus, for every challenge, an optimal classifier can be approximated in a subspace spanned by the selected competitive classifiers.
    Chapter 4: The second method improves object tracking by exploiting a winner-take-all strategy to select the most suitable trackers. It naturally extends the concept of the ensemble in the first topic to a more general idea: a multi-expert system, in which members come from different function spaces. The diversity of the system is thus more likely to be amplified. Based on a large public dataset, a model that predicts the performance of different trackers on various challenges can be obtained off-line. The learned structural regression model can then be used directly to select the winning tracker efficiently online.
    Chapter 5: The third algorithm learns cross-view identities for fast person re-identification in a cross-camera setting, which differs significantly from the single-view object tracking in the first two topics. Two sets of discriminative hash functions for two different views are learned by simultaneously minimising their distance in the Hamming space and maximising the cross-covariance and margin. Thus, by embedding the images into the Hamming space, similar binary codes can be found for images of the same person captured from different views.
    Chapter 6: The fourth model develops a novel hetero-manifold regularisation framework for efficient cross-modal retrieval. Compared with the first two settings, this is a more general and complex topic, in which the samples can be relaxed to images captured at a very great distance or over a very long time, and even to text, voice and other formats. Taking advantage of the hetero-manifold, the similarity between each pair of heterogeneous data can be naturally measured by three-order random walks on this hetero-manifold. It is concluded that, by fully exploiting the algorithms for solving the problems in the three situations, an integrated trace of an object moving anywhere can be discovered.
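
The Hamming-space matching described for Chapter 5 can be illustrated with a minimal sketch. It does not reproduce the thesis's learning procedure: the projection matrices `W_view_a` and `W_view_b` are hypothetical, hand-picked values standing in for learned per-view hash functions, and the sign-of-projection rule is an assumed, commonly used form of hash function. Each view has its own projections, and a query from one view is ranked against gallery items from the other view by Hamming distance.

```python
def hash_code(x, projections):
    """One bit per projection row: 1 if the dot product is non-negative, else 0."""
    return tuple(
        1 if sum(w * v for w, v in zip(row, x)) >= 0 else 0
        for row in projections
    )


def hamming(a, b):
    """Number of positions at which two binary codes differ."""
    return sum(p != q for p, q in zip(a, b))


def rank_gallery(query_code, gallery_codes):
    """Gallery indices sorted by increasing Hamming distance to the query."""
    return sorted(
        range(len(gallery_codes)),
        key=lambda i: hamming(query_code, gallery_codes[i]),
    )


# Hypothetical per-view projections (hand-picked here, not learned).
W_view_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_view_b = [[1.0, 0.1], [-0.1, 1.0], [1.0, 1.0]]

# A query feature from view A and three gallery features from view B.
query = hash_code([0.9, -0.4], W_view_a)
gallery = [
    hash_code(x, W_view_b)
    for x in ([1.0, -0.5], [-1.0, 0.6], [-0.8, -0.9])
]

ranking = rank_gallery(query, gallery)
```

Because comparison reduces to counting differing bits, ranking a large gallery is cheap, which is the motivation for hashing in fast re-identification and retrieval.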