595 research outputs found

    Deep Binary Representation Learning for Single/Cross-Modal Data Retrieval

    Get PDF
    Data similarity search is widely regarded as a classic topic in the realms of computer vision, machine learning and data mining. Providing a certain query, the retrieval model sorts out the related candidates in the database according to their similarities, where representation learning methods and nearest-neighbour search apply. As matching data features in Hamming space is computationally cheaper than in Euclidean space, learning to hash and binary representations are generally appreciated in modern retrieval models. Recent research seeks solutions in deep learning to formulate the hash functions, showing great potential in retrieval performance. In this thesis, we gradually extend our research topics and contributions from unsupervised single-modal deep hashing to supervised cross-modal hashing _nally zero-shot hashing problems, addressing the following challenges in deep hashing. First of all, existing unsupervised deep hashing works are still not attaining leading retrieval performance compared with the shallow ones. To improve this, a novel unsupervised single-modal hashing model is proposed in this thesis, named Deep Variational Binaries (DVB). We introduce the popular conditional variational auto-encoders to formulate the encoding function. By minimizing the reconstruction error of the latent variables, the proposed model produces compact binary codes without training supervision. Experiments on benchmarked datasets show that our model outperform existing unsupervised hashing methods. The second problem is that current cross-modal hashing methods only consider holistic image representations and fail to model descriptive sentences, which is inappropriate to handle the rich semantics of informative cross-modal data for quality textual-visual search tasks. To handle this problem, we propose a supervised deep cross-modal hashing model called Textual-Visual Deep Binaries (TVDB). Region-based neural networks and recurrent neural networks are involved in the image encoding network in order to make e_ective use of visual information, while the text encoder is built using a convolutional neural network. We additionally introduce an e_cient in-batch optimization routine to train the network parameters. The proposed mode successfully outperforms state-of-the-art methods on large-scale datasets. Finally, existing hashing models fail when the categories of query data have never been seen during training. This scenario is further extended into a novel zero-shot cross-modal hashing task in this thesis, and a Zero-shot Sketch-Image Hashing (ZSIH) scheme is then proposed with graph convolution and stochastic neurons. Experiments show that the proposed ZSIH model signi_cantly outperforms existing hashing algorithms in the zero-shot retrieval task. Experiments suggest our proposed and novel hashing methods outperform state-of-the-art researches in single-modal and cross-modal data retrieval

    Cycle-Consistent Deep Generative Hashing for Cross-Modal Retrieval

    Full text link
    In this paper, we propose a novel deep generative approach to cross-modal retrieval to learn hash functions in the absence of paired training samples through the cycle consistency loss. Our proposed approach employs adversarial training scheme to lean a couple of hash functions enabling translation between modalities while assuming the underlying semantic relationship. To induce the hash codes with semantics to the input-output pair, cycle consistency loss is further proposed upon the adversarial training to strengthen the correlations between inputs and corresponding outputs. Our approach is generative to learn hash functions such that the learned hash codes can maximally correlate each input-output correspondence, meanwhile can also regenerate the inputs so as to minimize the information loss. The learning to hash embedding is thus performed to jointly optimize the parameters of the hash functions across modalities as well as the associated generative models. Extensive experiments on a variety of large-scale cross-modal data sets demonstrate that our proposed method achieves better retrieval results than the state-of-the-arts.Comment: To appeared on IEEE Trans. Image Processing. arXiv admin note: text overlap with arXiv:1703.10593 by other author

    Crossing Generative Adversarial Networks for Cross-View Person Re-identification

    Full text link
    Person re-identification (\textit{re-id}) refers to matching pedestrians across disjoint yet non-overlapping camera views. The most effective way to match these pedestrians undertaking significant visual variations is to seek reliably invariant features that can describe the person of interest faithfully. Most of existing methods are presented in a supervised manner to produce discriminative features by relying on labeled paired images in correspondence. However, annotating pair-wise images is prohibitively expensive in labors, and thus not practical in large-scale networked cameras. Moreover, seeking comparable representations across camera views demands a flexible model to address the complex distributions of images. In this work, we study the co-occurrence statistic patterns between pairs of images, and propose to crossing Generative Adversarial Network (Cross-GAN) for learning a joint distribution for cross-image representations in a unsupervised manner. Given a pair of person images, the proposed model consists of the variational auto-encoder to encode the pair into respective latent variables, a proposed cross-view alignment to reduce the view disparity, and an adversarial layer to seek the joint distribution of latent representations. The learned latent representations are well-aligned to reflect the co-occurrence patterns of paired images. We empirically evaluate the proposed model against challenging datasets, and our results show the importance of joint invariant features in improving matching rates of person re-id with comparison to semi/unsupervised state-of-the-arts.Comment: 12 pages. arXiv admin note: text overlap with arXiv:1702.03431 by other author

    Zero-Shot Sketch-Image Hashing

    Get PDF
    Recent studies show that large-scale sketch-based image retrieval (SBIR) can be efficiently tackled by cross-modal binary representation learning methods, where Hamming distance matching significantly speeds up the process of similarity search. Providing training and test data subjected to a fixed set of pre-defined categories, the cutting-edge SBIR and cross-modal hashing works obtain acceptable retrieval performance. However, most of the existing methods fail when the categories of query sketches have never been seen during training. In this paper, the above problem is briefed as a novel but realistic zero-shot SBIR hashing task. We elaborate the challenges of this special task and accordingly propose a zero-shot sketch-image hashing (ZSIH) model. An end-to-end three-network architecture is built, two of which are treated as the binary encoders. The third network mitigates the sketch-image heterogeneity and enhances the semantic relations among data by utilizing the Kronecker fusion layer and graph convolution, respectively. As an important part of ZSIH, we formulate a generative hashing scheme in reconstructing semantic knowledge representations for zero-shot retrieval. To the best of our knowledge, ZSIH is the first zero-shot hashing work suitable for SBIR and cross-modal search. Comprehensive experiments are conducted on two extended datasets, i.e., Sketchy and TU-Berlin with a novel zero-shot train-test split. The proposed model remarkably outperforms related works.Comment: Accepted as spotlight at CVPR 201
    corecore