1,009 research outputs found

    Zero-Shot Sketch-Image Hashing

    Recent studies show that large-scale sketch-based image retrieval (SBIR) can be efficiently tackled by cross-modal binary representation learning methods, where Hamming distance matching significantly speeds up similarity search. Provided that training and test data are drawn from a fixed set of pre-defined categories, cutting-edge SBIR and cross-modal hashing methods obtain acceptable retrieval performance. However, most existing methods fail when the categories of query sketches have never been seen during training. In this paper, the above problem is formulated as a novel but realistic zero-shot SBIR hashing task. We elaborate on the challenges of this special task and accordingly propose a zero-shot sketch-image hashing (ZSIH) model. An end-to-end three-network architecture is built, two networks of which serve as binary encoders. The third network mitigates the sketch-image heterogeneity and enhances the semantic relations among data by utilizing a Kronecker fusion layer and graph convolution, respectively. As an important part of ZSIH, we formulate a generative hashing scheme that reconstructs semantic knowledge representations for zero-shot retrieval. To the best of our knowledge, ZSIH is the first zero-shot hashing work suitable for SBIR and cross-modal search. Comprehensive experiments are conducted on two extended datasets, i.e., Sketchy and TU-Berlin, with a novel zero-shot train-test split. The proposed model remarkably outperforms related works. Comment: Accepted as a spotlight paper at CVPR 2018
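    As a rough illustration of why binary codes help here (not the ZSIH architecture itself, whose encoders, Kronecker fusion, and graph convolution are beyond a few lines): once sketches and photos are hashed into fixed-length binary codes, retrieval reduces to Hamming-distance ranking, which is cheap bit-level arithmetic. A minimal sketch, assuming 64-bit codes stored as 0/1 NumPy arrays:

```python
# Minimal illustration (not the ZSIH model): with binary codes, cross-modal
# retrieval reduces to ranking gallery items by Hamming distance to the query.
import numpy as np

def hamming_rank(query_code: np.ndarray, gallery_codes: np.ndarray) -> np.ndarray:
    """Rank gallery items by Hamming distance to a query code (codes are 0/1 arrays)."""
    distances = np.count_nonzero(gallery_codes != query_code, axis=1)
    return np.argsort(distances)

rng = np.random.default_rng(0)
gallery = rng.integers(0, 2, size=(10_000, 64))   # 10k photo codes, 64 bits each
query = rng.integers(0, 2, size=64)                # one sketch query code
print(hamming_rank(query, gallery)[:5])            # indices of the 5 nearest codes
```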

    End-to-End Cross-Modality Retrieval with CCA Projections and Pairwise Ranking Loss

    Cross-modality retrieval encompasses retrieval tasks where the fetched items are of a different type than the search query, e.g., retrieving pictures relevant to a given text query. The state-of-the-art approach to cross-modality retrieval relies on learning a joint embedding space of the two modalities, where items from either modality are retrieved using nearest-neighbor search. In this work, we introduce a neural network layer based on Canonical Correlation Analysis (CCA) that learns better embedding spaces by analytically computing projections that maximize correlation. In contrast to previous approaches, the CCA Layer (CCAL) allows us to combine existing objectives for embedding space learning, such as pairwise ranking losses, with the optimal projections of CCA. We show the effectiveness of our approach for cross-modality retrieval on three different scenarios (text-to-image, audio-sheet-music and zero-shot retrieval), surpassing both Deep CCA and a multi-view network using freely learned projections optimized by a pairwise ranking loss, especially when little training data is available (the code for all three methods is released at: https://github.com/CPJKU/cca_layer). Comment: Preliminary version of a paper published in the International Journal of Multimedia Information Retrieval
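    The pairwise ranking objective that the CCA layer is combined with can be stated in a few lines. The following is a generic max-margin ranking loss over a batch of matching embedding pairs, written in PyTorch as an assumption-level illustration; it is not the released CCAL implementation (which lives in the linked repository):

```python
# Generic max-margin pairwise ranking loss over a batch of paired embeddings.
# Each true pair should score higher (by a margin) than every mismatched pair,
# in both retrieval directions.
import torch

def pairwise_ranking_loss(x: torch.Tensor, y: torch.Tensor, margin: float = 0.2) -> torch.Tensor:
    """x, y: L2-normalised embeddings of matching pairs, shape (batch, dim)."""
    scores = x @ y.t()                       # cosine similarities, (batch, batch)
    pos = scores.diag().unsqueeze(1)         # similarity of each true pair
    # hinge on every mismatched pair: x-to-y direction plus y-to-x direction
    cost = (margin - pos + scores).clamp(min=0) + (margin - pos.t() + scores).clamp(min=0)
    cost.fill_diagonal_(0.0)                 # true pairs incur no cost
    return cost.mean()
```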

    Deep Learning for Free-Hand Sketch: A Survey

    Free-hand sketches are highly illustrative, and have been widely used by humans to depict objects or stories from ancient times to the present. The recent prevalence of touchscreen devices has made sketch creation a much easier task than ever and consequently made sketch-oriented applications increasingly popular. The progress of deep learning has immensely benefited free-hand sketch research and applications. This paper presents a comprehensive survey of the deep learning techniques oriented at free-hand sketch data, and the applications that they enable. The main contents of this survey include: (i) A discussion of the intrinsic traits and unique challenges of free-hand sketch, to highlight the essential differences between sketch data and other data modalities, e.g., natural photos. (ii) A review of the developments of free-hand sketch research in the deep learning era, by surveying existing datasets, research topics, and the state-of-the-art methods through a detailed taxonomy and experimental evaluation. (iii) Promotion of future work via a discussion of bottlenecks, open problems, and potential research directions for the community. Comment: This paper is accepted by IEEE TPAMI

    Delving Deep into the Sketch and Photo Relation

    "Sketches drawn by humans can play a similar role to photos in terms of conveying shape, posture as well as fine-grained information, and this fact has stimulated one line of cross-domain research that is related to sketch and photo, including sketch-based photo synthesis and retrieval. In this thesis, we aim to further investigate the relationship between sketch and photo. More specifically, we study certain under- explored traits in this relationship, and propose novel applications to reinforce the understanding of sketch and photo relation.Our exploration starts with the problem of sketch-based photo synthesis, where the unique trait of non-rigid alignment between sketch and photo is overlooked in existing research. We then carry on with our investigation from a new angle to study whether sketch can facilitate photo classifier generation. Building upon this, we continue to explore how sketch and photo are linked together on a more fine-grained level by tackling with the sketch-based photo segmenter prediction. Furthermore, we address the data scarcity issue identified in nearly all sketch-photo-related applications by examining their inherent correlation in the semantic aspect using sketch-based image retrieval (SBIR) as a test-bed. In general, we make four main contributions to the research on relationship between sketch and photo.Firstly, to mitigate the effect of deformation in sketch-based photo synthesis, we introduce the spatial transformer network to our image-image regression framework, which subtly deals with non-rigid alignment between the sketches and photos. The qualitative and quantitative experiments consistently reveal the superior quality of our synthesised photos over those generated by existing approaches.Secondly, sketch-based photo classifier generation is achieved with a novel model regression network, which maps the sketch to the parameters of photo classification model. It is shown that our model regression network is able to generalise across categories and photo classifiers for novel classes not involved in training are just a sketch away. Comprehensive experiments illustrate the promising performance of the generated binary and multi-class photo classifiers, and demonstrate that sketches can also be employed to enhance the granularity of existing photo classifiers.Thirdly, to achieve the goal of sketch-based photo segmentation, we propose a photo segmentation model generation algorithm that predicts the weights of a deep photo segmentation network according to the input sketch. The results confirm that one single sketch is the only prerequisite for unseen category photo segmentation, and the segmentation performance can be further improved by utilising sketch that is aligned with the object to be segmented in shape and position.Finally, we present an unsupervised representation learning framework for SBIR, the purpose of which is to eliminate the barrier imposed by data annotation scarcity. Prototype and memory bank reinforced joint distribution optimal transport is integrated into the unsupervised representation learning framework, so that the mapping between the sketches and photos could be automatically detected to learn a semantically meaningful yet domain-agnostic feature space. Extensive experiments and feature visualisation validate the efficacy of our proposed algorithm.

    Cycle-Consistent Deep Generative Hashing for Cross-Modal Retrieval

    In this paper, we propose a novel deep generative approach to cross-modal retrieval that learns hash functions in the absence of paired training samples through a cycle consistency loss. Our approach employs an adversarial training scheme to learn a pair of hash functions enabling translation between modalities while assuming the underlying semantic relationship. To endow the hash codes of each input-output pair with semantics, a cycle consistency loss is further imposed on top of the adversarial training to strengthen the correlations between inputs and corresponding outputs. Our approach learns hash functions generatively, such that the learned hash codes maximally correlate each input-output correspondence while also regenerating the inputs so as to minimize the information loss. Hash learning is thus performed by jointly optimizing the parameters of the hash functions across modalities as well as the associated generative models. Extensive experiments on a variety of large-scale cross-modal data sets demonstrate that our proposed method achieves better retrieval results than the state of the art. Comment: To appear in IEEE Trans. Image Processing. arXiv admin note: text overlap with arXiv:1703.10593 by other authors
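    The cycle-consistency term at the heart of this line of work is easy to state in code. Below is a minimal PyTorch sketch of a generic cycle-consistency loss between two modality translators; the adversarial and hashing components of the full model are omitted, and the L1 reconstruction penalty is an assumption:

```python
# Minimal cycle-consistency term between two modality translators:
# translating x to the other modality and back should reconstruct x, and
# likewise for y. Not the paper's full adversarial hashing model.
import torch
import torch.nn.functional as F

def cycle_consistency_loss(x, y, x_to_y, y_to_x):
    """x, y: features from the two modalities; x_to_y, y_to_x: translation networks."""
    x_rec = y_to_x(x_to_y(x))   # x -> y-space -> back to x-space
    y_rec = x_to_y(y_to_x(y))   # y -> x-space -> back to y-space
    return F.l1_loss(x_rec, x) + F.l1_loss(y_rec, y)
```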

    ACNet: Approaching-and-Centralizing Network for Zero-Shot Sketch-Based Image Retrieval

    The huge domain gap between sketches and photos and the highly abstract nature of sketch representations pose challenges for sketch-based image retrieval (SBIR). Zero-shot sketch-based image retrieval (ZS-SBIR) is more generic and practical but poses an even greater challenge because of the additional knowledge gap between seen and unseen categories. To simultaneously mitigate both gaps, we propose an Approaching-and-Centralizing Network (termed "ACNet") to jointly optimize sketch-to-photo synthesis and image retrieval. The retrieval module guides the synthesis module to generate large amounts of diverse photo-like images which gradually approach the photo domain, and thus better serve the retrieval module in learning domain-agnostic representations and category-agnostic common knowledge for generalizing to unseen categories. The diverse images generated with retrieval guidance effectively alleviate the overfitting caused by concrete, category-specific training samples with high gradients. We also find that a proxy-based NormSoftmax loss is effective in the zero-shot setting, because its centralizing effect stabilizes our joint training and promotes generalization to unseen categories. Our approach is simple yet effective, achieving state-of-the-art performance on two widely used ZS-SBIR datasets and surpassing previous methods by a large margin. Comment: the paper is under consideration at IEEE Transactions on Circuits and Systems for Video Technology
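    A proxy-based NormSoftmax loss of the kind the paper reports to be effective can be sketched as follows: embeddings and learnable per-class proxies are L2-normalised and compared by scaled cosine similarity, then trained with cross-entropy. This is a hedged PyTorch sketch; the temperature value and initialisation are assumptions, not the paper's exact settings:

```python
# Proxy-based NormSoftmax sketch: each class owns a learnable proxy vector;
# samples are pulled toward their class proxy via cross-entropy over scaled
# cosine similarities, which gives the "centralizing" effect described above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormSoftmaxLoss(nn.Module):
    def __init__(self, num_classes: int, dim: int, temperature: float = 0.05):
        super().__init__()
        self.proxies = nn.Parameter(torch.randn(num_classes, dim) * 0.01)
        self.temperature = temperature

    def forward(self, embeddings: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        z = F.normalize(embeddings, dim=1)
        p = F.normalize(self.proxies, dim=1)
        logits = z @ p.t() / self.temperature      # scaled cosine similarity to each proxy
        return F.cross_entropy(logits, labels)     # standard softmax cross-entropy
```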

    Semantically Tied Paired Cycle Consistency for Zero-Shot Sketch-based Image Retrieval

    This is the author accepted manuscript; the final version is available from IEEE via the DOI in this record. Zero-shot sketch-based image retrieval (SBIR) is an emerging task in computer vision which aims to retrieve natural images relevant to sketch queries from categories that might not have been seen during training. Existing works either require aligned sketch-image pairs or an inefficient memory fusion layer for mapping the visual information to a semantic space. In this work, we propose a semantically aligned paired cycle-consistent generative (SEM-PCYC) model for zero-shot SBIR, where each branch maps the visual information to a common semantic space via adversarial training. Each of these branches maintains a cycle consistency that only requires supervision at the category level, avoiding the need for costly aligned sketch-image pairs. A classification criterion on the generators' outputs ensures that the visual-to-semantic mapping is discriminative. Furthermore, we propose to combine textual and hierarchical side information via a feature selection auto-encoder that selects discriminating side information within the same end-to-end model. Our results demonstrate a significant boost in zero-shot SBIR performance over the state of the art on the challenging Sketchy and TU-Berlin datasets. European Union Horizon 2020