Search CORE

1,589 research outputs found

Pairwise Constraint Propagation on Multi-View Data

Author: Lu Zhiwu
Wang Liwei
Publication venue
Publication date: 18/01/2015
Field of study

This paper presents a graph-based learning approach to pairwise constraint propagation on multi-view data. Although pairwise constraint propagation has been studied extensively, pairwise constraints are usually defined over pairs of data points from a single view, i.e., only intra-view constraint propagation is considered for multi-view tasks. In fact, very little attention has been paid to inter-view constraint propagation, which is more challenging since pairwise constraints are now defined over pairs of data points from different views. In this paper, we propose to decompose the challenging inter-view constraint propagation problem into semi-supervised learning subproblems so that they can be efficiently solved based on graph-based label propagation. To the best of our knowledge, this is the first attempt to give an efficient solution to inter-view constraint propagation from a semi-supervised learning viewpoint. Moreover, since graph-based label propagation has been adopted for basic optimization, we develop two constrained graph construction methods for interview constraint propagation, which only differ in how the intra-view pairwise constraints are exploited. The experimental results in cross-view retrieval have shown the promising performance of our inter-view constraint propagation

arXiv.org e-Print Archive

Triplet-Based Deep Hashing Network for Cross-Modal Retrieval

Author: Chen Zhaojia
Deng Cheng
Gao Xinbo
Liu Xianglong
Tao Dacheng
Publication venue
Publication date: 04/04/2019
Field of study

Given the benefits of its low storage requirements and high retrieval efficiency, hashing has recently received increasing attention. In particular,cross-modal hashing has been widely and successfully used in multimedia similarity search applications. However, almost all existing methods employing cross-modal hashing cannot obtain powerful hash codes due to their ignoring the relative similarity between heterogeneous data that contains richer semantic information, leading to unsatisfactory retrieval performance. In this paper, we propose a triplet-based deep hashing (TDH) network for cross-modal retrieval. First, we utilize the triplet labels, which describes the relative relationships among three instances as supervision in order to capture more general semantic correlations between cross-modal instances. We then establish a loss function from the inter-modal view and the intra-modal view to boost the discriminative abilities of the hash codes. Finally, graph regularization is introduced into our proposed TDH method to preserve the original semantic similarity between hash codes in Hamming space. Experimental results show that our proposed method outperforms several state-of-the-art approaches on two popular cross-modal datasets

arXiv.org e-Print Archive

An Overview of Cross-media Retrieval: Concepts, Methodologies, Benchmarks and Challenges

Author: Huang Xin
Peng Yuxin
Zhao Yunzhen
Publication venue
Publication date: 01/07/2017
Field of study

Multimedia retrieval plays an indispensable role in big data utilization. Past efforts mainly focused on single-media retrieval. However, the requirements of users are highly flexible, such as retrieving the relevant audio clips with one query of image. So challenges stemming from the "media gap", which means that representations of different media types are inconsistent, have attracted increasing attention. Cross-media retrieval is designed for the scenarios where the queries and retrieval results are of different media types. As a relatively new research topic, its concepts, methodologies and benchmarks are still not clear in the literatures. To address these issues, we review more than 100 references, give an overview including the concepts, methodologies, major challenges and open issues, as well as build up the benchmarks including datasets and experimental results. Researchers can directly adopt the benchmarks to promptly evaluate their proposed methods. This will help them to focus on algorithm design, rather than the time-consuming compared methods and results. It is noted that we have constructed a new dataset XMedia, which is the first publicly available dataset with up to five media types (text, image, video, audio and 3D model). We believe this overview will attract more researchers to focus on cross-media retrieval and be helpful to them.Comment: 14 pages, accepted by IEEE Transactions on Circuits and Systems for Video Technolog

arXiv.org e-Print Archive

Cross-modal Subspace Learning for Fine-grained Sketch-based Image Retrieval

Author: Guo Jun
Huang Yongye
Kleijn W. Bastiaan
Ma Zhanyu
Song Yi-Zhe
Wang Liang
Xiang Tao
Xu Peng
Yin Qiyue
Publication venue
Publication date: 27/05/2017
Field of study

Sketch-based image retrieval (SBIR) is challenging due to the inherent domain-gap between sketch and photo. Compared with pixel-perfect depictions of photos, sketches are iconic renderings of the real world with highly abstract. Therefore, matching sketch and photo directly using low-level visual clues are unsufficient, since a common low-level subspace that traverses semantically across the two modalities is non-trivial to establish. Most existing SBIR studies do not directly tackle this cross-modal problem. This naturally motivates us to explore the effectiveness of cross-modal retrieval methods in SBIR, which have been applied in the image-text matching successfully. In this paper, we introduce and compare a series of state-of-the-art cross-modal subspace learning methods and benchmark them on two recently released fine-grained SBIR datasets. Through thorough examination of the experimental results, we have demonstrated that the subspace learning can effectively model the sketch-photo domain-gap. In addition we draw a few key insights to drive future research.Comment: Accepted by Neurocomputin

arXiv.org e-Print Archive

CCL: Cross-modal Correlation Learning with Multi-grained Fusion by Hierarchical Network

Author: Huang Xin
Peng Yuxin
Qi Jinwei
Yuan Yuxin
Publication venue
Publication date: 08/08/2017
Field of study

Cross-modal retrieval has become a highlighted research topic for retrieval across multimedia data such as image and text. A two-stage learning framework is widely adopted by most existing methods based on Deep Neural Network (DNN): The first learning stage is to generate separate representation for each modality, and the second learning stage is to get the cross-modal common representation. However, the existing methods have three limitations: (1) In the first learning stage, they only model intra-modality correlation, but ignore inter-modality correlation with rich complementary context. (2) In the second learning stage, they only adopt shallow networks with single-loss regularization, but ignore the intrinsic relevance of intra-modality and inter-modality correlation. (3) Only original instances are considered while the complementary fine-grained clues provided by their patches are ignored. For addressing the above problems, this paper proposes a cross-modal correlation learning (CCL) approach with multi-grained fusion by hierarchical network, and the contributions are as follows: (1) In the first learning stage, CCL exploits multi-level association with joint optimization to preserve the complementary context from intra-modality and inter-modality correlation simultaneously. (2) In the second learning stage, a multi-task learning strategy is designed to adaptively balance the intra-modality semantic category constraints and inter-modality pairwise similarity constraints. (3) CCL adopts multi-grained modeling, which fuses the coarse-grained instances and fine-grained patches to make cross-modal correlation more precise. Comparing with 13 state-of-the-art methods on 6 widely-used cross-modal datasets, the experimental results show our CCL approach achieves the best performance.Comment: 16 pages, accepted by IEEE Transactions on Multimedi

arXiv.org e-Print Archive

Unsupervised Multi-modal Hashing for Cross-modal retrieval

Author: Wu Xiao-Jun
Yu Jun
Publication venue
Publication date: 27/09/2020
Field of study

With the advantage of low storage cost and high efficiency, hashing learning has received much attention in the domain of Big Data. In this paper, we propose a novel unsupervised hashing learning method to cope with this open problem to directly preserve the manifold structure by hashing. To address this problem, both the semantic correlation in textual space and the locally geometric structure in the visual space are explored simultaneously in our framework. Besides, the `2;1-norm constraint is imposed on the projection matrices to learn the discriminative hash function for each modality. Extensive experiments are performed to evaluate the proposed method on the three publicly available datasets and the experimental results show that our method can achieve superior performance over the state-of-the-art methods.Comment: 4 pages, 4 figure

arXiv.org e-Print Archive

CM-GANs: Cross-modal Generative Adversarial Networks for Common Representation Learning

Author: Peng Yuxin
Qi Jinwei
Yuan Yuxin
Publication venue
Publication date: 26/04/2018
Field of study

It is known that the inconsistent distribution and representation of different modalities, such as image and text, cause the heterogeneity gap that makes it challenging to correlate such heterogeneous data. Generative adversarial networks (GANs) have shown its strong ability of modeling data distribution and learning discriminative representation, existing GANs-based works mainly focus on generative problem to generate new data. We have different goal, aim to correlate heterogeneous data, by utilizing the power of GANs to model cross-modal joint distribution. Thus, we propose Cross-modal GANs to learn discriminative common representation for bridging heterogeneity gap. The main contributions are: (1) Cross-modal GANs architecture is proposed to model joint distribution over data of different modalities. The inter-modality and intra-modality correlation can be explored simultaneously in generative and discriminative models. Both of them beat each other to promote cross-modal correlation learning. (2) Cross-modal convolutional autoencoders with weight-sharing constraint are proposed to form generative model. They can not only exploit cross-modal correlation for learning common representation, but also preserve reconstruction information for capturing semantic consistency within each modality. (3) Cross-modal adversarial mechanism is proposed, which utilizes two kinds of discriminative models to simultaneously conduct intra-modality and inter-modality discrimination. They can mutually boost to make common representation more discriminative by adversarial training process. To the best of our knowledge, our proposed CM-GANs approach is the first to utilize GANs to perform cross-modal common representation learning. Experiments are conducted to verify the performance of our proposed approach on cross-modal retrieval paradigm, compared with 10 methods on 3 cross-modal datasets

arXiv.org e-Print Archive

A Comprehensive Survey on Cross-modal Retrieval

Author: Wang Kaiye
Wang Liang
Wang Wei
Wu Shu
Yin Qiyue
Publication venue
Publication date: 21/07/2016
Field of study

In recent years, cross-modal retrieval has drawn much attention due to the rapid growth of multimodal data. It takes one type of data as the query to retrieve relevant data of another type. For example, a user can use a text to retrieve relevant pictures or videos. Since the query and its retrieved results can be of different modalities, how to measure the content similarity between different modalities of data remains a challenge. Various methods have been proposed to deal with such a problem. In this paper, we first review a number of representative methods for cross-modal retrieval and classify them into two main groups: 1) real-valued representation learning, and 2) binary representation learning. Real-valued representation learning methods aim to learn real-valued common representations for different modalities of data. To speed up the cross-modal retrieval, a number of binary representation learning methods are proposed to map different modalities of data into a common Hamming space. Then, we introduce several multimodal datasets in the community, and show the experimental results on two commonly used multimodal datasets. The comparison reveals the characteristic of different kinds of cross-modal retrieval methods, which is expected to benefit both practical applications and future research. Finally, we discuss open problems and future research directions.Comment: 20 pages, 11 figures, 9 table

arXiv.org e-Print Archive

Discriminative Supervised Hashing for Cross-Modal similarity Search

Author: Kittler Josef
Wu Xiao-Jun
Yu Jun
Publication venue
Publication date: 17/04/2019
Field of study

With the advantage of low storage cost and high retrieval efficiency, hashing techniques have recently been an emerging topic in cross-modal similarity search. As multiple modal data reflect similar semantic content, many researches aim at learning unified binary codes. However, discriminative hashing features learned by these methods are not adequate. This results in lower accuracy and robustness. We propose a novel hashing learning framework which jointly performs classifier learning, subspace learning and matrix factorization to preserve class-specific semantic content, termed Discriminative Supervised Hashing (DSH), to learn the discrimative unified binary codes for multi-modal data. Besides, reducing the loss of information and preserving the non-linear structure of data, DSH non-linearly projects different modalities into the common space in which the similarity among heterogeneous data points can be measured. Extensive experiments conducted on the three publicly available datasets demonstrate that the framework proposed in this paper outperforms several state-of -the-art methods.Comment: 7 pages,3 figures,4 tables;The paper is under consideration at Image and Vision Computin

arXiv.org e-Print Archive

Unsupervised Cross-Media Hashing with Structure Preservation

Author: Chia Alex Yong-Sang
Wang Xiangyu
Publication venue
Publication date: 18/03/2016
Field of study

Recent years have seen the exponential growth of heterogeneous multimedia data. The need for effective and accurate data retrieval from heterogeneous data sources has attracted much research interest in cross-media retrieval. Here, given a query of any media type, cross-media retrieval seeks to find relevant results of different media types from heterogeneous data sources. To facilitate large-scale cross-media retrieval, we propose a novel unsupervised cross-media hashing method. Our method incorporates local affinity and distance repulsion constraints into a matrix factorization framework. Correspondingly, the proposed method learns hash functions that generates unified hash codes from different media types, while ensuring intrinsic geometric structure of the data distribution is preserved. These hash codes empower the similarity between data of different media types to be evaluated directly. Experimental results on two large-scale multimedia datasets demonstrate the effectiveness of the proposed method, where we outperform the state-of-the-art methods

arXiv.org e-Print Archive