Coupled CycleGAN: Unsupervised Hashing Network for Cross-Modal Retrieval
In recent years, hashing has attracted increasing attention owing to its low
storage cost and high query efficiency in large-scale cross-modal retrieval.
Benefiting from deep learning, increasingly compelling results have been
achieved in the cross-modal retrieval community. However, existing deep
cross-modal hashing methods either rely on large amounts of labeled
information or are unable to learn an accurate correlation between different
modalities. In this paper, we propose Unsupervised coupled Cycle generative
adversarial Hashing networks (UCH) for cross-modal retrieval, in which an
outer-cycle network is used to learn a powerful common representation and an
inner-cycle network is used to generate reliable hash codes. Specifically, our
proposed UCH seamlessly couples these two networks with a generative
adversarial mechanism, so that the representation and the hash codes can be
optimized simultaneously. Extensive experiments on three popular benchmark
datasets show that the proposed UCH outperforms state-of-the-art unsupervised
cross-modal hashing methods.
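To make the hash-code side of such a pipeline concrete, the sketch below shows
how a deep hashing head commonly maps a learned common representation to
fixed-length binary codes: a tanh relaxation during training and a sign
binarization at retrieval time. The class name, layer sizes, and code length
are illustrative assumptions, not the actual UCH inner-cycle architecture.

    import torch
    import torch.nn as nn

    class HashHead(nn.Module):
        # Illustrative hashing head (assumed, not UCH's actual network):
        # maps a modality representation to K-bit codes.
        def __init__(self, feat_dim=512, code_len=64):
            super().__init__()
            self.fc = nn.Linear(feat_dim, code_len)

        def forward(self, features):
            # continuous relaxation in (-1, 1), used during training
            return torch.tanh(self.fc(features))

        @torch.no_grad()
        def binarize(self, features):
            # discrete {-1, +1} codes used at retrieval time
            return torch.sign(self.forward(features))

    # Giving the image and text branches the same code length keeps Hamming
    # distances comparable across modalities.
    img_head = HashHead(feat_dim=512, code_len=64)
    codes = img_head.binarize(torch.randn(8, 512))  # (8, 64), values in {-1, +1}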
SnapMix: Semantically Proportional Mixing for Augmenting Fine-grained Data
Data mixing augmentation has proved effective in training deep models. Recent
methods mix labels mainly according to the mixture proportion of image pixels.
Because the main discriminative information of a fine-grained image usually
resides in subtle regions, methods along this line are prone to heavy label
noise in fine-grained recognition. In this paper we propose a novel scheme,
termed Semantically Proportional Mixing (SnapMix), which exploits the class
activation map (CAM) to lessen the label noise when augmenting fine-grained
data. SnapMix generates the target label for a mixed image by estimating its
intrinsic semantic composition; it allows asymmetric mixing operations while
ensuring semantic correspondence between synthetic images and target labels.
Experiments show that our method consistently outperforms existing
mixing-based approaches on various datasets and under different network
depths. Furthermore, by incorporating mid-level features, the proposed SnapMix
achieves top-level performance, demonstrating its potential to serve as a
solid baseline for fine-grained recognition. Our code is available at
https://github.com/Shaoli-Huang/SnapMix.git.
Comment: Accepted by AAAI 2021
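As a rough illustration of semantically proportional label weights, the sketch
below assigns mixing weights from the CAM mass inside the cut and pasted
regions rather than from pixel area as in CutMix. The function name, box
convention, and normalization are simplifications assumed for illustration;
the authors' exact implementation is in the linked repository.

    import numpy as np

    def snapmix_label_weights(cam_a, cam_b, box_a, box_b):
        # Simplified sketch (not the official code): cam_a, cam_b are
        # non-negative class activation maps of shape (H, W); box_* are
        # (y1, y2, x1, x2) regions. The patch cut from image b at box_b is
        # pasted onto image a at box_a. Label weights reflect how much
        # semantic (CAM) mass each image contributes to the mixed image.
        cam_a = cam_a / (cam_a.sum() + 1e-8)
        cam_b = cam_b / (cam_b.sum() + 1e-8)
        y1, y2, x1, x2 = box_a
        removed_a = cam_a[y1:y2, x1:x2].sum()   # semantic mass lost from image a
        y1, y2, x1, x2 = box_b
        pasted_b = cam_b[y1:y2, x1:x2].sum()    # semantic mass gained from image b
        # The two weights need not sum to one, which is what permits the
        # asymmetric mixing described in the abstract.
        return 1.0 - removed_a, pasted_b

    w_a, w_b = snapmix_label_weights(np.random.rand(7, 7), np.random.rand(7, 7),
                                     (2, 5, 2, 5), (1, 4, 1, 4))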
Semantic Adversarial Network with Multi-scale Pyramid Attention for Video Classification
Two-stream architectures have shown strong performance on the video
classification task. The key idea is to learn spatio-temporal features by
fusing convolutional networks spatially and temporally. However, such
architectures have several problems. First, they rely on optical flow to model
temporal information, which is often expensive to compute and store. Second,
they have limited ability to capture details and local context information in
video data. Third, they lack explicit semantic guidance, which greatly
decreases classification performance. In this paper, we propose a new
two-stream deep framework for video classification that discovers spatial and
temporal information from RGB frames only; moreover, a multi-scale pyramid
attention (MPA) layer and a semantic adversarial learning (SAL) module are
introduced and integrated into the framework. The MPA enables the network to
capture global and local features and to generate a comprehensive
representation for a video, while the SAL makes this representation gradually
approximate the real video semantics in an adversarial manner. Experimental
results on two public benchmarks demonstrate that the proposed method achieves
state-of-the-art results on standard video datasets.
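The sketch below gives one plausible reading of a multi-scale pyramid
attention block: pool a frame-level feature map at several scales, score each
scale, and fuse the pooled descriptors with softmax attention so that global
context and local detail are mixed. The module name, pooling scales, and
scoring layer are assumptions for illustration, not the paper's exact MPA
design.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PyramidAttention(nn.Module):
        # Illustrative multi-scale pyramid attention (assumed design, not the
        # paper's exact MPA layer).
        def __init__(self, channels=256, scales=(1, 2, 4)):
            super().__init__()
            self.scales = scales
            self.score = nn.Linear(channels, 1)   # one attention score per scale

        def forward(self, feat):                  # feat: (B, C, H, W) frame features
            descs = []
            for s in self.scales:
                pooled = F.adaptive_avg_pool2d(feat, s)   # (B, C, s, s)
                descs.append(pooled.flatten(2).mean(-1))  # (B, C) descriptor per scale
            descs = torch.stack(descs, dim=1)             # (B, S, C)
            attn = torch.softmax(self.score(descs), dim=1)  # (B, S, 1) scale weights
            return (attn * descs).sum(dim=1)              # (B, C) fused representation

    fused = PyramidAttention()(torch.randn(2, 256, 14, 14))  # -> shape (2, 256)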
Shared Predictive Cross-Modal Deep Quantization
With the explosive growth of data volume and the ever-increasing diversity of
data modalities, cross-modal similarity search, which conducts nearest
neighbor search across different modalities, has been attracting increasing
interest. This paper presents a deep compact code learning solution for
efficient cross-modal similarity search. Many recent studies have shown that
quantization-based approaches generally perform better than hashing-based
approaches on single-modal similarity search. In this paper, we propose a deep
quantization approach, which is among the early attempts to leverage deep
neural networks for quantization-based cross-modal similarity search. Our
approach, dubbed shared predictive deep quantization (SPDQ), explicitly
formulates a shared subspace across different modalities and two private
subspaces for the individual modalities; representations in the shared and
private subspaces are learned simultaneously by embedding them into a
reproducing kernel Hilbert space, where the mean embeddings of different
modality distributions can be explicitly compared. In addition, in the shared
subspace, a quantizer is learned to produce semantics-preserving compact codes
with the help of label alignment. Thanks to this novel network architecture in
cooperation with supervised quantization training, SPDQ preserves intramodal
and intermodal similarities as much as possible and greatly reduces
quantization error. Experiments on two popular benchmarks corroborate that our
approach outperforms state-of-the-art methods.
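To illustrate why quantization-based codes enable efficient similarity search,
here is a toy product-quantization-style encoder: each subvector is replaced
by the index of its nearest codeword, so an item is stored in a few bytes and
distances can later be computed by table lookup. Codebook learning, label
alignment, and SPDQ's shared and private subspaces are omitted; the names and
sizes are illustrative assumptions, not the paper's method.

    import numpy as np

    def quantize(vectors, codebooks):
        # Toy product-quantization encoder (illustrative, not SPDQ itself).
        # vectors:   (N, D) real-valued items
        # codebooks: (M, K, D // M) -- M subspaces with K codewords each
        M, K, d = codebooks.shape
        subs = vectors.reshape(len(vectors), M, d)
        # squared distance from every subvector to every codeword: (N, M, K)
        dists = ((subs[:, :, None, :] - codebooks[None]) ** 2).sum(-1)
        # compact code: index of the nearest codeword per subspace -> (N, M)
        return dists.argmin(-1).astype(np.uint8)

    rng = np.random.default_rng(0)
    database = rng.normal(size=(100, 64))
    codebooks = rng.normal(size=(8, 16, 8))   # 8 subspaces x 16 codewords ~ 8 bytes/item
    codes = quantize(database, codebooks)     # shape (100, 8)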