DomainDrop: Suppressing Domain-Sensitive Channels for Domain Generalization
Deep Neural Networks have exhibited considerable success in various visual
tasks. However, when applied to unseen test datasets, state-of-the-art models
often suffer performance degradation due to domain shifts. In this paper, we
introduce a novel approach to domain generalization from the perspective
of enhancing the robustness of channels in feature maps to domain shifts. We
observe that models trained on source domains contain a substantial number of
channels that exhibit unstable activations across different domains, which are
inclined to capture domain-specific features and behave abnormally when exposed
to unseen target domains. To address the issue, we propose a DomainDrop
framework to continuously enhance the channel robustness to domain shifts,
where a domain discriminator is used to identify and drop unstable channels in
feature maps of each network layer during forward propagation. We theoretically
prove that our framework could effectively lower the generalization bound.
Extensive experiments on several benchmarks indicate that our framework
achieves state-of-the-art performance compared to other competing methods. Our
code is available at https://github.com/lingeringlight/DomainDrop.
Comment: Accepted by ICCV 2023.
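The channel-dropping step described above can be sketched as follows. This is a minimal numpy illustration, not the paper's implementation: `drop_unstable_channels` and the per-channel `instability` scores are assumed names, and the layer-wise domain discriminator that produces those scores in the paper is omitted.

```python
import numpy as np

def drop_unstable_channels(feats, instability, drop_ratio=0.33):
    """Zero out the most domain-sensitive channels of a feature map.

    feats:       (C, H, W) feature map from one network layer.
    instability: (C,) per-channel domain-sensitivity scores, assumed to come
                 from a domain discriminator (omitted in this sketch).
    """
    c = feats.shape[0]
    k = int(round(c * drop_ratio))           # number of channels to drop
    drop_idx = np.argsort(instability)[-k:]  # the k most unstable channels
    mask = np.ones(c)
    mask[drop_idx] = 0.0
    # Rescale the kept channels so the expected activation magnitude is
    # preserved, as in standard dropout.
    scale = c / max(c - k, 1)
    return feats * mask[:, None, None] * scale, drop_idx
```

During training this would be applied to the features of each layer in the forward pass, so later layers learn not to rely on the unstable channels.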
A Novel Unsupervised Camera-aware Domain Adaptation Framework for Person Re-identification
Unsupervised cross-domain person re-identification (Re-ID) faces two key
issues. One is the data distribution discrepancy between source and target
domains, and the other is the lack of labelling information in target domain.
They are addressed in this paper from the perspective of representation
learning. For the first issue, we highlight the presence of camera-level
sub-domains as a unique characteristic of person Re-ID, and develop
camera-aware domain adaptation to reduce the discrepancy not only between
source and target domains but also across these sub-domains. For the second
issue, we exploit the temporal continuity in each camera of target domain to
create discriminative information. This is implemented by dynamically
generating online triplets within each batch, in order to maximally take
advantage of the steadily improved feature representation in training process.
Together, the above two methods give rise to a novel unsupervised deep domain
adaptation framework for person Re-ID. Experiments and ablation studies on
benchmark datasets demonstrate its superiority and interesting properties.
Comment: Accepted by ICCV 2019.
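The dynamic in-batch triplet generation can be illustrated with a standard hard-triplet miner. `mine_online_triplets` is a hypothetical helper; the labels here stand in for the discriminative information the paper derives from temporal continuity within each camera.

```python
import numpy as np

def mine_online_triplets(embeddings, labels):
    """For each anchor in the batch, pick the hardest positive (farthest
    same-label sample) and hardest negative (closest different-label sample),
    so triplets track the steadily improving feature representation."""
    n = len(labels)
    # Pairwise Euclidean distances between all embeddings in the batch.
    d = np.linalg.norm(embeddings[:, None] - embeddings[None, :], axis=-1)
    triplets = []
    for a in range(n):
        pos = np.where((labels == labels[a]) & (np.arange(n) != a))[0]
        neg = np.where(labels != labels[a])[0]
        if len(pos) == 0 or len(neg) == 0:
            continue  # anchor has no valid positive or negative in this batch
        p = pos[np.argmax(d[a, pos])]  # hardest positive
        n_idx = neg[np.argmin(d[a, neg])]  # hardest negative
        triplets.append((a, p, n_idx))
    return triplets
```

Because the triplets are re-mined every batch from the current embeddings, they adapt online as the representation improves during training.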
Online Deep Metric Learning
Metric learning learns a metric function from training data to calculate the
similarity or distance between samples. From the perspective of feature
learning, metric learning essentially learns a new feature space by feature
transformation (e.g., Mahalanobis distance metric). However, traditional metric
learning algorithms are shallow, which just learn one metric space (feature
transformation). Can we further learn a better metric space from the learnt
metric space? In other words, can we learn metric progressively and nonlinearly
like deep learning by just using the existing metric learning algorithms? To
this end, we present a hierarchical metric learning scheme and implement an
online deep metric learning framework, namely ODML. Specifically, we take one
online metric learning algorithm as a metric layer, followed by a nonlinear
layer (i.e., ReLU), and then stack these layers in a manner modelled after
deep learning. The proposed ODML enjoys several desirable properties: it can
indeed learn the metric progressively and performs superiorly on several
datasets. Various
experiments with different settings have been conducted to verify these
properties of the proposed ODML.
Comment: 9 pages.
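The stacking idea can be sketched as below, with illustrative class and function names; each metric layer is reduced to its learned linear transform (Mahalanobis-style, d(x, y) = ||Lx − Ly||), and the online update rule of each layer is not reproduced here.

```python
import numpy as np

class MetricLayer:
    """One metric layer: a learned linear transform L, so distances are
    computed in the projected space as ||Lx - Ly|| (Mahalanobis-style).
    The online learning rule that fits L is omitted in this sketch."""
    def __init__(self, dim, rng):
        # Initialize near the identity so early layers roughly preserve
        # the input metric.
        self.L = np.eye(dim) + 0.01 * rng.standard_normal((dim, dim))

    def __call__(self, x):
        return self.L @ x

def odml_forward(x, layers):
    """Stack metric layers with ReLU nonlinearities in between, so each
    layer learns a refined metric space on top of the previous one."""
    for i, layer in enumerate(layers):
        x = layer(x)
        if i < len(layers) - 1:  # nonlinearity between metric layers
            x = np.maximum(x, 0.0)
    return x
```

A distance between two samples is then `np.linalg.norm(odml_forward(a, layers) - odml_forward(b, layers))`, i.e. a progressively and nonlinearly learned metric built only from shallow metric learners.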
OPML: A One-Pass Closed-Form Solution for Online Metric Learning
To achieve a low computational cost when performing online metric learning
for large-scale data, we present a one-pass closed-form solution namely OPML in
this paper. The proposed OPML first adopts a one-pass triplet
construction strategy, which aims to use only a very small number of triplets
to approximate the representation ability of whole original triplets obtained
by batch-manner methods. Then, OPML employs a closed-form solution to update
the metric for new coming samples, which leads to a low space (i.e., O(d))
and time (i.e., O(d)) complexity, where d is the feature dimensionality.
In addition, an extension of OPML (namely COPML) is further proposed to enhance
the robustness when, as often happens in practice, the first several samples
come from the same class (i.e., the cold-start problem). In the experiments, we have systematically
evaluated our methods (OPML and COPML) on three typical tasks, including UCI
data classification, face verification, and abnormal event detection in videos,
which aims to fully evaluate the proposed methods across different sample
numbers, feature dimensionalities, and feature extraction approaches (i.e.,
hand-crafted and deeply-learned). The results show that OPML and COPML can
obtain promising performance at a very low computational cost. Also, the
effectiveness of COPML in the cold-start setting is experimentally verified.
Comment: 12 pages.
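The one-pass triplet construction can be sketched as follows, assuming a stream of (sample, label) pairs; `one_pass_triplets` is an illustrative name, and the closed-form O(d) metric update itself is not reproduced here.

```python
import numpy as np

def one_pass_triplets(stream):
    """One-pass triplet construction: keep only the most recent sample of
    each class. Each new sample (x, y) forms a triplet with the stored
    sample of its own class (positive) and a stored sample of another class
    (negative), so memory stays O(#classes * d) instead of O(n)."""
    latest = {}    # class label -> most recently seen sample of that class
    triplets = []
    for x, y in stream:
        pos = latest.get(y)
        # Pick any stored sample from a different class as the negative.
        neg = next((v for k, v in latest.items() if k != y), None)
        if pos is not None and neg is not None:
            triplets.append((x, pos, neg))
        latest[y] = x  # each sample is touched exactly once
    return triplets
```

In the full method, each constructed triplet would immediately drive one closed-form metric update and could then be discarded, which is what keeps both the time and space cost low.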
A Novel Cross-Perturbation for Single Domain Generalization
Single domain generalization aims to enhance the ability of the model to
generalize to unknown domains when trained on a single source domain. However,
the limited diversity in the training data hampers the learning of
domain-invariant features, resulting in compromised generalization performance.
To address this, data perturbation (augmentation) has emerged as a crucial
method to increase data diversity. Nevertheless, existing perturbation methods
often focus on either image-level or feature-level perturbations independently,
neglecting their synergistic effects. To overcome these limitations, we propose
CPerb, a simple yet effective cross-perturbation method. Specifically, CPerb
utilizes both horizontal and vertical operations. Horizontally, it applies
image-level and feature-level perturbations to enhance the diversity of the
training data, mitigating the issue of limited diversity in single-source
domains. Vertically, it introduces multi-route perturbation to learn
domain-invariant features from different perspectives of samples with the same
semantic category, thereby enhancing the generalization capability of the
model. Additionally, we propose MixPatch, a novel feature-level perturbation
method that exploits local image style information to further diversify the
training data. Extensive experiments on various benchmark datasets validate the
effectiveness of our method.
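A patch-based feature-level style perturbation in the spirit of MixPatch can be sketched as below; the exact formulation in the paper may differ, and all names here are illustrative. The idea shown is to swap the channel-wise style statistics (mean/std) of a feature map for statistics computed from a random local patch.

```python
import numpy as np

def mixpatch_style_perturb(feat, rng, eps=1e-5):
    """Replace the channel-wise mean/std ("style") of a (C, H, W) feature
    map with statistics computed from a random local patch, diversifying
    style while keeping content (a sketch; names are illustrative)."""
    c, h, w = feat.shape
    mu = feat.mean(axis=(1, 2), keepdims=True)
    sigma = feat.std(axis=(1, 2), keepdims=True) + eps
    # Sample a random local patch and use its statistics as the new style.
    ph, pw = max(1, h // 2), max(1, w // 2)
    top = rng.integers(0, h - ph + 1)
    left = rng.integers(0, w - pw + 1)
    patch = feat[:, top:top + ph, left:left + pw]
    mu_p = patch.mean(axis=(1, 2), keepdims=True)
    sigma_p = patch.std(axis=(1, 2), keepdims=True) + eps
    # AdaIN-style renormalization: normalize, then apply the patch style.
    return (feat - mu) / sigma * sigma_p + mu_p
```

Applied alongside image-level augmentation ("horizontal" perturbation) and across multiple routes of the same sample, this kind of perturbation increases diversity without changing semantic content.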
ALOFT: A Lightweight MLP-like Architecture with Dynamic Low-frequency Transform for Domain Generalization
Domain generalization (DG) aims to learn a model that generalizes well to
unseen target domains utilizing multiple source domains without re-training.
Most existing DG works are based on convolutional neural networks (CNNs).
However, the local operation of the convolution kernel makes the model focus
too much on local representations (e.g., texture), which inherently makes the
model more prone to overfitting to the source domains and hampers its
generalization ability. Recently, several MLP-based methods have achieved
promising results in supervised learning tasks by learning global interactions
among different patches of the image. Inspired by this, in this paper, we first
analyze the difference between CNN and MLP methods in DG and find that MLP
methods exhibit a better generalization ability because they can better capture
the global representations (e.g., structure) than CNN methods. Then, based on a
recent lightweight MLP method, we obtain a strong baseline that outperforms
most state-of-the-art CNN-based methods. The baseline can learn global
structure representations with a filter to suppress structure irrelevant
information in the frequency space. Moreover, we propose a dynAmic
LOw-Frequency spectrum Transform (ALOFT) that can perturb local texture
features while preserving global structure features, thus enabling the filter
to remove structure-irrelevant information sufficiently. Extensive experiments
on four benchmarks have demonstrated that our method can achieve great
performance improvement with a small number of parameters compared to SOTA
CNN-based DG methods. Our code is available at
https://github.com/lingeringlight/ALOFT/.
Comment: Accepted by CVPR 2023.
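The low-frequency spectrum perturbation underlying ALOFT can be sketched with a plain FFT. This simplified version uses a fixed radius and Gaussian amplitude noise, whereas the paper's transform is dynamic; it perturbs the amplitude (style/texture) of low frequencies while keeping the phase (structure).

```python
import numpy as np

def low_freq_amplitude_perturb(img, rng, radius=4, std=0.3):
    """Perturb the low-frequency amplitude spectrum of a (H, W) image while
    keeping the phase, so local texture varies but global structure is
    preserved (a simplified sketch of a dynamic low-frequency transform)."""
    spec = np.fft.fftshift(np.fft.fft2(img))   # center the low frequencies
    amp, phase = np.abs(spec), np.angle(spec)
    h, w = img.shape
    cy, cx = h // 2, w // 2
    yy, xx = np.ogrid[:h, :w]
    low = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
    # Multiplicative Gaussian noise on the low-frequency amplitudes only.
    noise = 1.0 + std * rng.standard_normal(amp.shape)
    amp = np.where(low, amp * noise, amp)
    out = np.fft.ifft2(np.fft.ifftshift(amp * np.exp(1j * phase)))
    return out.real
```

Training on such perturbed inputs pushes the model to rely on the stable global structure rather than on domain-specific low-frequency style.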
DomainAdaptor: A Novel Approach to Test-time Adaptation
To deal with the domain shift between training and test samples, current
methods have primarily focused on learning generalizable features during
training and ignore the specificity of unseen samples that are also critical
during the test. In this paper, we investigate a more challenging task that
aims to adapt a trained CNN model to unseen domains during the test. To
maximally mine the information in the test data, we propose a unified method
called DomainAdaptor for the test-time adaptation, which consists of an
AdaMixBN module and a Generalized Entropy Minimization (GEM) loss.
Specifically, AdaMixBN addresses the domain shift by adaptively fusing training
and test statistics in the normalization layer via a dynamic mixture
coefficient and a statistic transformation operation. To further enhance the
adaptation ability of AdaMixBN, we design a GEM loss that extends the Entropy
Minimization loss to better exploit the information in the test data. Extensive
experiments show that DomainAdaptor consistently outperforms the
state-of-the-art methods on four benchmarks. Furthermore, our method brings
even more pronounced improvements over existing methods when only a few samples
from the unseen domain are available. The code is available at
https://github.com/koncle/DomainAdaptor.
Comment: Accepted by ICCV 2023.
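The statistics-fusion idea behind AdaMixBN can be sketched as follows. This simplified version uses a fixed mixture coefficient `alpha`, whereas the paper derives the coefficient dynamically and adds a statistic transformation operation; the GEM loss is also omitted.

```python
import numpy as np

def adamix_bn(x, run_mean, run_var, alpha=0.5, eps=1e-5):
    """Normalize a (N, C) batch of test features using a mixture of the
    stored training statistics and the current test-batch statistics
    (a fixed-alpha sketch; the paper computes alpha dynamically)."""
    test_mean = x.mean(axis=0)
    test_var = x.var(axis=0)
    # alpha = 1 recovers standard inference with training statistics;
    # alpha = 0 normalizes purely with the test batch.
    mean = alpha * run_mean + (1 - alpha) * test_mean
    var = alpha * run_var + (1 - alpha) * test_var
    return (x - mean) / np.sqrt(var + eps)
```

Fusing the two sets of statistics lets the normalization layer absorb part of the train-test domain shift without retraining any weights, which is the core of the test-time adaptation setting described above.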