Rethinking Implicit Neural Representations for Vision Learners
Implicit Neural Representations (INRs) are a powerful way to parameterize
continuous signals in computer vision. However, almost all INR methods are
limited to low-level tasks, e.g., image/video compression, super-resolution,
and image generation. How to extend INRs to high-level tasks and deep
networks remains under-explored. Existing INR methods suffer from two
problems: 1) narrow theoretical definitions of INRs that are inapplicable to
high-level tasks; 2) a lack of representational capability for deep networks.
Motivated by these facts, we reformulate the definition of INRs from a
novel perspective and propose an innovative Implicit Neural Representation
Network (INRN), the first study to apply INRs to both low-level and
high-level tasks. Specifically, we present three key designs for the basic
blocks in INRN, along with two different stacking strategies and corresponding
loss functions. Extensive experiments with analysis on both low-level tasks
(image fitting) and high-level vision tasks (image classification, object
detection, instance segmentation) demonstrate the effectiveness of the
proposed method.
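To make the setting concrete, below is a minimal sketch of the classic INR setup that INRN generalizes from: a small coordinate MLP is fit to a single image, mapping normalized (x, y) coordinates to RGB values (the paper's "image fitting" task). The INRN block designs themselves are not reproduced here; the architecture and hyperparameters below are illustrative only.

```python
import torch
import torch.nn as nn

class CoordinateMLP(nn.Module):
    """Toy INR: maps 2-D coordinates to RGB. Not the INRN architecture."""
    def __init__(self, hidden=256, layers=4):
        super().__init__()
        dims = [2] + [hidden] * layers + [3]
        blocks = []
        for i in range(len(dims) - 1):
            blocks.append(nn.Linear(dims[i], dims[i + 1]))
            if i < len(dims) - 2:
                blocks.append(nn.ReLU())  # SIREN-style sine activations are a common alternative
        self.net = nn.Sequential(*blocks)

    def forward(self, coords):   # coords: (N, 2) in [-1, 1]
        return self.net(coords)  # (N, 3) RGB values

# Image fitting: regress pixel colors from their normalized coordinates.
H = W = 64
ys, xs = torch.meshgrid(torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij")
coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
target = torch.rand(H * W, 3)  # stand-in for a real image's pixels

model = CoordinateMLP()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
for step in range(200):
    loss = ((model(coords) - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```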
Diverse Target and Contribution Scheduling for Domain Generalization
Generalization under distribution shift has been a great challenge in
computer vision. The prevailing practice of directly employing one-hot
labels as the training targets in domain generalization (DG) can lead to
gradient conflicts, making it insufficient for capturing the intrinsic class
characteristics and hard to increase the intra-class variation. Besides,
existing methods in DG mostly overlook the distinct contributions of source
(seen) domains, resulting in uneven learning from these domains. To address
these issues, we first present a theoretical and empirical analysis of the
existence of gradient conflicts in DG, unveiling the previously unexplored
relationship between distribution shifts and gradient conflicts during the
optimization process. We then take a novel perspective on DG from the
empirical source-domain risk and propose a new paradigm for DG called
Diverse Target and Contribution Scheduling (DTCS). DTCS comprises two
innovative modules: Diverse Target Supervision (DTS) and Diverse Contribution
Balance (DCB), with the aim of addressing the limitations associated with the
common utilization of one-hot labels and equal contributions for source domains
in DG. Specifically, DTS employs distinct soft labels as training targets to
account for various feature distributions across domains and thereby mitigates
the gradient conflicts, and DCB dynamically balances the contributions of
source domains by ensuring a fair decline in losses of different source
domains. Extensive experiments with analysis on four benchmark datasets show
that the proposed method achieves competitive performance compared with
state-of-the-art approaches, demonstrating the effectiveness and advantages
of the proposed DTCS.
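The two DTCS ingredients lend themselves to a short sketch. The code below is a hedged illustration of the ideas as stated in the abstract, not the authors' exact formulation: (1) domain-specific soft labels replace a shared one-hot target; (2) per-domain loss weights are updated so that domains whose losses decline slowest get more weight (a stand-in for the paper's DCB schedule).

```python
import torch
import torch.nn.functional as F

def soft_label_loss(logits, soft_targets):
    # Cross-entropy against soft targets: -sum_c q_c * log p_c.
    return -(soft_targets * F.log_softmax(logits, dim=-1)).sum(-1).mean()

def rebalance_weights(prev_losses, cur_losses, temperature=1.0):
    # Illustrative rule: domains whose loss declined least get larger
    # weights next step, encouraging a fair decline across domains.
    decline = prev_losses - cur_losses                   # per-domain improvement
    return torch.softmax(-decline / temperature, dim=0)  # slow domains -> big weight

# Toy usage with 3 source domains and 5 classes.
num_domains, num_classes = 3, 5
logits = [torch.randn(8, num_classes, requires_grad=True) for _ in range(num_domains)]
labels = [torch.randint(0, num_classes, (8,)) for _ in range(num_domains)]
# Domain-specific soft targets: smoothed one-hot; the smoothing could differ per domain.
soft = [F.one_hot(y, num_classes).float() * 0.9 + 0.1 / num_classes for y in labels]

cur = torch.stack([soft_label_loss(l, s) for l, s in zip(logits, soft)])
prev = cur.detach() + 0.1                 # stand-in for the previous step's losses
w = rebalance_weights(prev, cur.detach())
total = (w * cur).sum()                   # weighted multi-domain objective
total.backward()
```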
Rethinking Domain Generalization: Discriminability and Generalizability
Domain generalization (DG) endeavors to develop robust models that possess
strong generalizability while preserving excellent discriminability.
Nonetheless, pivotal DG techniques tend to improve the feature generalizability
by learning domain-invariant representations, inadvertently overlooking the
feature discriminability. On the one hand, the simultaneous attainment of
generalizability and discriminability of features presents a complex challenge,
often entailing inherent contradictions. This challenge becomes particularly
pronounced when domain-invariant features manifest reduced discriminability
owing to the inclusion of unstable factors, i.e., spurious correlations.
On the other hand, prevailing domain-invariant methods can be categorized as
category-level alignment, susceptible to discarding indispensable features
possessing substantial generalizability and narrowing intra-class variations.
To surmount these obstacles, we rethink DG from a new perspective that
concurrently imbues features with formidable discriminability and robust
generalizability, and present a novel framework, namely, Discriminative
Microscopic Distribution Alignment (DMDA). DMDA incorporates two core
components: Selective Channel Pruning (SCP) and Micro-level Distribution
Alignment (MDA). Concretely, SCP attempts to curtail redundancy within neural
networks, prioritizing stable attributes conducive to accurate classification.
This approach alleviates the adverse effect of spurious domain invariance and
amplifies the feature discriminability. Besides, MDA accentuates micro-level
alignment within each class, going beyond mere category-level alignment. This
strategy accommodates sufficient generalizable features and facilitates
within-class variations. Extensive experiments on four benchmark datasets
corroborate the efficacy of our method.
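As a concrete illustration of "micro-level" versus "category-level" alignment, the sketch below penalizes pairwise distances between individual same-class samples rather than distances between per-class means. This is one plausible reading of MDA under stated assumptions, not the authors' exact loss.

```python
import torch

def micro_alignment_loss(feats, labels):
    """feats: (N, D) features pooled across source domains; labels: (N,)."""
    loss, groups = feats.new_zeros(()), 0
    for c in labels.unique():
        f = feats[labels == c]
        if len(f) < 2:
            continue
        # Mean pairwise squared distance within class c (micro-level),
        # rather than the distance between per-domain class means.
        d = (f.unsqueeze(0) - f.unsqueeze(1)).pow(2).sum(-1)
        loss = loss + d.sum() / (len(f) * (len(f) - 1))
        groups += 1
    return loss / max(groups, 1)

feats = torch.randn(16, 32, requires_grad=True)
labels = torch.randint(0, 4, (16,))
micro_alignment_loss(feats, labels).backward()
```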
Continuous Piecewise-Affine Based Motion Model for Image Animation
Image animation aims to bring static images to life according to driving
videos and create engaging visual content that can be used for various purposes
such as animation, entertainment, and education. Recent unsupervised methods
utilize affine and thin-plate spline transformations based on keypoints to
transfer the motion in driving frames to the source image. However, limited by
the expressive power of the transformations used, these methods always produce
poor results when the gap between the motion in the driving frame and the
source image is large. To address this issue, we propose to model motion from
the source image to the driving frame in highly-expressive diffeomorphism
spaces. Firstly, we introduce the Continuous Piecewise-Affine based (CPAB)
transformation to model the motion and present a well-designed inference
algorithm to generate CPAB transformations from control keypoints. Secondly, we
propose a SAM-guided keypoint semantic loss to further constrain the keypoint
extraction process and improve the semantic consistency between the
corresponding keypoints on the source and driving images. Finally, we design a
structure alignment loss to align the structure-related features extracted from
driving and generated images, thus helping the generator generate results that
are more consistent with the driving action. Extensive experiments on four
datasets demonstrate the effectiveness of our method against state-of-the-art
competitors quantitatively and qualitatively. Code will be publicly available
at: https://github.com/DevilPG/AAAI2024-CPABMM
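CPAB transformations are obtained by integrating a continuous piecewise-affine velocity field, which yields a diffeomorphic (smooth, invertible) warp. Below is a minimal 1-D sketch of that integration via Euler steps; the paper works in 2-D and adds a keypoint-based inference algorithm that is not reproduced here, and the cell coefficients below are hand-picked toy values.

```python
import numpy as np

def cpab_warp_1d(x, slopes, intercepts, steps=100, T=1.0):
    """Integrate dx/dt = a_c * x + b_c, where cell c = floor(x * C)."""
    x = np.asarray(x, dtype=float).copy()
    C = len(slopes)
    dt = T / steps
    for _ in range(steps):
        c = np.clip((x * C).astype(int), 0, C - 1)  # which affine cell each point is in
        v = slopes[c] * x + intercepts[c]           # piecewise-affine velocity
        x = np.clip(x + dt * v, 0.0, 1.0)           # Euler step, stay in [0, 1]
    return x

# The velocity must be continuous across cell boundaries for a valid CPA
# field; these toy coefficients match at x = 0.5 and vanish at the endpoints.
slopes = np.array([0.8, -0.8])
intercepts = np.array([0.0, 0.8])   # 0.8*0.5 + 0 == -0.8*0.5 + 0.8
pts = np.linspace(0, 1, 5)
print(cpab_warp_1d(pts, slopes, intercepts))
```

Because the warp is the flow of a continuous velocity field, it is invertible by integrating the same field backward in time, which is what makes the space "highly expressive" yet still diffeomorphic.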
Uncertainty-Aware Consistency Regularization for Cross-Domain Semantic Segmentation
Unsupervised domain adaptation (UDA) aims to adapt existing models of the
source domain to a new target domain with only unlabeled data. Many
adversarial-based UDA methods involve high-instability training and have to
carefully tune the optimization procedure. Some non-adversarial UDA methods
employ a consistency regularization on the target predictions of a student
model and a teacher model under different perturbations, where the teacher
shares the same architecture with the student and is updated by the exponential
moving average of the student. However, these methods suffer from noticeable
negative transfer resulting from either the error-prone discriminator network
or the unreliable teacher model. In this paper, we propose an
uncertainty-aware consistency regularization method for cross-domain semantic
segmentation. By exploiting the latent uncertainty information of the target
samples, more meaningful and reliable knowledge from the teacher model can be
transferred to the student model. In addition, we further reveal the reason why
the current consistency regularization is often unstable in minimizing the
distribution discrepancy. We also show that our method can effectively ease
this issue by mining the most reliable and meaningful samples with a dynamic
weighting scheme of consistency loss. Experiments demonstrate that the proposed
method outperforms the state-of-the-art methods on two domain adaptation
benchmarks, GTAV → Cityscapes and SYNTHIA → Cityscapes.
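The core mechanics are easy to sketch: an EMA teacher, plus a consistency loss whose per-pixel weight shrinks as the teacher's predictive uncertainty (here, entropy) grows. This is a hedged illustration of the described setup; the paper's exact uncertainty measure and dynamic weighting scheme may differ.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def ema_update(teacher, student, momentum=0.999):
    # Teacher weights follow the exponential moving average of the student.
    for tp, sp in zip(teacher.parameters(), student.parameters()):
        tp.mul_(momentum).add_(sp, alpha=1 - momentum)

def uncertainty_weighted_consistency(student_logits, teacher_logits):
    # Per-pixel predictive entropy of the teacher serves as uncertainty;
    # low-entropy (reliable) predictions get a higher consistency weight.
    p_t = F.softmax(teacher_logits, dim=1)
    entropy = -(p_t * torch.log(p_t + 1e-8)).sum(1)
    weight = torch.exp(-entropy)  # illustrative uncertainty-to-weight mapping
    div = F.kl_div(F.log_softmax(student_logits, dim=1), p_t, reduction="none").sum(1)
    return (weight * div).mean()

# Toy models standing in for segmentation networks (19 Cityscapes classes).
student = torch.nn.Conv2d(3, 19, 1)
teacher = torch.nn.Conv2d(3, 19, 1)
teacher.load_state_dict(student.state_dict())

x = torch.randn(2, 3, 32, 32)  # perturbed target-domain batch
loss = uncertainty_weighted_consistency(student(x), teacher(x).detach())
loss.backward()
ema_update(teacher, student)
```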
Instance-Aware Domain Generalization for Face Anti-Spoofing
Face anti-spoofing (FAS) based on domain generalization (DG) has been
recently studied to improve the generalization on unseen scenarios. Previous
methods typically rely on domain labels to align the distribution of each
domain for learning domain-invariant representations. However, artificial
domain labels are coarse-grained and subjective, which cannot reflect real
domain distributions accurately. Besides, such domain-aware methods focus on
domain-level alignment, which is not fine-grained enough to ensure that learned
representations are insensitive to domain styles. To address these issues, we
propose a novel perspective for DG FAS that aligns features on the instance
level without the need for domain labels. Specifically, an Instance-Aware
Domain Generalization (IADG) framework is proposed to learn generalizable
features by weakening their sensitivity to instance-specific styles. Concretely, we
propose Asymmetric Instance Adaptive Whitening to adaptively eliminate the
style-sensitive feature correlation, boosting the generalization. Moreover,
Dynamic Kernel Generator and Categorical Style Assembly are proposed to first
extract the instance-specific features and then generate the style-diversified
features with large style shifts, respectively, further facilitating the
learning of style-insensitive features. Extensive experiments and analysis
demonstrate the superiority of our method over state-of-the-art competitors.
Code will be publicly available at https://github.com/qianyuzqy/IADG.
(Comment: Accepted to the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023.)
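Instance-level whitening of this kind is often implemented by penalizing the off-diagonal entries of each instance's channel covariance, so that style-sensitive feature correlations are suppressed. The sketch below shows that generic idea; the asymmetric and adaptive parts of the paper's whitening are not reproduced.

```python
import torch

def instance_whitening_loss(feat):
    """feat: (N, C, H, W); per-instance channel covariance over spatial dims."""
    n, c, h, w = feat.shape
    f = feat.reshape(n, c, h * w)
    f = f - f.mean(dim=2, keepdim=True)
    cov = torch.bmm(f, f.transpose(1, 2)) / (h * w - 1)  # (N, C, C)
    # Keep variances, penalize cross-channel correlations (style cues).
    off_diag = cov - torch.diag_embed(torch.diagonal(cov, dim1=1, dim2=2))
    return off_diag.pow(2).mean()

feat = torch.randn(4, 8, 16, 16, requires_grad=True)
instance_whitening_loss(feat).backward()
```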
DMT: Dynamic Mutual Training for Semi-Supervised Learning
Recent semi-supervised learning methods use pseudo supervision as their core
idea, especially self-training methods that generate pseudo labels. However,
pseudo labels are unreliable. Self-training methods usually rely on a single
model's prediction confidence to filter out low-confidence pseudo labels, thus
retaining high-confidence errors and wasting many low-confidence correct
labels. In this paper, we point out that it is difficult for a model to
counter its own errors. Instead, leveraging the disagreement between
different models is key to locating pseudo-label errors. With this new
viewpoint, we propose mutual
training between two different models by a dynamically re-weighted loss
function, called Dynamic Mutual Training (DMT). We quantify inter-model
disagreement by comparing predictions from two different models to dynamically
re-weight loss in training, where a larger disagreement indicates a possible
error and corresponds to a lower loss value. Extensive experiments show that
DMT achieves state-of-the-art performance in both image classification and
semantic segmentation. Our code is released at
https://github.com/voldemortX/DST-CBC.
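The re-weighting rule is simple to illustrate: model A supplies pseudo labels, and each sample's loss for model B is scaled by B's own probability on A's label, so strong disagreement (a likely pseudo-label error) yields a small weight. This is a hedged sketch; details such as DMT's dynamic exponent schedule follow the paper only loosely.

```python
import torch
import torch.nn.functional as F

def dmt_loss(student_logits, teacher_logits, gamma=5.0):
    pseudo = teacher_logits.argmax(dim=1)                        # pseudo labels from model A
    p_student = F.softmax(student_logits, dim=1)
    agree = p_student.gather(1, pseudo.unsqueeze(1)).squeeze(1)  # B's prob. of A's label
    weight = agree.detach() ** gamma                             # big disagreement -> small weight
    ce = F.cross_entropy(student_logits, pseudo, reduction="none")
    return (weight * ce).mean()

logits_a = torch.randn(16, 10)                       # model A (pseudo-label source)
logits_b = torch.randn(16, 10, requires_grad=True)   # model B (being trained)
dmt_loss(logits_b, logits_a).backward()
# In full DMT, the two models swap roles and train each other mutually.
```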
Context-Aware Mixup for Domain Adaptive Semantic Segmentation
Unsupervised domain adaptation (UDA) aims to adapt a model of the labeled
source domain to an unlabeled target domain. Existing UDA-based semantic
segmentation approaches always reduce the domain shifts in pixel level, feature
level, and output level. However, almost all of them largely neglect the
contextual dependency, which is generally shared across different domains,
leading to less desirable performance. In this paper, we propose a novel
Context-Aware Mixup (CAMix) framework for domain adaptive semantic
segmentation, which exploits this important clue of context-dependency as
explicit prior knowledge in a fully end-to-end trainable manner for enhancing
the adaptability toward the target domain. Firstly, we present a contextual
mask generation strategy by leveraging the accumulated spatial distributions
and prior contextual relationships. The generated contextual mask is critical
in this work and will guide the context-aware domain mixup on three different
levels. Besides, given the context knowledge, we introduce a
significance-reweighted consistency loss to penalize the inconsistency between
the mixed student prediction and the mixed teacher prediction, which alleviates
the negative transfer of the adaptation, e.g., early performance degradation.
Extensive experiments and analysis demonstrate the effectiveness of our method
against the state-of-the-art approaches on widely-used UDA benchmarks.
(Comment: Accepted to IEEE Transactions on Circuits and Systems for Video Technology (TCSVT).)
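For the mixup step itself, a ClassMix-style sketch conveys the mechanics: pixels selected by a binary mask are pasted from one image into another, and labels are mixed the same way. CAMix's contextual mask generation from accumulated spatial priors is not reproduced here; a random class-based mask stands in.

```python
import torch

def mask_mixup(img_src, lbl_src, img_tgt, lbl_tgt, mask):
    """mask: (H, W) with 1 = take pixel from source, 0 = keep target."""
    m = mask.unsqueeze(0).float()                 # broadcast over channels
    mixed_img = m * img_src + (1 - m) * img_tgt
    mixed_lbl = torch.where(mask.bool(), lbl_src, lbl_tgt)
    return mixed_img, mixed_lbl

H = W = 32
img_s, img_t = torch.rand(3, H, W), torch.rand(3, H, W)
lbl_s, lbl_t = torch.randint(0, 19, (H, W)), torch.randint(0, 19, (H, W))
# Stand-in "contextual" mask: copy all source pixels of half the classes present.
classes = lbl_s.unique()
chosen = classes[torch.randperm(len(classes))[: len(classes) // 2]]
mask = torch.isin(lbl_s, chosen).long()
mixed_img, mixed_lbl = mask_mixup(img_s, lbl_s, img_t, lbl_t, mask)
```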