114 research outputs found

    Rethinking Implicit Neural Representations for Vision Learners

    Full text link
    Implicit Neural Representations (INRs) are powerful to parameterize continuous signals in computer vision. However, almost all INRs methods are limited to low-level tasks, e.g., image/video compression, super-resolution, and image generation. The questions on how to explore INRs to high-level tasks and deep networks are still under-explored. Existing INRs methods suffer from two problems: 1) narrow theoretical definitions of INRs are inapplicable to high-level tasks; 2) lack of representation capabilities to deep networks. Motivated by the above facts, we reformulate the definitions of INRs from a novel perspective and propose an innovative Implicit Neural Representation Network (INRN), which is the first study of INRs to tackle both low-level and high-level tasks. Specifically, we present three key designs for basic blocks in INRN along with two different stacking ways and corresponding loss functions. Extensive experiments with analysis on both low-level tasks (image fitting) and high-level vision tasks (image classification, object detection, instance segmentation) demonstrate the effectiveness of the proposed method

    Diverse Target and Contribution Scheduling for Domain Generalization

    Full text link
    Generalization under the distribution shift has been a great challenge in computer vision. The prevailing practice of directly employing the one-hot labels as the training targets in domain generalization~(DG) can lead to gradient conflicts, making it insufficient for capturing the intrinsic class characteristics and hard to increase the intra-class variation. Besides, existing methods in DG mostly overlook the distinct contributions of source (seen) domains, resulting in uneven learning from these domains. To address these issues, we firstly present a theoretical and empirical analysis of the existence of gradient conflicts in DG, unveiling the previously unexplored relationship between distribution shifts and gradient conflicts during the optimization process. In this paper, we present a novel perspective of DG from the empirical source domain's risk and propose a new paradigm for DG called Diverse Target and Contribution Scheduling (DTCS). DTCS comprises two innovative modules: Diverse Target Supervision (DTS) and Diverse Contribution Balance (DCB), with the aim of addressing the limitations associated with the common utilization of one-hot labels and equal contributions for source domains in DG. In specific, DTS employs distinct soft labels as training targets to account for various feature distributions across domains and thereby mitigates the gradient conflicts, and DCB dynamically balances the contributions of source domains by ensuring a fair decline in losses of different source domains. Extensive experiments with analysis on four benchmark datasets show that the proposed method achieves a competitive performance in comparison with the state-of-the-art approaches, demonstrating the effectiveness and advantages of the proposed DTCS

    Rethinking Domain Generalization: Discriminability and Generalizability

    Full text link
    Domain generalization (DG) endeavors to develop robust models that possess strong generalizability while preserving excellent discriminability. Nonetheless, pivotal DG techniques tend to improve the feature generalizability by learning domain-invariant representations, inadvertently overlooking the feature discriminability. On the one hand, the simultaneous attainment of generalizability and discriminability of features presents a complex challenge, often entailing inherent contradictions. This challenge becomes particularly pronounced when domain-invariant features manifest reduced discriminability owing to the inclusion of unstable factors, \emph{i.e.,} spurious correlations. On the other hand, prevailing domain-invariant methods can be categorized as category-level alignment, susceptible to discarding indispensable features possessing substantial generalizability and narrowing intra-class variations. To surmount these obstacles, we rethink DG from a new perspective that concurrently imbues features with formidable discriminability and robust generalizability, and present a novel framework, namely, Discriminative Microscopic Distribution Alignment (DMDA). DMDA incorporates two core components: Selective Channel Pruning~(SCP) and Micro-level Distribution Alignment (MDA). Concretely, SCP attempts to curtail redundancy within neural networks, prioritizing stable attributes conducive to accurate classification. This approach alleviates the adverse effect of spurious domain invariance and amplifies the feature discriminability. Besides, MDA accentuates micro-level alignment within each class, going beyond mere category-level alignment. This strategy accommodates sufficient generalizable features and facilitates within-class variations. Extensive experiments on four benchmark datasets corroborate the efficacy of our method

    正当防衛の限界 -日中比較法を中心に-

    Get PDF
    早大学位記番号:新9314博士(法学)早稲田大

    Continuous Piecewise-Affine Based Motion Model for Image Animation

    Full text link
    Image animation aims to bring static images to life according to driving videos and create engaging visual content that can be used for various purposes such as animation, entertainment, and education. Recent unsupervised methods utilize affine and thin-plate spline transformations based on keypoints to transfer the motion in driving frames to the source image. However, limited by the expressive power of the transformations used, these methods always produce poor results when the gap between the motion in the driving frame and the source image is large. To address this issue, we propose to model motion from the source image to the driving frame in highly-expressive diffeomorphism spaces. Firstly, we introduce Continuous Piecewise-Affine based (CPAB) transformation to model the motion and present a well-designed inference algorithm to generate CPAB transformation from control keypoints. Secondly, we propose a SAM-guided keypoint semantic loss to further constrain the keypoint extraction process and improve the semantic consistency between the corresponding keypoints on the source and driving images. Finally, we design a structure alignment loss to align the structure-related features extracted from driving and generated images, thus helping the generator generate results that are more consistent with the driving action. Extensive experiments on four datasets demonstrate the effectiveness of our method against state-of-the-art competitors quantitatively and qualitatively. Code will be publicly available at: https://github.com/DevilPG/AAAI2024-CPABMM

    Uncertainty-Aware Consistency Regularization for Cross-Domain Semantic Segmentation

    Full text link
    Unsupervised domain adaptation (UDA) aims to adapt existing models of the source domain to a new target domain with only unlabeled data. Many adversarial-based UDA methods involve high-instability training and have to carefully tune the optimization procedure. Some non-adversarial UDA methods employ a consistency regularization on the target predictions of a student model and a teacher model under different perturbations, where the teacher shares the same architecture with the student and is updated by the exponential moving average of the student. However, these methods suffer from noticeable negative transfer resulting from either the error-prone discriminator network or the unreasonable teacher model. In this paper, we propose an uncertainty-aware consistency regularization method for cross-domain semantic segmentation. By exploiting the latent uncertainty information of the target samples, more meaningful and reliable knowledge from the teacher model can be transferred to the student model. In addition, we further reveal the reason why the current consistency regularization is often unstable in minimizing the distribution discrepancy. We also show that our method can effectively ease this issue by mining the most reliable and meaningful samples with a dynamic weighting scheme of consistency loss. Experiments demonstrate that the proposed method outperforms the state-of-the-art methods on two domain adaptation benchmarks, i.e.,i.e., GTAV \rightarrow Cityscapes and SYNTHIA \rightarrow Cityscapes

    Instance-Aware Domain Generalization for Face Anti-Spoofing

    Full text link
    Face anti-spoofing (FAS) based on domain generalization (DG) has been recently studied to improve the generalization on unseen scenarios. Previous methods typically rely on domain labels to align the distribution of each domain for learning domain-invariant representations. However, artificial domain labels are coarse-grained and subjective, which cannot reflect real domain distributions accurately. Besides, such domain-aware methods focus on domain-level alignment, which is not fine-grained enough to ensure that learned representations are insensitive to domain styles. To address these issues, we propose a novel perspective for DG FAS that aligns features on the instance level without the need for domain labels. Specifically, Instance-Aware Domain Generalization framework is proposed to learn the generalizable feature by weakening the features' sensitivity to instance-specific styles. Concretely, we propose Asymmetric Instance Adaptive Whitening to adaptively eliminate the style-sensitive feature correlation, boosting the generalization. Moreover, Dynamic Kernel Generator and Categorical Style Assembly are proposed to first extract the instance-specific features and then generate the style-diversified features with large style shifts, respectively, further facilitating the learning of style-insensitive features. Extensive experiments and analysis demonstrate the superiority of our method over state-of-the-art competitors. Code will be publicly available at https://github.com/qianyuzqy/IADG.Comment: Accepted to IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 202

    DMT: Dynamic Mutual Training for Semi-Supervised Learning

    Full text link
    Recent semi-supervised learning methods use pseudo supervision as core idea, especially self-training methods that generate pseudo labels. However, pseudo labels are unreliable. Self-training methods usually rely on single model prediction confidence to filter low-confidence pseudo labels, thus remaining high-confidence errors and wasting many low-confidence correct labels. In this paper, we point out it is difficult for a model to counter its own errors. Instead, leveraging inter-model disagreement between different models is a key to locate pseudo label errors. With this new viewpoint, we propose mutual training between two different models by a dynamically re-weighted loss function, called Dynamic Mutual Training (DMT). We quantify inter-model disagreement by comparing predictions from two different models to dynamically re-weight loss in training, where a larger disagreement indicates a possible error and corresponds to a lower loss value. Extensive experiments show that DMT achieves state-of-the-art performance in both image classification and semantic segmentation. Our codes are released at https://github.com/voldemortX/DST-CBC .Comment: Reformatte

    Context-Aware Mixup for Domain Adaptive Semantic Segmentation

    Get PDF
    Unsupervised domain adaptation (UDA) aims to adapt a model of the labeled source domain to an unlabeled target domain. Existing UDA-based semantic segmentation approaches always reduce the domain shifts in pixel level, feature level, and output level. However, almost all of them largely neglect the contextual dependency, which is generally shared across different domains, leading to less-desired performance. In this paper, we propose a novel Context-Aware Mixup (CAMix) framework for domain adaptive semantic segmentation, which exploits this important clue of context-dependency as explicit prior knowledge in a fully end-to-end trainable manner for enhancing the adaptability toward the target domain. Firstly, we present a contextual mask generation strategy by leveraging the accumulated spatial distributions and prior contextual relationships. The generated contextual mask is critical in this work and will guide the context-aware domain mixup on three different levels. Besides, provided the context knowledge, we introduce a significance-reweighted consistency loss to penalize the inconsistency between the mixed student prediction and the mixed teacher prediction, which alleviates the negative transfer of the adaptation, e.g., early performance degradation. Extensive experiments and analysis demonstrate the effectiveness of our method against the state-of-the-art approaches on widely-used UDA benchmarks.Comment: Accepted to IEEE Transactions on Circuits and Systems for Video Technology (TCSVT
    corecore