
    Learning Domain Invariant Representations for Generalizable Person Re-Identification

    Generalizable person Re-Identification (ReID) has attracted growing attention in the computer vision community. In this work, we construct a structural causal model (SCM) among identity labels, identity-specific factors (e.g., clothing and shoe color), and domain-specific factors (e.g., background and viewpoint). Guided by this causal analysis, we propose a novel Domain Invariant Representation Learning for generalizable person Re-Identification (DIR-ReID) framework. Specifically, we first disentangle the identity-specific and domain-specific feature spaces, and on that basis propose an effective algorithmic implementation of backdoor adjustment, which essentially serves as a causal intervention on the SCM. Extensive experiments show that DIR-ReID outperforms state-of-the-art methods on large-scale domain generalization ReID benchmarks.
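The backdoor adjustment mentioned above can be illustrated with a minimal sketch: instead of scoring an identity feature only against the domain factor its image came with, the prediction is averaged over all observed domain-specific factors, blocking the backdoor path from domain to label. All names here (`backdoor_adjusted_score`, `toy_score`) are illustrative stand-ins, not the DIR-ReID implementation.

```python
def backdoor_adjusted_score(id_feature, domain_factors, score_fn):
    """Approximate P(y | do(x)): stratify the prediction over every
    observed domain-specific factor d, weighting each by P(d) = 1/|D|,
    rather than conditioning on the single factor the image came with."""
    total = 0.0
    for d in domain_factors:
        total += score_fn(id_feature, d)  # P(y | x_id, d)
    return total / len(domain_factors)

def toy_score(x_id, d):
    # A stand-in classifier: mostly identity-driven, slightly domain-biased.
    return 0.9 * x_id + 0.1 * d

adjusted = backdoor_adjusted_score(1.0, [0.0, 0.5, 1.0], toy_score)  # ≈ 0.95
```

In this toy version the residual influence of any single domain factor is averaged out; the actual framework performs the intervention in the disentangled feature space with learned modules.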

    From Robust to Generalizable Representation Learning for Person Re-Identification

    Person Re-Identification (ReID) is a retrieval task across non-overlapping cameras. Given a person-of-interest as a query, the goal of ReID is to determine whether this person has appeared at another place and time captured by a different camera, or even by the same camera at a different time instant. ReID is considered a zero-shot learning task because the identities in the training data do not necessarily overlap with those in the test data in the label space. This fundamental characteristic adds a layer of complexity to the task, making ReID a highly challenging representation learning problem. This thesis addresses the problem of learning generalizable yet discriminative representations with the following solutions.

    Chapter 3: Noisy and unrepresentative frames in automatically generated object bounding boxes from video sequences pose significant challenges to learning discriminative representations in video ReID. Most existing methods tackle this problem by assessing the importance of video frames according to either their local part alignments or their global appearance correlations, considered separately. However, given the diverse and unknown sources of noise that usually co-exist in captured video data, existing methods have not been sufficiently effective. In this chapter, we explore local alignments and global correlations jointly, with further consideration of their mutual reinforcement, to better assemble complementary discriminative ReID information from all relevant frames in video tracklets. We propose a model named Local-Global Associative Assembling (LOGA). Specifically, we concurrently optimize a Local Aligned Quality (LAQ) module that distinguishes the quality of each frame based on local alignments, and a Global Correlated Quality (GCQ) module that estimates global appearance correlations. With a locally-assembled global appearance prototype, we associate LAQ and GCQ to exploit their mutual complement.
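The idea of weighting frames by fused quality scores can be sketched, under simplifying assumptions, as a softmax-weighted assembly of frame features. The functions below and the additive fusion of LAQ and GCQ scores are illustrative choices, not the thesis's learned modules.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of raw scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def assemble_tracklet(frame_feats, laq_scores, gcq_scores):
    """Fuse per-frame local-alignment (LAQ) and global-correlation (GCQ)
    quality scores, then build a tracklet prototype as the quality-weighted
    average of frame features, so noisy frames contribute little."""
    weights = softmax([l + g for l, g in zip(laq_scores, gcq_scores)])
    dim = len(frame_feats[0])
    proto = [0.0] * dim
    for w, feat in zip(weights, frame_feats):
        for i in range(dim):
            proto[i] += w * feat[i]
    return proto

# A clean frame (high quality scores) dominates a noisy one (low scores).
proto = assemble_tracklet([[1.0, 0.0], [0.0, 1.0]], [8.0, -8.0], [8.0, -8.0])
```

With equal quality scores the assembly degenerates to a plain average, which is the behaviour one would expect when no frame is distinguishably noisy.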
Chapter 4: While deep learning has significantly improved ReID model accuracy under the Independent and Identically Distributed (IID) assumption, such models degrade notably when applied to an unseen novel domain due to unpredictable domain shifts. Contemporary Domain Generalizable ReID models struggle to learn domain-invariant representations solely through training on an instance classification objective. We argue that deep learning models are heavily influenced by, and thus biased towards, domain-specific characteristics such as background clutter, scale, and viewpoint variations, limiting the generalizability of the learned model. We hypothesize that pedestrians are domain-invariant as they share the same structural characteristics. To make the ReID model less domain-specific, we introduce a Primary-Auxiliary Objectives Association (PAOA) model that guides learning of the primary ReID instance classification objective with a concurrent auxiliary objective of weakly labeled pedestrian saliency detection. To resolve conflicting optimization criteria between the two objectives in the model parameter space, PAOA calibrates the loss gradients of the auxiliary task towards those of the primary task. Benefiting from this harmonious multitask learning design, the model can be extended with the recent test-time training paradigm to form PAOA+, which performs on-the-fly optimization against the auxiliary objective to maximize the model's generalization capacity in the test target domain. Experiments demonstrate the superiority of the proposed PAOA model.

Chapter 5: In this chapter, we propose a Feature-Distribution Perturbation and Calibration (PECA) method to derive generic feature representations for person ReID that are not only discriminative across cameras but also agnostic and deployable to arbitrary unseen target domains.
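One plausible reading of the PAOA gradient calibration described in Chapter 4 above is a conflict-aware projection in the spirit of PCGrad: when the auxiliary gradient points against the primary one, its conflicting component along the primary direction is removed. This is a hedged sketch under that assumption; the `calibrate` function below is illustrative, not the published implementation.

```python
def calibrate(aux_grad, pri_grad):
    """Calibrate the auxiliary-task gradient towards the primary task:
    if the two gradients conflict (negative dot product), subtract the
    auxiliary gradient's projection onto the primary gradient, so the
    auxiliary objective cannot undo progress on the primary one."""
    dot = sum(a * p for a, p in zip(aux_grad, pri_grad))
    if dot >= 0:
        return list(aux_grad)  # no conflict: keep the gradient as-is
    pri_norm_sq = sum(p * p for p in pri_grad)
    return [a - (dot / pri_norm_sq) * p for a, p in zip(aux_grad, pri_grad)]
```

A fully opposed auxiliary gradient, e.g. `calibrate([-1.0, 0.0], [1.0, 0.0])`, collapses to the zero vector, while a non-conflicting one passes through unchanged.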
Specifically, we perform per-domain feature-distribution perturbation to prevent the model from overfitting to the domain-biased distribution of each source (seen) domain, by enforcing feature invariance to distribution shifts caused by perturbation. Complementarily, we design a global calibration mechanism to align feature distributions across all source domains, improving the model's generalization capacity by eliminating domain bias. These local perturbation and global calibration processes are conducted simultaneously, sharing the same principle of avoiding overfitting through regularization on the perturbed and original distributions, respectively. Extensive experiments on eight person ReID datasets show that the proposed PECA model outperforms state-of-the-art competitors by significant margins.

Chapter 6: Existing Domain Generalizable ReID methods explore feature disentanglement to learn a compact generic feature space by eliminating domain-specific knowledge. Such methods not only sacrifice discrimination in target domains but also limit the model's robustness against per-identity appearance variations across views, an inherent characteristic of ReID. In this chapter, we formulate a Cross-Domain Variations Mining (CDVM) model to explore explicit domain-specific knowledge while simultaneously advancing generalizable representation learning. Our key insight is that cross-domain style variations need to be explicitly modeled to represent per-identity cross-view appearance changes. CDVM retains the model's robustness against cross-view style variations that reflect the specific characteristics of different domains, while maximizing the learning of a globally generalizable (invariant) representation. To this end, we propose utilizing cross-domain consensus to learn a domain-agnostic generic prototype, which is then refined by incorporating cross-domain style variations, thereby achieving cross-view feature augmentation.
Additionally, we enhance the discriminative power of the augmented representation by formulating an identity attribute constraint that emphasizes the importance of individual attributes while maintaining overall consistency across all pedestrians. Extensive experiments validate that the proposed CDVM model outperforms existing state-of-the-art methods by significant margins. These four solutions jointly address the problem of domain distribution shift on out-of-distribution (OOD) data by enabling the network to derive robust yet generalizable identity representations, thereby sharpening inter-class decision boundaries and improving matching accuracy between query and gallery instances.
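The CDVM prototype construction described in Chapter 6 can be caricatured in a few lines: a consensus prototype as the mean of per-domain prototypes, then refined with cross-domain style variation, here crudely measured as a per-dimension standard deviation. Both functions are hypothetical illustrations of the stated idea; the actual refinement in CDVM is learned, not a fixed statistic.

```python
def consensus_prototype(domain_protos):
    """Domain-agnostic prototype: the mean of per-domain prototypes,
    i.e. a cross-domain consensus."""
    n, dim = len(domain_protos), len(domain_protos[0])
    return [sum(p[i] for p in domain_protos) / n for i in range(dim)]

def refine_with_style_variation(proto, domain_protos, scale=1.0):
    """Refine the consensus prototype by adding cross-domain style
    variation (per-dimension standard deviation across domains) -- a toy
    stand-in for CDVM's learned cross-view feature augmentation."""
    n, dim = len(domain_protos), len(proto)
    std = [(sum((p[i] - proto[i]) ** 2 for p in domain_protos) / n) ** 0.5
           for i in range(dim)]
    return [proto[i] + scale * std[i] for i in range(dim)]
```

Setting `scale` to zero recovers the plain consensus, which mirrors the trade-off the chapter describes between a globally invariant representation and explicitly modeled style variation.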

    Domain Generalization for Medical Image Analysis: A Survey

    Medical Image Analysis (MedIA) has become an essential tool in medicine and healthcare, aiding in disease diagnosis, prognosis, and treatment planning, and recent successes in deep learning (DL) have contributed significantly to its advances. However, DL models for MedIA remain challenging to deploy in real-world situations, failing to generalize under the distributional gap between training and testing samples, known as the distribution shift problem. Researchers have dedicated their efforts to developing various DL methods that adapt to and perform robustly on unknown and out-of-distribution data. This paper comprehensively reviews domain generalization studies specifically tailored to MedIA. We provide a holistic view of how domain generalization techniques interact within the broader MedIA system, going beyond methodologies to consider their operational implications for the entire MedIA workflow. Specifically, we categorize domain generalization methods into data-level, feature-level, model-level, and analysis-level methods, and show how they can be used at various stages of the DL-equipped MedIA workflow, from data acquisition to model prediction and analysis. Furthermore, we include benchmark datasets and applications used to evaluate these approaches, and analyze the strengths and weaknesses of various methods, unveiling future research opportunities.