6 research outputs found

    Parameter-Efficient Person Re-identification in the 3D Space

    Full text link
    People live in a 3D world. However, existing works on person re-identification (re-id) mostly consider semantic representation learning in a 2D space, intrinsically limiting the understanding of people. In this work, we address this limitation by exploiting prior knowledge of the 3D body structure. Specifically, we project 2D images into a 3D space and introduce a novel parameter-efficient Omni-scale Graph Network (OG-Net) to learn the pedestrian representation directly from 3D point clouds. OG-Net effectively exploits the local information provided by sparse 3D points and takes advantage of structure and appearance information in a coherent manner. With the help of 3D geometry information, we can learn a new type of deep re-id feature that is free from nuisance variations such as scale and viewpoint. To our knowledge, this is among the first attempts to conduct person re-identification in the 3D space. We demonstrate through extensive experiments that the proposed method (1) eases the matching difficulty in the traditional 2D space, (2) exploits the complementary information of 2D appearance and 3D structure, (3) achieves competitive results with limited parameters on four large-scale person re-id datasets, and (4) has good scalability to unseen datasets. Comment: The code is available at https://github.com/layumi/person-reid-3
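
    To make the idea concrete, here is a minimal sketch (in PyTorch, with illustrative names and sizes; it is not OG-Net itself) of how a pedestrian image projected to a colored point cloud can be processed by a kNN graph operator that mixes 3D structure with RGB appearance, in the spirit of the abstract above.

```python
# A minimal sketch of learning from a colored 3D point cloud: each
# pedestrian pixel becomes a point carrying XYZ structure plus RGB
# appearance, and an EdgeConv-style kNN graph layer mixes both.
# All names and sizes here are illustrative assumptions, not OG-Net.
import torch
import torch.nn as nn


def knn(xyz: torch.Tensor, k: int) -> torch.Tensor:
    """Indices of the k nearest neighbours (incl. the point itself). xyz: (B, N, 3)."""
    dist = torch.cdist(xyz, xyz)                 # (B, N, N) pairwise distances
    return dist.topk(k, largest=False).indices   # (B, N, k)


class PointGraphConv(nn.Module):
    """EdgeConv-style layer: aggregate features over local 3D neighbourhoods."""

    def __init__(self, in_dim: int, out_dim: int, k: int = 8):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(nn.Linear(2 * in_dim, out_dim), nn.ReLU())

    def forward(self, xyz, feat):                # xyz: (B, N, 3), feat: (B, N, C)
        idx = knn(xyz, self.k)                   # (B, N, k)
        B, N, C = feat.shape
        nbr = torch.gather(
            feat.unsqueeze(1).expand(B, N, N, C), 2,
            idx.unsqueeze(-1).expand(B, N, self.k, C))   # neighbour features
        ctr = feat.unsqueeze(2).expand_as(nbr)
        edge = torch.cat([ctr, nbr - ctr], dim=-1)       # centre + relative offset
        return self.mlp(edge).max(dim=2).values          # max over neighbours


# Toy usage: 6-dim input per point = XYZ structure + RGB appearance.
pts = torch.rand(2, 512, 3)                      # hypothetical projected body points
rgb = torch.rand(2, 512, 3)
layer = PointGraphConv(in_dim=6, out_dim=64)
out = layer(pts, torch.cat([pts, rgb], dim=-1))  # (2, 512, 64)
```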

    Learning Clothing and Pose Invariant 3D Shape Representation for Long-Term Person Re-Identification

    Full text link
    Long-Term Person Re-Identification (LT-ReID) has become increasingly crucial in computer vision and biometrics. In this work, we aim to extend LT-ReID beyond pedestrian recognition to a wider range of real-world human activities, while still accounting for cloth-changing scenarios over large time gaps. This setting poses additional challenges due to the geometric misalignment and appearance ambiguity caused by the diversity of human pose and clothing. To address these challenges, we propose a new approach, 3DInvarReID, for (i) disentangling identity from the non-identity components (pose, clothing shape, and texture) of 3D clothed humans, and (ii) jointly reconstructing accurate 3D clothed body shapes and learning discriminative features of naked body shapes for person ReID. To better evaluate our study of LT-ReID, we collect a real-world dataset called CCDA, which contains a wide variety of human activities and clothing changes. Experimentally, we show the superior performance of our approach for person ReID. Comment: 10 pages, 7 figures, accepted by ICCV 2023
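
    The following is a minimal, hypothetical sketch of the disentanglement idea described above: an encoder splits a feature into an identity code and a non-identity code, a decoder must reconstruct the input from both, and only the identity code drives the re-id classifier. Names and dimensions are assumptions for illustration, not the authors' 3DInvarReID architecture.

```python
# Minimal disentanglement sketch (illustrative, not the 3DInvarReID
# pipeline): identity and non-identity (pose/clothing/texture) codes
# are separated; re-id is trained on the identity code alone, while a
# reconstruction loss forces the non-identity code to carry the rest.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DisentangledReID(nn.Module):
    def __init__(self, feat_dim=256, id_dim=128, nonid_dim=128, num_ids=100):
        super().__init__()
        self.encoder = nn.Linear(feat_dim, id_dim + nonid_dim)
        self.decoder = nn.Linear(id_dim + nonid_dim, feat_dim)
        self.id_head = nn.Linear(id_dim, num_ids)
        self.id_dim = id_dim

    def forward(self, x):
        z = self.encoder(x)
        z_id, z_nonid = z[:, :self.id_dim], z[:, self.id_dim:]
        recon = self.decoder(torch.cat([z_id, z_nonid], dim=1))
        logits = self.id_head(z_id)              # re-id uses identity code only
        return z_id, logits, recon


model = DisentangledReID()
x = torch.randn(4, 256)                          # hypothetical body-shape features
labels = torch.randint(0, 100, (4,))
z_id, logits, recon = model(x)
loss = F.cross_entropy(logits, labels) + F.mse_loss(recon, x)
loss.backward()
```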

    Behavioral Intention Prediction in Driving Scenes: A Survey

    Full text link
    In driving scenes, road agents frequently interact with one another and must understand the intentions of their surroundings. The ego-agent (each road agent itself) continually predicts the behavior of other road users and relies on a shared, consistent understanding for safe movement. Behavioral Intention Prediction (BIP) simulates this human reasoning process and fulfills the early prediction of specific behaviors. As in related prediction tasks such as trajectory prediction, data-driven deep learning methods have become the primary research pipeline. The rapid development of BIP inevitably raises new issues and challenges. To catalyze future research, this work provides a comprehensive review of BIP covering the available datasets, key factors and challenges, pedestrian-centric and vehicle-centric BIP approaches, and BIP-aware applications. Our investigation shows that the behavioral intention types in most current datasets and methods remain monotonous (e.g., Crossing (C) and Not Crossing (NC) for pedestrians, and Lane Changing (LC) for vehicles), and that research on safety-critical scenarios (e.g., near-crash situations) is still limited. Through this investigation, we identify open issues in behavioral intention prediction and suggest possible insights for future research. Comment: 254 references
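
    As a concrete illustration of the C/NC task format the survey describes, the sketch below shows a toy recurrent classifier that reads an observed bounding-box trajectory and predicts a binary crossing intention. The architecture and sizes are assumptions for illustration, not drawn from any surveyed method.

```python
# Toy Crossing (C) vs Not Crossing (NC) intention classifier: a GRU
# reads a short observed bounding-box trajectory and predicts the
# pedestrian's binary intention early. Illustrative assumption only.
import torch
import torch.nn as nn


class CrossingIntentGRU(nn.Module):
    def __init__(self, in_dim=4, hidden=64, num_classes=2):
        super().__init__()
        self.gru = nn.GRU(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, boxes):                    # boxes: (B, T, 4) = x, y, w, h
        _, h = self.gru(boxes)                   # h: (1, B, hidden), last state
        return self.head(h.squeeze(0))           # logits over {C, NC}


model = CrossingIntentGRU()
obs = torch.randn(8, 15, 4)                      # 8 pedestrians, 15 observed frames
logits = model(obs)                              # (8, 2)
```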

    From Robust to Generalizable Representation Learning for Person Re-Identification

    Get PDF
    Person Re-Identification (ReID) is a retrieval task across non-overlapping cameras. Given a person-of-interest as a query, the goal of ReID is to determine whether this person has appeared in another place at a distinct time, captured by a different camera or even the same camera at a different time instant. ReID is considered a zero-shot learning task because the identities present in the training data do not necessarily overlap with those in the test data in the label space. This fundamental characteristic makes ReID a highly challenging representation learning problem. This thesis addresses the problem of learning generalizable yet discriminative representations with the following solutions:

    Chapter 3: Noisy and unrepresentative frames in automatically generated object bounding boxes from video sequences cause significant challenges in learning discriminative representations in video ReID. Most existing methods tackle this problem by assessing the importance of video frames according to either their local part alignments or their global appearance correlations, considered separately. However, given the diverse and unknown sources of noise that usually co-exist in captured video data, such methods have not been sufficiently effective. In this chapter, we explore local alignments and global correlations jointly, with further consideration of their mutual reinforcement, to better assemble complementary discriminative ReID information within all relevant frames in video tracklets. We propose a model named Local-Global Associative Assembling (LOGA). Specifically, we concurrently optimize a Local Aligned Quality (LAQ) module that distinguishes the quality of each frame based on local alignments, and a Global Correlated Quality (GCQ) module that estimates global appearance correlations. With a locally-assembled global appearance prototype, we associate LAQ and GCQ to exploit their mutual complement.

    Chapter 4: While deep learning has significantly improved ReID model accuracy under the Independent and Identically Distributed (IID) assumption, it has become clear that such models degrade notably when applied to an unseen novel domain due to unpredictable domain shifts. Contemporary Domain Generalizable ReID models struggle to learn domain-invariant representations solely through training on an instance classification objective. We consider that deep learning models are heavily influenced by, and thus biased towards, domain-specific characteristics such as background clutter, scale, and viewpoint variations, limiting the generalizability of the learned model. We hypothesize that pedestrians are domain-invariant as they share the same structural characteristics. To make the ReID model less domain-specific, we introduce a Primary-Auxiliary Objectives Association (PAOA) model that guides learning of the primary ReID instance classification objective with a concurrent auxiliary learning objective on weakly labeled pedestrian saliency detection. To resolve conflicting optimization criteria between the two objectives in the model parameter space, PAOA calibrates the loss gradients of the auxiliary task towards those of the primary task, as sketched below. Benefiting from this harmonious multi-task learning design, the model can be extended with the recent test-time optimization paradigm to form PAOA+, which performs on-the-fly optimization against the auxiliary objective to maximize the model's generalization capacity in the test target domain. Experiments demonstrate the superiority of the proposed PAOA model.
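
    The sketch below illustrates one plausible form of the gradient calibration described in Chapter 4, using a PCGrad-style projection: when the auxiliary gradient conflicts with the primary ReID gradient, its conflicting component is removed before the update. This is an assumption for illustration, not the thesis's exact PAOA rule.

```python
# Minimal gradient-calibration sketch (a PCGrad-style projection, an
# assumption, not the exact PAOA rule): if the auxiliary gradient
# opposes the primary one, drop its conflicting component, then combine.
import torch


def calibrate(g_primary: torch.Tensor, g_aux: torch.Tensor) -> torch.Tensor:
    """Project the auxiliary gradient away from directions that oppose
    the primary gradient, then return the combined update direction."""
    dot = torch.dot(g_aux, g_primary)
    if dot < 0:                                  # conflicting objectives
        g_aux = g_aux - dot / g_primary.norm().pow(2) * g_primary
    return g_primary + g_aux


# Toy usage with flattened parameter gradients.
g_main = torch.tensor([1.0, 0.0])
g_aux = torch.tensor([-0.5, 1.0])                # partially opposes g_main
update = calibrate(g_main, g_aux)                # conflict component removed
print(update)                                    # tensor([1., 1.])
```
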
    Chapter 5: In this chapter, we propose a Feature-Distribution Perturbation and Calibration (PECA) method to derive generic feature representations for person ReID that are not only discriminative across cameras but also agnostic and deployable to arbitrary unseen target domains. Specifically, we perform per-domain feature-distribution perturbation to prevent the model from overfitting to the domain-biased distribution of each source (seen) domain, by enforcing feature invariance to the distribution shifts caused by perturbation (a minimal illustration of this step follows this overview). Complementarily, we design a global calibration mechanism to align feature distributions across all source domains, improving the model's generalization capacity by eliminating domain bias. These local perturbation and global calibration processes are conducted simultaneously, sharing the same principle of avoiding overfitting by regularization on the perturbed and original distributions, respectively. Extensive experiments on eight person ReID datasets show that the proposed PECA model outperforms state-of-the-art competitors by significant margins.

    Chapter 6: Existing Domain Generalizable ReID methods explore feature disentanglement to learn a compact generic feature space by eliminating domain-specific knowledge. Such methods not only sacrifice discrimination in target domains but also limit the model's robustness against per-identity appearance variations across views, an inherent characteristic of ReID. In this chapter, we formulate a Cross-Domain Variations Mining (CDVM) model to explore explicit domain-specific knowledge while simultaneously advancing generalizable representation learning. Our key insight is that cross-domain style variations need to be explicitly modeled to represent per-identity cross-view appearance changes. CDVM retains the model's robustness against cross-view style variations that reflect the specific characteristics of different domains, while maximizing the learning of a globally generalizable (invariant) representation. To this end, we propose utilizing cross-domain consensus to learn a domain-agnostic generic prototype, which is then refined by incorporating cross-domain style variations, thereby achieving cross-view feature augmentation. Additionally, we enhance the discriminative power of the augmented representation by formulating an identity attribute constraint that emphasizes the importance of individual attributes while maintaining overall consistency across all pedestrians. Extensive experiments validate that the proposed CDVM model outperforms existing state-of-the-art methods by significant margins.

    These four solutions jointly address the problem of domain distribution shift for out-of-distribution (OOD) data by enabling the network to derive robust yet generalizable identity representations, thereby sharpening inter-class decision boundaries and improving matching accuracy between query and gallery instances.
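
    As a concrete illustration of Chapter 5's local perturbation step, the sketch below jitters the channel-wise mean and standard deviation of a feature map to simulate a domain distribution shift, following common statistics-perturbation practice. It is an assumption for illustration, not the thesis code.

```python
# Feature-distribution perturbation sketch (modelled on common
# statistics-perturbation practice, an assumption rather than PECA's
# implementation): re-normalise each channel, then restore it with
# randomly perturbed mean/std so training sees shifted distributions.
import torch


def perturb_statistics(feat: torch.Tensor, eps: float = 1e-6,
                       noise_std: float = 0.1) -> torch.Tensor:
    """feat: (B, C, H, W). Returns the same content under jittered
    channel statistics, simulating a domain distribution shift."""
    mu = feat.mean(dim=(2, 3), keepdim=True)     # (B, C, 1, 1)
    sigma = feat.std(dim=(2, 3), keepdim=True) + eps
    normed = (feat - mu) / sigma
    mu_p = mu * (1 + noise_std * torch.randn_like(mu))
    sigma_p = sigma * (1 + noise_std * torch.randn_like(sigma))
    return normed * sigma_p + mu_p


feat = torch.randn(4, 64, 16, 8)                 # hypothetical backbone features
feat_shifted = perturb_statistics(feat)          # same content, shifted statistics
```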

    Parameter-Efficient Person Re-Identification in the 3D Space

    No full text
    DOI: 10.1109/TNNLS.2022.3214834. IEEE Transactions on Neural Networks and Learning Systems, pp. 1-1.