9 research outputs found

    Ranking-based Adaptive Query Generation for DETRs in Crowded Pedestrian Detection

    Full text link
    DEtection TRansformer (DETR) and its variants (DETRs) have been successfully applied to crowded pedestrian detection, which achieved promising performance. However, we find that, in different degrees of crowded scenes, the number of DETRs' queries must be adjusted manually, otherwise, the performance would degrade to varying degrees. In this paper, we first analyze the two current query generation methods and summarize four guidelines for designing the adaptive query generation method. Then, we propose Rank-based Adaptive Query Generation (RAQG) to alleviate the problem. Specifically, we design a rank prediction head that can predict the rank of the lowest confidence positive training sample produced by the encoder. Based on the predicted rank, we design an adaptive selection method that can adaptively select coarse detection results produced by the encoder to generate queries. Moreover, to train the rank prediction head better, we propose Soft Gradient L1 Loss. The gradient of Soft Gradient L1 Loss is continuous, which can describe the relationship between the loss value and the updated value of model parameters granularly. Our method is simple and effective, which can be plugged into any DETRs to make it query-adaptive in theory. The experimental results on Crowdhuman dataset and Citypersons dataset show that our method can adaptively generate queries for DETRs and achieve competitive results. Especially, our method achieves state-of-the-art 39.4% MR on Crowdhuman dataset.Comment: 10 pages, 6 figure

    Shape-centered Representation Learning for Visible-Infrared Person Re-identification

    Full text link
    Current Visible-Infrared Person Re-Identification (VI-ReID) methods prioritize extracting distinguishing appearance features, ignoring the natural resistance of body shape against modality changes. Initially, we gauged the discriminative potential of shapes by a straightforward concatenation of shape and appearance features. However, two unresolved issues persist in the utilization of shape features. One pertains to the dependence on auxiliary models for shape feature extraction in the inference phase, along with the errors in generated infrared shapes due to the intrinsic modality disparity. The other issue involves the inadequately explored correlation between shape and appearance features. To tackle the aforementioned challenges, we propose the Shape-centered Representation Learning framework (ScRL), which focuses on learning shape features and appearance features associated with shapes. Specifically, we devise the Shape Feature Propagation (SFP), facilitating direct extraction of shape features from original images with minimal complexity costs during inference. To restitute inaccuracies in infrared body shapes at the feature level, we present the Infrared Shape Restitution (ISR). Furthermore, to acquire appearance features related to shape, we design the Appearance Feature Enhancement (AFE), which accentuates identity-related features while suppressing identity-unrelated features guided by shape features. Extensive experiments are conducted to validate the effectiveness of the proposed ScRL. Achieving remarkable results, the Rank-1 (mAP) accuracy attains 76.1%, 71.2%, 92.4% (72.6%, 52.9%, 86.7%) on the SYSU-MM01, HITSZ-VCM, RegDB datasets respectively, outperforming existing state-of-the-art methods

    Peer is Your Pillar: A Data-unbalanced Conditional GANs for Few-shot Image Generation

    Full text link
    Few-shot image generation aims to train generative models using a small number of training images. When there are few images available for training (e.g. 10 images), Learning From Scratch (LFS) methods often generate images that closely resemble the training data while Transfer Learning (TL) methods try to improve performance by leveraging prior knowledge from GANs pre-trained on large-scale datasets. However, current TL methods may not allow for sufficient control over the degree of knowledge preservation from the source model, making them unsuitable for setups where the source and target domains are not closely related. To address this, we propose a novel pipeline called Peer is your Pillar (PIP), which combines a target few-shot dataset with a peer dataset to create a data-unbalanced conditional generation. Our approach includes a class embedding method that separates the class space from the latent space, and we use a direction loss based on pre-trained CLIP to improve image diversity. Experiments on various few-shot datasets demonstrate the advancement of the proposed PIP, especially reduces the training requirements of few-shot image generation.Comment: Under Revie

    PRO-Face S: Privacy-preserving Reversible Obfuscation of Face Images via Secure Flow

    Full text link
    This paper proposes a novel paradigm for facial privacy protection that unifies multiple characteristics including anonymity, diversity, reversibility and security within a single lightweight framework. We name it PRO-Face S, short for Privacy-preserving Reversible Obfuscation of Face images via Secure flow-based model. In the framework, an Invertible Neural Network (INN) is utilized to process the input image along with its pre-obfuscated form, and generate the privacy protected image that visually approximates to the pre-obfuscated one, thus ensuring privacy. The pre-obfuscation applied can be in diversified form with different strengths and styles specified by users. Along protection, a secret key is injected into the network such that the original image can only be recovered from the protection image via the same model given the correct key provided. Two modes of image recovery are devised to deal with malicious recovery attempts in different scenarios. Finally, extensive experiments conducted on three public image datasets demonstrate the superiority of the proposed framework over multiple state-of-the-art approaches

    Atomic number prior guided network for prohibited items detection from heavily cluttered X-ray imagery

    Get PDF
    Prohibited item detection in X-ray images is an effective measure to maintain public safety. Recent prohibited item detection methods based on deep learning has achieved impressive performance. Some methods improve prohibited item detection performance by introducing prior knowledge of prohibited items, such as the edge and size of an object. However, items within baggage are often placed randomly, resulting in cluttered X-ray images, which can seriously affect the correctness and effectiveness of prior knowledge. In particular, we find that different material items in X-ray images have clear distinctions according to their atomic number Z information, which is vital to suppress the interference of irrelevant background information by mining material cues. Inspired by this observation, in this paper, we combined the atomic number Z feature and proposed a novel atomic number Z Prior Guided Network (ZPGNet) to detect prohibited objects from heavily cluttered X-ray images. Specifically, we propose a Material Activation (MA) module that cross-scale flows the atomic number Z information through the network to mine material clues and reduce irrelevant information interference in detecting prohibited items. However, collecting atomic number images requires much labor, increasing costs. Therefore, we propose a method to automatically generate atomic number Z images by exploring the color information of X-ray images, which significantly reduces the manual acquisition cost. Extensive experiments demonstrate that our method can accurately and robustly detect prohibited items from heavily cluttered X-ray images. Furthermore, we extensively evaluate our method on HiXray and OPIXray, and the best result is 2.1% mAP50 higher than the state-of-the-art models on HiXray

    Artificial Neural Networks and Deep Learning Techniques Applied to Radar Target Detection: A Review

    No full text
    Radar target detection (RTD) is a fundamental but important process of the radar system, which is designed to differentiate and measure targets from a complex background. Deep learning methods have gained great attention currently and have turned out to be feasible solutions in radar signal processing. Compared with the conventional RTD methods, deep learning-based methods can extract features automatically and yield more accurate results. Applying deep learning to RTD is considered as a novel concept. In this paper, we review the applications of deep learning in the field of RTD and summarize the possible limitations. This work is timely due to the increasing number of research works published in recent years. We hope that this survey will provide guidelines for future studies and applications of deep learning in RTD and related areas of radar signal processing

    Artificial Neural Networks and Deep Learning Techniques Applied to Radar Target Detection: A Review

    No full text
    Radar target detection (RTD) is a fundamental but important process of the radar system, which is designed to differentiate and measure targets from a complex background. Deep learning methods have gained great attention currently and have turned out to be feasible solutions in radar signal processing. Compared with the conventional RTD methods, deep learning-based methods can extract features automatically and yield more accurate results. Applying deep learning to RTD is considered as a novel concept. In this paper, we review the applications of deep learning in the field of RTD and summarize the possible limitations. This work is timely due to the increasing number of research works published in recent years. We hope that this survey will provide guidelines for future studies and applications of deep learning in RTD and related areas of radar signal processing
    corecore