9 research outputs found
Ranking-based Adaptive Query Generation for DETRs in Crowded Pedestrian Detection
DEtection TRansformer (DETR) and its variants (DETRs) have been successfully
applied to crowded pedestrian detection and have achieved promising performance.
However, we find that the number of DETR queries must be adjusted manually for
scenes with different degrees of crowding; otherwise, performance degrades to
varying degrees. In this paper, we first analyze the two current
query generation methods and summarize four guidelines for designing the
adaptive query generation method. Then, we propose Rank-based Adaptive Query
Generation (RAQG) to alleviate the problem. Specifically, we design a rank
prediction head that can predict the rank of the lowest confidence positive
training sample produced by the encoder. Based on the predicted rank, we design
an adaptive selection method that can adaptively select coarse detection
results produced by the encoder to generate queries. Moreover, to train the
rank prediction head better, we propose Soft Gradient L1 Loss. The gradient of
Soft Gradient L1 Loss is continuous, so it can describe the relationship
between the loss value and the parameter update at a fine granularity.
Our method is simple and effective, and in theory it can be plugged into any
DETR to make it query-adaptive. Experimental results on the CrowdHuman and
CityPersons datasets show that our method can adaptively generate queries for
DETRs and achieve competitive results. In particular, our method achieves a
state-of-the-art 39.4% MR on the CrowdHuman dataset.
Comment: 10 pages, 6 figures
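The adaptive selection step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes the encoder yields one confidence score per coarse detection and that the rank head outputs a single 1-based rank. The function name `select_queries` and the parameters `min_queries`, `max_queries`, and `margin` are hypothetical.

```python
def select_queries(scores, predicted_rank, min_queries=1, max_queries=None, margin=0):
    """Return indices of the coarse detections kept as decoder queries.

    scores: confidence per coarse detection from the encoder (assumed input).
    predicted_rank: predicted 1-based rank of the lowest-confidence positive.
    margin: hypothetical safety margin added to the predicted rank.
    """
    n = len(scores)
    if max_queries is None:
        max_queries = n
    # Keep everything ranked at or above the predicted rank, plus the margin,
    # clamped to the allowed range and to the number of candidates.
    k = int(round(predicted_rank)) + margin
    k = max(min_queries, min(k, max_queries, n))
    # Sort candidate indices by descending confidence and keep the top k.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return order[:k]


scores = [0.91, 0.15, 0.78, 0.40, 0.05, 0.66]
sparse = select_queries(scores, predicted_rank=2.2)   # few pedestrians expected
crowded = select_queries(scores, predicted_rank=4.6)  # many pedestrians expected
```

In this sketch the number of queries follows the predicted rank, so a sparse scene keeps fewer coarse detections than a crowded one without any manual tuning of the query count.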
Shape-centered Representation Learning for Visible-Infrared Person Re-identification
Current Visible-Infrared Person Re-Identification (VI-ReID) methods
prioritize extracting distinguishing appearance features, ignoring the natural
resistance of body shape against modality changes. Initially, we gauged the
discriminative potential of shapes by a straightforward concatenation of shape
and appearance features. However, two unresolved issues persist in the
utilization of shape features. One pertains to the dependence on auxiliary
models for shape feature extraction in the inference phase, along with the
errors in generated infrared shapes due to the intrinsic modality disparity.
The other issue involves the inadequately explored correlation between shape
and appearance features. To tackle the aforementioned challenges, we propose
the Shape-centered Representation Learning framework (ScRL), which focuses on
learning shape features and appearance features associated with shapes.
Specifically, we devise the Shape Feature Propagation (SFP), facilitating
direct extraction of shape features from original images with minimal
complexity costs during inference. To restitute inaccuracies in infrared body
shapes at the feature level, we present the Infrared Shape Restitution (ISR).
Furthermore, to acquire appearance features related to shape, we design the
Appearance Feature Enhancement (AFE), which accentuates identity-related
features while suppressing identity-unrelated features guided by shape
features. Extensive experiments are conducted to validate the effectiveness of
the proposed ScRL. It attains Rank-1 (mAP) accuracy of 76.1%, 71.2%, and 92.4%
(72.6%, 52.9%, and 86.7%) on the SYSU-MM01, HITSZ-VCM, and RegDB datasets,
respectively, outperforming existing state-of-the-art methods.
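For readers unfamiliar with the two reported metrics, the following is a simplified sketch of how Rank-1 accuracy and mAP are typically computed for retrieval. It assumes a precomputed query-to-gallery distance matrix and ignores dataset-specific protocol details such as camera filtering; the function name `rank1_and_map` and the toy data are hypothetical.

```python
def rank1_and_map(dist, query_ids, gallery_ids):
    """Compute Rank-1 accuracy and mean average precision.

    dist[q][g] is the distance between query q and gallery item g
    (smaller means more similar).
    """
    hits, aps = 0, []
    for q, row in enumerate(dist):
        # Gallery indices sorted from most to least similar.
        order = sorted(range(len(row)), key=lambda g: row[g])
        matches = [gallery_ids[g] == query_ids[q] for g in order]
        if not any(matches):
            continue  # skip queries with no true match in the gallery
        hits += int(matches[0])  # Rank-1: is the top result correct?
        found, precisions = 0, []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                found += 1
                precisions.append(found / rank)  # precision at each hit
        aps.append(sum(precisions) / len(precisions))
    if not aps:
        raise ValueError("no query has a matching gallery identity")
    return hits / len(aps), sum(aps) / len(aps)


dist = [[0.1, 0.9, 0.3],   # query 0 (identity 7)
        [0.3, 0.5, 0.2]]   # query 1 (identity 8)
rank1, mean_ap = rank1_and_map(dist, query_ids=[7, 8], gallery_ids=[7, 8, 7])
```

Here query 0 retrieves both of its matches first, while query 1 finds its only match at rank 3, so Rank-1 is 0.5 and mAP is the mean of 1 and 1/3.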
Peer is Your Pillar: A Data-unbalanced Conditional GANs for Few-shot Image Generation
Few-shot image generation aims to train generative models using a small
number of training images. When there are few images available for training
(e.g. 10 images), Learning From Scratch (LFS) methods often generate images
that closely resemble the training data while Transfer Learning (TL) methods
try to improve performance by leveraging prior knowledge from GANs pre-trained
on large-scale datasets. However, current TL methods may not allow for
sufficient control over the degree of knowledge preservation from the source
model, making them unsuitable for setups where the source and target domains
are not closely related. To address this, we propose a novel pipeline called
Peer is your Pillar (PIP), which combines a target few-shot dataset with a peer
dataset to create a data-unbalanced conditional generation. Our approach
includes a class embedding method that separates the class space from the
latent space, and we use a direction loss based on pre-trained CLIP to improve
image diversity. Experiments on various few-shot datasets demonstrate the
advantages of the proposed PIP, in particular that it reduces the training
requirements of few-shot image generation.
Comment: Under Review
PRO-Face S: Privacy-preserving Reversible Obfuscation of Face Images via Secure Flow
This paper proposes a novel paradigm for facial privacy protection that
unifies multiple characteristics including anonymity, diversity, reversibility
and security within a single lightweight framework. We name it PRO-Face S,
short for Privacy-preserving Reversible Obfuscation of Face images via Secure
flow-based model. In the framework, an Invertible Neural Network (INN) is
utilized to process the input image along with its pre-obfuscated form and to
generate a privacy-protected image that visually approximates the
pre-obfuscated one, thus ensuring privacy. The pre-obfuscation can take
diverse forms, with strengths and styles specified by users.
During protection, a secret key is injected into the network such that the
original image can be recovered from the protected image only via the same
model and with the correct key. Two modes of image recovery are devised
to deal with malicious recovery attempts in different scenarios. Finally,
extensive experiments conducted on three public image datasets demonstrate the
superiority of the proposed framework over multiple state-of-the-art
approaches.
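The idea of key-conditioned reversibility can be illustrated with a toy additive coupling step, the basic building block of many invertible networks. This is only a sketch under strong assumptions: a SHA-256 hash stands in for the learned subnetwork, pixels are integers in 0..255, and the pre-obfuscated guidance image is ignored. The names `protect`, `recover`, and `_shift` are hypothetical and do not come from the paper.

```python
import hashlib


def _shift(half, key, n):
    """Key-dependent shifts computed from the untouched half.

    Stand-in for a learned subnetwork; assumes pixel values in 0..255.
    """
    digest = hashlib.sha256(key + bytes(half)).digest()
    return [digest[i % len(digest)] for i in range(n)]


def protect(pixels, key):
    """Forward pass: keep the first half, shift the second half."""
    mid = len(pixels) // 2
    a, b = pixels[:mid], pixels[mid:]
    s = _shift(a, key, len(b))
    return a + [(v + d) % 256 for v, d in zip(b, s)]


def recover(protected, key):
    """Inverse pass: recompute the same shifts and subtract them."""
    mid = len(protected) // 2
    a, yb = protected[:mid], protected[mid:]
    s = _shift(a, key, len(yb))
    return a + [(v - d) % 256 for v, d in zip(yb, s)]


image = [12, 200, 34, 90, 255, 0]
protected = protect(image, b"secret-key")
restored = recover(protected, b"secret-key")
```

Because the shifts depend on the key, the inverse pass reproduces the original exactly only when the same key is supplied; a different key produces different shifts and, in general, a different image.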
Atomic number prior guided network for prohibited items detection from heavily cluttered X-ray imagery
Prohibited item detection in X-ray images is an effective measure to maintain public safety. Recent prohibited item detection methods based on deep learning have achieved impressive performance. Some methods improve detection performance by introducing prior knowledge of prohibited items, such as the edge and size of an object. However, items within baggage are often placed randomly, resulting in cluttered X-ray images, which can seriously affect the correctness and effectiveness of such prior knowledge. In particular, we find that items of different materials in X-ray images are clearly distinguished by their atomic number Z information, and mining these material cues is vital for suppressing interference from irrelevant background information. Inspired by this observation, we incorporate the atomic number Z feature and propose a novel atomic number Z Prior Guided Network (ZPGNet) to detect prohibited objects in heavily cluttered X-ray images. Specifically, we propose a Material Activation (MA) module that propagates the atomic number Z information across scales through the network to mine material cues and reduce interference from irrelevant information when detecting prohibited items. However, collecting atomic number images requires much labor, which increases costs. Therefore, we propose a method to automatically generate atomic number Z images by exploiting the color information of X-ray images, which significantly reduces the manual acquisition cost. Extensive experiments demonstrate that our method can accurately and robustly detect prohibited items in heavily cluttered X-ray images. Furthermore, we extensively evaluate our method on HiXray and OPIXray, and the best result is 2.1% mAP50 higher than the state-of-the-art models on HiXray.
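The mAP50 metric cited above counts a predicted box as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5. A minimal sketch of that matching criterion follows, assuming axis-aligned boxes given as (x1, y1, x2, y2); the helper names `iou` and `is_true_positive` are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Overlap is zero when the boxes do not intersect.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred, truth, threshold=0.5):
    """A prediction matches a ground-truth box at the given IoU threshold."""
    return iou(pred, truth) >= threshold


loose = iou((0, 0, 10, 10), (5, 0, 15, 10))               # half-overlapping boxes
tight = is_true_positive((0, 0, 10, 10), (2, 0, 12, 10))  # mostly overlapping
```

Two equal boxes that overlap by half have an IoU of only 1/3, so they would not count as a match under mAP50, which is why the threshold is fairly demanding in cluttered scenes.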
Artificial Neural Networks and Deep Learning Techniques Applied to Radar Target Detection: A Review
Radar target detection (RTD) is a fundamental and important process of the radar system, designed to distinguish targets from a complex background and measure them. Deep learning methods have recently gained great attention and have proven to be feasible solutions in radar signal processing. Compared with conventional RTD methods, deep learning-based methods can extract features automatically and yield more accurate results. Applying deep learning to RTD is considered a novel concept. In this paper, we review the applications of deep learning in the field of RTD and summarize the possible limitations. This work is timely given the increasing number of research works published in recent years. We hope that this survey will provide guidelines for future studies and applications of deep learning in RTD and related areas of radar signal processing.
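The abstract does not name the conventional RTD methods it compares against. As one well-known representative, a cell-averaging constant false alarm rate (CA-CFAR) detector can be sketched as follows. This is an illustrative assumption, not a method taken from the review; the function name `ca_cfar` and its parameters are hypothetical, and the threshold factor uses the standard formula for square-law detected exponential noise.

```python
def ca_cfar(power, num_train=8, num_guard=2, pfa=1e-3):
    """Return indices of cells whose power exceeds an adaptive threshold.

    power: sequence of received power values per range cell.
    num_train: total training cells (split evenly on both sides).
    num_guard: total guard cells (split evenly on both sides).
    pfa: desired probability of false alarm.
    """
    half_train = num_train // 2
    half_guard = num_guard // 2
    n = 2 * half_train
    # Threshold factor for CA-CFAR with n training cells.
    alpha = n * (pfa ** (-1.0 / n) - 1.0)
    detections = []
    edge = half_train + half_guard
    for i in range(edge, len(power) - edge):
        # Training cells on each side, skipping the guard cells.
        lead = power[i - half_guard - half_train:i - half_guard]
        lag = power[i + half_guard + 1:i + half_guard + half_train + 1]
        noise = (sum(lead) + sum(lag)) / n  # local noise estimate
        if power[i] > alpha * noise:
            detections.append(i)
    return detections


power = [1.0] * 40
power[20] = 50.0  # a single strong target in flat noise
hits = ca_cfar(power)
```

The detector relies on a hand-set window and a noise model, which is the kind of manual design that the learned feature extraction discussed in the review aims to replace.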