Flare-Aware Cross-modal Enhancement Network for Multi-spectral Vehicle Re-identification
Multi-spectral vehicle re-identification aims to address the challenge of
identifying vehicles in complex lighting conditions by incorporating
complementary visible and infrared information. However, in harsh environments,
the discriminative cues in RGB and NIR modalities are often lost due to strong
flares from vehicle lamps or sunlight, and existing multi-modal fusion methods
are limited in their ability to recover these important cues. To address this
problem, we propose a Flare-Aware Cross-modal Enhancement Network (FACENet) that
adaptively restores flare-corrupted RGB and NIR features with guidance from the
flare-immunized thermal infrared (TI) spectrum. First, to reduce the influence of
locally degraded appearance due to intense flare, we propose a Mutual Flare
Mask Prediction module to jointly obtain flare-corrupted masks in RGB and NIR
modalities in a self-supervised manner. Second, to use the flare-immunized TI
information to enhance the masked RGB and NIR, we propose a Flare-Aware
Cross-modal Enhancement module that adaptively guides feature extraction of
masked RGB and NIR spectra with prior flare-immunized knowledge from the TI
spectrum. Third, to extract common informative semantic information from RGB
and NIR, we propose an Inter-modality Consistency loss that enforces semantic
consistency between the two modalities. Finally, to evaluate the proposed
FACENet in handling intense flare, we introduce a new multi-spectral vehicle
re-ID dataset, called WMVEID863, with additional challenges such as motion
blur, significant background changes, and particularly intense flare
degradation. Comprehensive experiments on both the newly collected dataset and
public benchmark multi-spectral vehicle re-ID datasets demonstrate the superior
performance of the proposed FACENet compared to state-of-the-art methods,
especially in handling strong flares. The code and dataset will be released
soon.
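The abstract does not give the exact form of the Inter-modality Consistency loss, but one plausible instantiation, enforcing semantic agreement between RGB and NIR features of the same vehicle, is a distance penalty on L2-normalized features. The function name and the squared-distance formulation below are assumptions for illustration, not the paper's definition:

```python
import numpy as np

def inter_modality_consistency(feat_rgb, feat_nir, eps=1e-8):
    """Sketch of an inter-modality consistency loss: penalize the
    distance between L2-normalized RGB and NIR feature vectors so both
    modalities encode the same semantics for the same identity.
    (Illustrative only; the paper's exact loss is not in the abstract.)"""
    rgb = feat_rgb / (np.linalg.norm(feat_rgb, axis=-1, keepdims=True) + eps)
    nir = feat_nir / (np.linalg.norm(feat_nir, axis=-1, keepdims=True) + eps)
    # Mean squared distance between the normalized feature vectors.
    return float(np.mean(np.sum((rgb - nir) ** 2, axis=-1)))
```

Identical features yield a loss near zero, while orthogonal features are penalized, which is the qualitative behaviour any such consistency term would need.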
CMTR: Cross-modality Transformer for Visible-infrared Person Re-identification
Visible-infrared cross-modality person re-identification is a challenging
ReID task, which aims to retrieve and match the same identity's images between
the heterogeneous visible and infrared modalities. Thus, the core of this task
is to bridge the large gap between these two modalities. Existing
convolutional neural network-based methods suffer from insufficient perception
of each modality's information and cannot learn discriminative
modality-invariant embeddings for identities, which limits their
performance. To solve these problems, we propose a cross-modality
transformer-based method (CMTR) for the visible-infrared person
re-identification task, which can explicitly mine the information of each
modality and generate better discriminative features based on it. Specifically,
to capture each modality's characteristics, we design novel modality
embeddings, which are fused with token embeddings to encode modalities'
information. Furthermore, to enhance the representation of the modality embeddings and
adjust matching embeddings' distribution, we propose a modality-aware
enhancement loss based on the learned modalities' information, reducing
intra-class distance and enlarging inter-class distance. To our knowledge, this
is the first work to apply a transformer network to the cross-modality
re-identification task. We conduct extensive experiments on the public
SYSU-MM01 and RegDB datasets, and the proposed CMTR significantly surpasses
existing leading CNN-based methods.
Comment: 11 pages, 7 figures, 7 tables
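The modality embeddings described above are fused with token embeddings by analogy with positional embeddings: one learned vector per modality, broadcast over all tokens of an image. The dimensions, table layout, and additive fusion below are assumptions for illustration; the abstract does not specify them:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; the paper's actual token count and dimension
# are not stated in the abstract.
num_tokens, dim = 4, 8

# One learned vector per modality (0 = visible, 1 = infrared),
# analogous to a positional-embedding table.
modality_table = rng.normal(size=(2, dim))

def add_modality_embedding(token_embeds, modality_id):
    """Fuse modality information into token embeddings by addition,
    one plausible reading of how CMTR's modality embeddings are fused
    (a sketch, not the paper's implementation)."""
    return token_embeds + modality_table[modality_id]

tokens = rng.normal(size=(num_tokens, dim))
vis_tokens = add_modality_embedding(tokens, 0)  # visible branch
ir_tokens = add_modality_embedding(tokens, 1)   # infrared branch
```

After this step the transformer sees which modality each token came from, which is what lets the network learn modality-aware rather than modality-blind representations.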
On Improving Generalization of CNN-Based Image Classification with Delineation Maps Using the CORF Push-Pull Inhibition Operator
Deployed image classification pipelines typically depend on images captured in real-world environments, which means the images may be affected by different sources of perturbation (e.g., sensor noise in low-light conditions). The main challenge arises from the fact that image quality directly impacts the reliability and consistency of classification tasks, and it has therefore attracted wide interest within the computer vision community. We propose a transformation step that aims to improve the generalization ability of CNN models in the presence of unseen noise in the test set. Concretely, the delineation map of a given image is computed using the CORF push-pull inhibition operator. This operation transforms an input image into a representation that is more robust to noise before it is processed by a CNN. We evaluated our approach on the Fashion-MNIST dataset with an AlexNet model. The proposed CORF-augmented pipeline achieved results on noise-free images comparable to those of a conventional AlexNet classifier without CORF delineation maps, while consistently achieving significantly superior performance on test images perturbed with different levels of Gaussian and uniform noise.
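The push-pull inhibition principle behind CORF can be illustrated with a minimal 1-D sketch: an excitatory ("push") filter response minus a weighted response of a wider inhibitory ("pull") filter. Because the wider pull kernel responds strongly to broadband noise, subtracting it suppresses noise while structure such as edges survives. The real CORF operator works on 2-D contours, and the sigma, alpha, and width-ratio values here are illustrative, not the paper's:

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel of length 2*radius + 1."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2 * sigma**2))
    return k / k.sum()

def push_pull(signal, sigma=1.0, alpha=0.8, k=2.0):
    """1-D sketch of push-pull inhibition: the excitatory ('push')
    response minus alpha times a wider inhibitory ('pull') response,
    rectified at zero. (Illustrative; not the CORF implementation.)"""
    push = np.convolve(signal, gaussian_kernel(sigma, 8), mode="same")
    pull = np.convolve(signal, gaussian_kernel(k * sigma, 8), mode="same")
    return np.maximum(push - alpha * pull, 0.0)
```

Applied to a noisy step signal, the pull term cancels most of the response in flat noisy regions while the step itself still produces output, which is the noise-robustness property the abstract attributes to CORF delineation maps.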