An Energy-Based Prior for Generative Saliency
We propose a novel energy-based prior for generative saliency prediction,
where the latent variables follow an informative energy-based prior. Both the
saliency generator and the energy-based prior are jointly trained via Markov
chain Monte Carlo-based maximum likelihood estimation, in which the sampling
from the intractable posterior and prior distributions of the latent variables
is performed by Langevin dynamics. With the generative saliency model, we can
obtain a pixel-wise uncertainty map from an image, indicating model confidence
in the saliency prediction. Unlike existing generative models, which
define the prior distribution of the latent variable as a simple isotropic
Gaussian distribution, our model uses an energy-based informative prior which
can be more expressive in capturing the latent space of the data. With the
informative energy-based prior, we extend the Gaussian distribution assumption
of generative models to achieve a more representative distribution of the
latent space, leading to more reliable uncertainty estimation. We apply the
proposed frameworks to both RGB and RGB-D salient object detection tasks with
both transformer and convolutional neural network backbones. Experimental
results show that our generative saliency model with an energy-based prior can
achieve not only accurate saliency predictions but also reliable uncertainty
maps that are consistent with human perception.
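Langevin dynamics, the sampler used above, amounts to repeated gradient steps on the energy plus injected Gaussian noise. The following minimal sketch (not the paper's implementation) runs short-run Langevin sampling against a toy standard-Gaussian energy, E(z) = 0.5||z||^2, whose gradient is simply z:

```python
import numpy as np

def langevin_sample(grad_energy, z0, step=0.1, n_steps=100, rng=None):
    """Short-run Langevin dynamics:
    z <- z - (step/2) * dE/dz + sqrt(step) * noise."""
    rng = np.random.default_rng(rng)
    z = np.array(z0, dtype=float)
    for _ in range(n_steps):
        z = z - 0.5 * step * grad_energy(z) \
              + np.sqrt(step) * rng.standard_normal(z.shape)
    return z

# Toy energy E(z) = 0.5 * ||z||^2, so grad E(z) = z; samples should
# approximately follow a standard Gaussian.
samples = np.stack([
    langevin_sample(lambda z: z, np.zeros(2), rng=i) for i in range(500)
])
```

In the paper's setting the toy gradient would be replaced by the gradients of the learned energy-based prior (for prior samples) or of the posterior (for inference), but the update rule is the same.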
Mutual Information Regularization for Weakly-supervised RGB-D Salient Object Detection
In this paper, we present a weakly-supervised RGB-D salient object detection
model via scribble supervision. Specifically, as a multimodal learning task, we
focus on effective multimodal representation learning via inter-modal mutual
information regularization. In particular, following the principle of
disentangled representation learning, we introduce a mutual information upper
bound with a mutual information minimization regularizer to encourage the
disentangled representation of each modality for salient object detection.
Based on our multimodal representation learning framework, we introduce an
asymmetric feature extractor for our multimodal data, which is proven more
effective than the conventional symmetric backbone setting. We also introduce a
multimodal variational auto-encoder as a stochastic prediction refinement
technique, which takes pseudo labels from the first training stage as
supervision and generates refined predictions. Experimental results on benchmark
RGB-D salient object detection datasets verify the effectiveness of both our
explicit multimodal disentangled representation learning method and our
stochastic prediction refinement strategy, achieving performance comparable
to state-of-the-art fully supervised models. Our code and data are
available at: https://github.com/baneitixiaomai/MIRV
Comment: IEEE Transactions on Circuits and Systems for Video Technology 202
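A common sample-based form of such a mutual information upper bound is a CLUB-style estimator, which compares the variational conditional's score on paired samples against its average score on unpaired ones. The sketch below is illustrative only; the paper's exact regularizer may differ, and the Gaussian conditional q(y|x) with identity mean is a toy assumption:

```python
import numpy as np

def club_upper_bound(x, y, mu_fn, log_var):
    """CLUB-style MI upper bound with a Gaussian variational conditional
    q(y|x) = N(mu_fn(x), exp(log_var)); constants cancel between terms."""
    mu = mu_fn(x)                                              # (n, d) means
    var = np.exp(log_var)
    pos = -0.5 * (((y - mu) ** 2) / var).sum(axis=1)           # paired term
    diff = y[None, :, :] - mu[:, None, :]                      # (n, n, d)
    neg = (-0.5 * (diff ** 2 / var).sum(axis=2)).mean(axis=1)  # unpaired term
    return (pos - neg).mean()

rng = np.random.default_rng(0)
x = rng.standard_normal((256, 4))
y_dep = x + 0.1 * rng.standard_normal((256, 4))   # strongly dependent on x
y_ind = rng.standard_normal((256, 4))             # independent of x
mi_dep = club_upper_bound(x, y_dep, lambda x: x, np.zeros(4))
mi_ind = club_upper_bound(x, y_ind, lambda x: x, np.zeros(4))
```

Minimizing such a bound between the two modality representations pushes them toward the disentanglement the abstract describes; dependent pairs yield a large bound, independent pairs a bound near zero.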
GLENet: Boosting 3D Object Detectors with Generative Label Uncertainty Estimation
The inherent ambiguity in ground-truth annotations of 3D bounding boxes
caused by occlusions, signal missing, or manual annotation errors can confuse
deep 3D object detectors during training, thus deteriorating the detection
accuracy. However, existing methods overlook such issues to some extent and
treat the labels as deterministic. In this paper, we formulate the label
uncertainty problem as the diversity of potentially plausible bounding boxes of
objects, then propose GLENet, a generative framework adapted from conditional
variational autoencoders, to model the one-to-many relationship between a
typical 3D object and its potential ground-truth bounding boxes with latent
variables. The label uncertainty generated by GLENet is a plug-and-play module
and can be conveniently integrated into existing deep 3D detectors to build
probabilistic detectors and supervise the learning of the localization
uncertainty. Besides, we propose an uncertainty-aware quality estimator
architecture in probabilistic detectors to guide the training of the IoU branch
with the predicted localization uncertainty. We incorporate the proposed methods
into various popular base 3D detectors and demonstrate significant and
consistent performance gains on both KITTI and Waymo benchmark datasets.
In particular, the proposed GLENet-VR outperforms all published LiDAR-based
approaches by a large margin and ranks among the top single-modal methods on
the challenging KITTI test set. We will make the source code and pre-trained
models publicly available.
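The one-to-many mapping from an object to plausible boxes can be illustrated by sampling the conditional prior of a variational autoencoder and decoding each latent draw into box parameters; the spread over draws then serves as a per-parameter label-uncertainty estimate. Everything below the function (the linear `decode` and the `base` box) is a hypothetical stand-in for GLENet's learned decoder, not the paper's model:

```python
import numpy as np

def sample_box_hypotheses(prior_mu, prior_logvar, decode, n=64, rng=0):
    """Draw n latents from the conditional prior N(prior_mu, exp(prior_logvar))
    and decode each into bounding-box parameters; return the mean box and the
    per-parameter variance (the label-uncertainty estimate)."""
    rng = np.random.default_rng(rng)
    std = np.exp(0.5 * prior_logvar)
    zs = prior_mu + std * rng.standard_normal((n, prior_mu.shape[0]))
    boxes = np.stack([decode(z) for z in zs])     # (n, 7): x, y, z, l, w, h, yaw
    return boxes.mean(axis=0), boxes.var(axis=0)

# Hypothetical linear decoder around a typical car-sized box.
W = np.random.default_rng(1).standard_normal((4, 7)) * 0.1
base = np.array([0.0, 0.0, 0.0, 4.0, 1.8, 1.6, 0.0])
decode = lambda z: base + z @ W
mean_box, label_var = sample_box_hypotheses(np.zeros(4), np.zeros(4), decode)
```

In the plug-and-play use described above, `label_var` would supervise a detector's localization-uncertainty head instead of treating the annotated box as deterministic.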
PaintSeg: Training-free Segmentation via Painting
The paper introduces PaintSeg, a new unsupervised method for segmenting
objects without any training. We propose an adversarial masked contrastive
painting (AMCP) process, which creates a contrast between the original image
and a painted image in which a masked area is painted using off-the-shelf
generative models. During the painting process, inpainting and outpainting are
alternated, with the former masking the foreground and filling in the
background, and the latter masking the background while recovering the missing
part of the foreground object. Inpainting and outpainting, also referred to as
I-step and O-step, allow our method to gradually advance the target
segmentation mask toward the ground truth without supervision or training.
PaintSeg can be configured to work with a variety of prompts, e.g. coarse
masks, boxes, scribbles, and points. Our experimental results demonstrate that
PaintSeg outperforms existing approaches in coarse mask-prompt, box-prompt, and
point-prompt segmentation tasks, providing a training-free solution suitable
for unsupervised segmentation.
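One I-step/O-step round of the alternating scheme can be sketched as follows; `mean_fill_inpaint` is a toy stand-in for the off-the-shelf generative painter, and the 0.5 contrast threshold is an assumption for this synthetic example:

```python
import numpy as np

def mean_fill_inpaint(image, mask):
    """Toy stand-in for a generative inpainter: fill the masked region with
    the mean of the visible pixels (assumption, not the paper's model)."""
    out = image.copy()
    out[mask] = image[~mask].mean()
    return out

def paintseg_round(image, mask, inpaint):
    # I-step: mask the assumed foreground; the painter fills in background,
    # so pixels the painting changed are likely foreground.
    painted = inpaint(image, mask)
    mask = np.abs(image - painted) > 0.5
    # O-step: mask the background; the painter tries to recover the object,
    # so pixels the painting preserved are likely foreground.
    painted = inpaint(image, ~mask)
    mask = np.abs(image - painted) <= 0.5
    return mask

image = np.zeros((10, 10)); image[3:7, 3:7] = 1.0           # bright object
coarse = np.zeros((10, 10), bool); coarse[2:8, 2:8] = True  # coarse mask prompt
mask = paintseg_round(image, coarse, mean_fill_inpaint)
```

On this toy image a single round already shrinks the coarse prompt to the object; the paper iterates such rounds with a real generative painter to advance the mask toward the ground truth without training.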
MultiNet with Transformers: A Model for Cancer Diagnosis Using Images
Cancer is a leading cause of death in many countries. An early diagnosis of
cancer based on biomedical imaging ensures effective treatment and a better
prognosis. However, biomedical imaging presents challenges to both clinical
institutions and researchers. Physiological anomalies are often characterized
by slight abnormalities in individual cells or tissues, making them difficult
to detect visually. Traditionally, anomalies are diagnosed by radiologists and
pathologists with extensive training. This procedure, however, demands the
participation of professionals and incurs a substantial cost. The cost makes
large-scale biological image classification impractical. In this study, we
provide unique deep neural network designs for multiclass classification of
medical images, in particular cancer images. We incorporate transformers into
a multiclass framework to take advantage of their data-aggregation capability
and perform more accurate classifications. We evaluate the models on publicly
accessible datasets using various measures to ensure the reliability of the
models. Extensive assessment suggests this method can be used for a
multitude of classification tasks.
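The abstract does not specify the architecture, but the core transformer ingredient, self-attention over patch embeddings pooled into a multiclass softmax head, can be sketched in a few lines (illustrative only; the weights here are random, not trained):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_classifier(patches, Wq, Wk, Wv, Wc):
    """Single-head self-attention over image-patch embeddings, mean-pooled
    into a multiclass softmax head."""
    q, k, v = patches @ Wq, patches @ Wk, patches @ Wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[1]))  # (n_patches, n_patches)
    pooled = (attn @ v).mean(axis=0)               # aggregate over patches
    return softmax(pooled @ Wc)                    # class probabilities

rng = np.random.default_rng(0)
d, n_cls = 16, 3
patches = rng.standard_normal((9, d))              # 9 patch embeddings
probs = attention_classifier(patches,
                             *(rng.standard_normal((d, d)) for _ in range(3)),
                             rng.standard_normal((d, n_cls)))
```

The attention matrix lets every patch weigh evidence from every other patch, which is the data-aggregation property the abstract leans on for slight, spatially scattered anomalies.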
UCDFormer: Unsupervised Change Detection Using a Transformer-driven Image Translation
Change detection (CD), which compares two bi-temporal images, is a crucial task
in remote sensing. Because it requires no cumbersome labeled change
information, unsupervised CD has attracted extensive attention in the
community. However, existing unsupervised CD approaches rarely consider the
seasonal and style differences incurred by the illumination and atmospheric
conditions in multi-temporal images. To this end, we formulate a change
detection setting with domain shift for remote sensing images. Furthermore, we present a
novel unsupervised CD method using a light-weight transformer, called
UCDFormer. Specifically, a transformer-driven image translation composed of a
light-weight transformer and a domain-specific affinity weight is first
proposed to mitigate domain shift between two images with real-time efficiency.
After image translation, we can generate the difference map between the
translated before-event image and the original after-event image. Then, a novel
reliable pixel extraction module is proposed to select significantly
changed/unchanged pixel positions by fusing the pseudo change maps from fuzzy
c-means clustering and adaptive thresholding. Finally, a binary change map is
obtained based on these selected pixel pairs and a binary classifier.
Experimental results on different unsupervised CD tasks with seasonal and style
changes demonstrate the effectiveness of the proposed UCDFormer. For example,
compared with several other related methods, UCDFormer improves the Kappa
coefficient by more than 12%. In addition, UCDFormer achieves
excellent performance for earthquake-induced landslide detection when
considering large-scale applications. The code is available at
\url{https://github.com/zhu-xlab/UCDFormer}
Comment: 16 pages, 7 figures, IEEE Transactions on Geoscience and Remote
Sensing
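The reliable pixel extraction step can be sketched as fusing two pseudo change maps of the difference image and keeping only the pixels where they agree. In this simplified stand-in, plain two-cluster 1-D k-means replaces fuzzy c-means and a mean-plus-std rule replaces the adaptive threshold:

```python
import numpy as np

def reliable_pixels(diff, n_iter=20):
    """Return (reliably changed, reliably unchanged) pixel masks by fusing
    a clustering-based pseudo map with a threshold-based one."""
    x = diff.ravel()
    c = np.array([x.min(), x.max()], dtype=float)  # initial cluster centers
    for _ in range(n_iter):                        # 1-D two-means clustering
        assign = np.abs(x[:, None] - c[None, :]).argmin(axis=1)
        for k in (0, 1):
            if (assign == k).any():
                c[k] = x[assign == k].mean()
    map1 = (assign == c.argmax()).reshape(diff.shape)  # clustering pseudo map
    map2 = diff > diff.mean() + diff.std()             # adaptive-threshold map
    return map1 & map2, ~map1 & ~map2  # agreement: changed / unchanged

# Synthetic difference map: low background noise plus one changed block.
rng = np.random.default_rng(0)
diff = np.abs(rng.standard_normal((20, 20))) * 0.02
diff[5:10, 5:10] += 1.0
changed, unchanged = reliable_pixels(diff)
```

Only the agreeing pixel pairs would then feed the binary classifier that produces the final change map, which is why disagreement regions are simply left out rather than guessed.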