358 research outputs found
Robust Prototypical Few-Shot Organ Segmentation with Regularized Neural-ODEs
Despite the tremendous progress made by deep learning models in image
semantic segmentation, they typically require large numbers of annotated
examples, and increasing attention is being directed to problem settings
like Few-Shot Learning (FSL), where only a small amount of annotation is
needed for generalisation to novel classes. This is especially relevant in
medical domains, where dense pixel-level annotations are expensive to
obtain. In this paper, we
propose Regularized Prototypical Neural Ordinary Differential Equation
(R-PNODE), a method that leverages intrinsic properties of Neural-ODEs,
assisted and enhanced by additional cluster and consistency losses to perform
Few-Shot Segmentation (FSS) of organs. R-PNODE constrains support and query
features from the same classes to lie closer in the representation space,
thereby improving performance over existing Convolutional Neural Network
(CNN) based FSS methods. We further demonstrate that while many
existing Deep CNN based methods tend to be extremely vulnerable to adversarial
attacks, R-PNODE exhibits increased adversarial robustness for a wide array of
these attacks. We experiment with three publicly available multi-organ
segmentation datasets in both in-domain and cross-domain FSS settings to
demonstrate the efficacy of our method. In addition, we perform experiments
with seven commonly used adversarial attacks in various settings to demonstrate
R-PNODE's robustness. R-PNODE outperforms the baselines for FSS by significant
margins and also shows superior performance for a wide array of attacks varying
in intensity and design.
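As a rough illustration of the prototypical mechanism described above, the
following sketch (in PyTorch) pairs a fixed-step Euler ODE block with a
cluster loss that pulls query features toward same-class support prototypes.
The module names, the Euler solver and the toy episode sizes are assumptions
for illustration, not the authors' implementation.

    import torch
    import torch.nn as nn

    class ODEBlock(nn.Module):
        """Fixed-step Euler integration of dh/dt = f(h); a stand-in for a Neural-ODE."""
        def __init__(self, dim, steps=4):
            super().__init__()
            self.f = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(), nn.Linear(dim, dim))
            self.steps = steps

        def forward(self, h):
            dt = 1.0 / self.steps
            for _ in range(self.steps):
                h = h + dt * self.f(h)  # one Euler step along the learned dynamics
            return h

    def cluster_loss(support_feats, support_labels, query_feats, query_labels):
        """Pull each query feature toward the prototype (mean) of its class."""
        classes = support_labels.unique()
        protos = torch.stack([support_feats[support_labels == c].mean(0) for c in classes])
        idx = torch.stack([(classes == y).nonzero(as_tuple=True)[0][0] for y in query_labels])
        return ((query_feats - protos[idx]) ** 2).sum(dim=1).mean()

    # Toy 2-way, 5-shot episode with 16-dimensional features.
    ode = ODEBlock(16)
    s_x, s_y = torch.randn(10, 16), torch.tensor([0] * 5 + [1] * 5)
    q_x, q_y = torch.randn(6, 16), torch.tensor([0, 1, 0, 1, 0, 1])
    loss = cluster_loss(ode(s_x), s_y, ode(q_x), q_y)
    loss.backward()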
Look, Cast and Mold: Learning 3D Shape Manifold from Single-view Synthetic Data
Inferring the stereo structure of objects in the real world is a challenging
yet practical task. Equipping deep models with this ability usually requires
abundant 3D supervision, which is hard to acquire. A promising alternative is
to benefit from synthetic data, where pairwise ground truth is easy to
access. Nevertheless, the domain gaps are nontrivial given the variations in
texture, shape and context. To overcome these difficulties, we propose a
Visio-Perceptual Adaptive Network for single-view 3D reconstruction, dubbed
VPAN. To generalize the model to real scenarios, we address several aspects:
(1) Look: visually incorporate spatial structure from the single view to
enhance the expressiveness of the representation; (2) Cast: perceptually
align the 2D image features to the 3D shape priors with cross-modal semantic
contrastive mapping; (3) Mold: reconstruct the stereo shape of the target by
transforming embeddings into the desired manifold. Extensive
experiments on several benchmarks demonstrate the effectiveness and robustness
of the proposed method in learning the 3D shape manifold from synthetic data
using a single view. The proposed method outperforms state-of-the-art
methods on the Pix3D dataset with IoU 0.292 and CD 0.108, and reaches IoU
0.329 and CD 0.104 on Pascal 3D+.
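The "Cast" step above suggests a contrastive alignment between 2D image
embeddings and 3D shape priors. A minimal sketch of such a cross-modal
contrastive objective, assuming a standard symmetric InfoNCE form rather
than VPAN's exact loss, might look like this:

    import torch
    import torch.nn.functional as F

    def cross_modal_infonce(img_emb, shape_emb, temperature=0.07):
        """img_emb, shape_emb: (B, D) paired embeddings; positives share a row index."""
        img = F.normalize(img_emb, dim=1)
        shp = F.normalize(shape_emb, dim=1)
        logits = img @ shp.t() / temperature  # (B, B) cosine-similarity logits
        targets = torch.arange(img.size(0), device=img.device)
        # Symmetric loss: image-to-shape and shape-to-image retrieval.
        return 0.5 * (F.cross_entropy(logits, targets) +
                      F.cross_entropy(logits.t(), targets))

    loss = cross_modal_infonce(torch.randn(8, 128), torch.randn(8, 128))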
Behind Every Domain There is a Shift: Adapting Distortion-aware Vision Transformers for Panoramic Semantic Segmentation
In this paper, we address panoramic semantic segmentation which is
under-explored due to two critical challenges: (1) image distortions and object
deformations on panoramas; (2) lack of semantic annotations in the 360-degree
imagery. To tackle these problems, first, we propose the upgraded Transformer
for Panoramic Semantic Segmentation, i.e., Trans4PASS+, equipped with
Deformable Patch Embedding (DPE) and Deformable MLP (DMLPv2) modules for
handling object deformations and image distortions whenever (before or after
adaptation) and wherever (shallow or deep levels). Second, we enhance the
Mutual Prototypical Adaptation (MPA) strategy via pseudo-label rectification
for unsupervised domain adaptive panoramic segmentation. Third, aside from
Pinhole-to-Panoramic (Pin2Pan) adaptation, we create a new dataset (SynPASS)
with 9,080 panoramic images, facilitating a Synthetic-to-Real (Syn2Real)
adaptation scheme in 360-degree imagery. Extensive experiments are conducted,
which cover indoor and outdoor scenarios, and each of them is investigated with
Pin2Pan and Syn2Real regimens. Trans4PASS+ achieves state-of-the-art
performances on four domain adaptive panoramic semantic segmentation
benchmarks. Code is available at https://github.com/jamycheung/Trans4PASS.
Comment: Extended version of CVPR 2022 paper arXiv:2203.01452. Code is
available at https://github.com/jamycheung/Trans4PASS
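To make the Deformable Patch Embedding idea concrete, here is a minimal
sketch in which a small convolution predicts per-location sampling offsets
before a strided patch projection. The offset scale and module layout are
illustrative assumptions; Trans4PASS+'s actual DPE differs in detail.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DeformablePatchEmbed(nn.Module):
        def __init__(self, in_ch=3, embed_dim=64, patch=4):
            super().__init__()
            self.offset = nn.Conv2d(in_ch, 2, kernel_size=3, padding=1)  # (dx, dy) per pixel
            self.proj = nn.Conv2d(in_ch, embed_dim, kernel_size=patch, stride=patch)

        def forward(self, x):
            b, _, h, w = x.shape
            # Base sampling grid in [-1, 1], shape (B, H, W, 2).
            ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w),
                                    indexing="ij")
            grid = torch.stack((xs, ys), dim=-1).expand(b, -1, -1, -1).to(x.device)
            # Predicted offsets, scaled down so sampling stays near the base grid.
            off = self.offset(x).permute(0, 2, 3, 1) * 0.1
            warped = F.grid_sample(x, grid + off, align_corners=True)
            return self.proj(warped)  # (B, embed_dim, H/patch, W/patch)

    tokens = DeformablePatchEmbed()(torch.randn(2, 3, 64, 64))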
Generative Adversarial Networks (GANs): Challenges, Solutions, and Future Directions
Generative Adversarial Networks (GANs) are a novel class of deep generative
models that has recently gained significant attention. GANs learn complex,
high-dimensional distributions implicitly over images, audio, and other data.
However, major challenges remain in training GANs, namely mode collapse,
non-convergence and instability, owing to inappropriate network architecture
design, objective function choice and optimization algorithm selection.
Recently, to address these challenges, several solutions for better
design and optimization of GANs have been investigated based on techniques of
re-engineered network architectures, new objective functions and alternative
optimization algorithms. To the best of our knowledge, there is no existing
survey that has particularly focused on broad and systematic developments of
these solutions. In this study, we perform a comprehensive survey of the
advancements in GANs design and optimization solutions proposed to handle GANs
challenges. We first identify key research issues within each design and
optimization technique and then propose a new taxonomy to structure solutions
by key research issues. In accordance with the taxonomy, we provide a detailed
discussion on different GANs variants proposed within each solution and their
relationships. Finally, based on the insights gained, we present the promising
research directions in this rapidly growing field.Comment: 42 pages, Figure 13, Table
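For readers new to the training difficulties the survey covers, a minimal
sketch of the standard non-saturating GAN training loop (a toy 1-D setup
with assumed sizes, not any specific variant from the survey) shows where
mode collapse and instability can arise: two optimizers chase each other's
moving target.

    import torch
    import torch.nn as nn

    G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))  # generator
    D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))  # discriminator
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
    bce = nn.BCEWithLogitsLoss()

    for step in range(100):
        real = torch.randn(64, 1) * 0.5 + 2.0  # toy "real" 1-D distribution
        z = torch.randn(64, 8)
        # Discriminator update: push D(real) toward 1 and D(fake) toward 0.
        opt_d.zero_grad()
        d_loss = (bce(D(real), torch.ones(64, 1)) +
                  bce(D(G(z).detach()), torch.zeros(64, 1)))
        d_loss.backward()
        opt_d.step()
        # Generator update (non-saturating): make D label fakes as real.
        opt_g.zero_grad()
        g_loss = bce(D(G(z)), torch.ones(64, 1))
        g_loss.backward()
        opt_g.step()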
Representation Separation for Semantic Segmentation with Vision Transformers
Vision transformers (ViTs), which encode an image as a sequence of patches,
bring new paradigms to semantic segmentation. We present an efficient
framework of representation separation at the local-patch and global-region
levels for semantic segmentation with ViTs. It targets the peculiar
over-smoothness of ViTs in semantic segmentation and therefore differs from
current popular paradigms of context modeling and from most existing related
methods that reinforce the advantage of attention. We first present a
decoupled two-pathway network in which a second pathway enhances and passes
down local-patch discrepancy complementary to the global representations of
transformers. We then propose a spatially adaptive separation module to
obtain more separated deep representations, and a discriminative
cross-attention that yields more discriminative region representations
through novel auxiliary supervisions. The proposed methods achieve some impressive
results: 1) incorporated with large-scale plain ViTs, our methods achieve new
state-of-the-art performances on five widely used benchmarks; 2) using masked
pre-trained plain ViTs, we achieve 68.9% mIoU on Pascal Context, setting a new
record; 3) pyramid ViTs integrated with the decoupled two-pathway network even
surpass the well-designed high-resolution ViTs on Cityscapes; 4) the improved
representations by our framework have favorable transferability in images with
natural corruptions. The code will be released publicly.
Comment: 17 pages, 13 figures. This work has been submitted to the IEEE for
possible publication. Copyright may be transferred without notice, after
which this version may no longer be accessible.
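As a rough sketch of the decoupled two-pathway idea, the following toy
module pairs a global self-attention pathway (standing in for a ViT stage)
with a depthwise-convolutional local pathway that re-injects patch-level
discrepancy. The fusion by addition and all names are illustrative
assumptions, not the paper's design.

    import torch
    import torch.nn as nn

    class TwoPathway(nn.Module):
        def __init__(self, dim=64, heads=4):
            super().__init__()
            # Global pathway: one self-attention layer standing in for a ViT stage.
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
            # Local pathway: depthwise conv preserving patch-level discrepancy.
            self.local = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)

        def forward(self, x):  # x: (B, C, H, W) feature map
            b, c, h, w = x.shape
            tokens = x.flatten(2).transpose(1, 2)        # (B, HW, C)
            glob, _ = self.attn(tokens, tokens, tokens)  # smooth global context
            glob = glob.transpose(1, 2).reshape(b, c, h, w)
            return glob + self.local(x)                  # re-inject local detail

    out = TwoPathway()(torch.randn(2, 64, 16, 16))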