Overcomplete Deep Subspace Clustering
Deep Subspace Clustering Networks (DSC) provide an efficient solution to unsupervised subspace clustering by using an undercomplete deep auto-encoder with a fully-connected layer to exploit the self-expressiveness property. Because DSC relies on undercomplete representations of the input data, it is less robust and more dependent on pre-training. To overcome this, we propose a simple yet effective alternative, Overcomplete Deep Subspace Clustering Networks (ODSC), which uses overcomplete representations for subspace clustering. In our proposed method, we fuse the features from both undercomplete and overcomplete auto-encoder networks before passing them through the self-expressive layer, enabling us to extract a more meaningful and robust representation of the input data for clustering. Experimental results on four benchmark datasets show the effectiveness of the proposed method over DSC and other clustering methods in terms of clustering error. Our method is also less sensitive than DSC to where pre-training is stopped, and it is more robust to noise.
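The self-expressiveness property the abstract builds on says that each point in a union of subspaces can be written as a linear combination of the other points from its own subspace, Z ≈ ZC with diag(C) = 0. A minimal NumPy sketch of this property (illustrative only; the paper learns C as a fully-connected layer inside the auto-encoder, and the function name and ridge solver here are assumptions):

```python
import numpy as np

# Sketch of self-expressiveness: reconstruct each latent code as a linear
# combination of the *other* codes, with a zero diagonal so a point cannot
# trivially represent itself. Ridge regression stands in for the learned
# self-expressive layer of DSC/ODSC.
def self_expressive_coefficients(Z, reg=1e-2):
    """Z: (n_samples, n_features) latent codes; returns C with diag(C) = 0."""
    n = Z.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        others = np.delete(Z, i, axis=0)              # all samples except i
        A = others @ others.T + reg * np.eye(n - 1)   # ridge normal equations
        c = np.linalg.solve(A, others @ Z[i])
        C[i, np.arange(n) != i] = c
    return C

# Two orthogonal 1-D subspaces in R^3: coefficients should concentrate
# within each group, revealing the cluster structure.
rng = np.random.default_rng(0)
Z = np.vstack([rng.normal(size=(5, 1)) @ np.array([[1.0, 0.0, 0.0]]),
               rng.normal(size=(5, 1)) @ np.array([[0.0, 1.0, 0.0]])])
C = self_expressive_coefficients(Z)
within = np.abs(C[:5, :5]).sum()
across = np.abs(C[:5, 5:]).sum()
print(within > across)  # True: coefficients favor same-subspace samples
```

In DSC-style pipelines, an affinity matrix built from |C| + |C|ᵀ is then passed to spectral clustering to obtain the final cluster assignments.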
CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition
Vision-Language models like CLIP have been widely adopted for various tasks
due to their impressive zero-shot capabilities. However, CLIP is not suitable
for extracting 3D geometric features, as it was trained only on images and
text via natural language supervision. We address this limitation and
propose a new framework termed CG3D (CLIP Goes 3D) where a 3D encoder is
learned to exhibit zero-shot capabilities. CG3D is trained using triplets of
point clouds, corresponding rendered 2D images, and texts using natural language
supervision. To align the features in a multimodal embedding space, we utilize
contrastive loss on 3D features obtained from the 3D encoder, as well as visual
and text features extracted from CLIP. We note that the natural images used to
train CLIP and the rendered 2D images in CG3D have a distribution shift.
Attempting to train the visual and text encoder to account for this shift
results in catastrophic forgetting and a notable decrease in performance. To
solve this, we employ prompt tuning and introduce trainable parameters in the
input space to shift CLIP towards the 3D pre-training dataset utilized in CG3D.
We extensively test our pre-trained CG3D framework and demonstrate its
impressive capabilities in zero-shot, open scene understanding, and retrieval
tasks. Further, it also serves as strong starting weights for fine-tuning in
downstream 3D recognition tasks.
Comment: Website: https://jeya-maria-jose.github.io/cg3d-web
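The alignment objective described above is a CLIP-style symmetric contrastive loss applied between 3D-encoder features and CLIP's visual and text features. A small NumPy sketch of one such pairwise term (the function name and `temperature` parameter are illustrative assumptions, not CG3D's actual code):

```python
import numpy as np

# CLIP-style symmetric InfoNCE: matching (point cloud, image) pairs sit on
# the diagonal of the batch similarity matrix and should score higher than
# all mismatched pairs, in both directions.
def contrastive_loss(feat_3d, feat_img, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired, L2-normalized embeddings."""
    a = feat_3d / np.linalg.norm(feat_3d, axis=1, keepdims=True)
    b = feat_img / np.linalg.norm(feat_img, axis=1, keepdims=True)
    logits = a @ b.T / temperature            # (batch, batch) similarities

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(logp))        # true pair is the diagonal

    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 8))
aligned = contrastive_loss(img + 0.01 * rng.normal(size=(4, 8)), img)
random_ = contrastive_loss(rng.normal(size=(4, 8)), img)
print(aligned < random_)  # lower loss when the pairs are aligned
```

In CG3D the image and text towers come from frozen CLIP (shifted only via prompt tuning), so this loss primarily trains the 3D encoder into CLIP's embedding space.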
TransWeather: Transformer-based Restoration of Images Degraded by Adverse Weather Conditions
Removing adverse weather conditions like rain, fog, and snow from images is
an important problem in many applications. Most methods proposed in the
literature have been designed to deal with just removing one type of
degradation. Recently, a CNN-based method using neural architecture search
(All-in-One) was proposed to remove all the weather conditions at once.
However, it has a large number of parameters as it uses multiple encoders to
cater to each weather removal task and still has scope for improvement in its
performance. In this work, we focus on developing an efficient solution for the
all adverse weather removal problem. To this end, we propose TransWeather, a
transformer-based end-to-end model with just a single encoder and a decoder
that can restore an image degraded by any weather condition. Specifically, we
utilize a novel transformer encoder using intra-patch transformer blocks to
enhance attention inside the patches to effectively remove smaller weather
degradations. We also introduce a transformer decoder with learnable weather
type embeddings to adjust to the weather degradation at hand. TransWeather
achieves improvements across multiple test datasets over both the All-in-One
network and methods fine-tuned for specific tasks. TransWeather is also
validated on real-world test images and found to be more effective than
previous methods. Implementation code can be accessed at
https://github.com/jeya-maria-jose/TransWeather
Comment: CVPR 202
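The learnable weather-type embeddings can be viewed as a small set of query vectors that cross-attend to encoder features, letting the decoder condition on whichever degradation is present. A toy NumPy sketch of that mechanism (shapes and names are assumptions; the actual TransWeather decoder is more involved):

```python
import numpy as np

# Minimal cross-attention: a few learned "weather" queries attend over
# encoder tokens, producing degradation-conditioned features for decoding.
def cross_attention(queries, features):
    """queries: (q, d) learnable weather embeddings; features: (n, d) encoder tokens."""
    d = queries.shape[1]
    scores = queries @ features.T / np.sqrt(d)       # (q, n) attention logits
    scores -= scores.max(axis=1, keepdims=True)      # numerical stability
    attn = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return attn @ features                           # (q, d) attended output

rng = np.random.default_rng(0)
weather_queries = rng.normal(size=(3, 16))  # e.g. rain / fog / snow slots
encoder_tokens = rng.normal(size=(64, 16))
out = cross_attention(weather_queries, encoder_tokens)
print(out.shape)  # (3, 16)
```

Because the queries are learned parameters rather than image-derived tokens, they can specialize during training to the different degradation types seen in the all-weather training set.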
Ambiguous Medical Image Segmentation using Diffusion Models
Collective insights from a group of experts have always proven to outperform
an individual's best diagnosis in clinical tasks. For the task of medical
image segmentation, existing research on AI-based alternatives focuses more on
developing models that can imitate the best individual rather than harnessing
the power of expert groups. In this paper, we introduce a single diffusion
model-based approach that produces multiple plausible outputs by learning a
distribution over group insights. Our proposed model generates a distribution
of segmentation masks by leveraging the inherent stochastic sampling process of
diffusion using only minimal additional learning. We demonstrate on three
different medical image modalities (CT, ultrasound, and MRI) that our model is
capable of producing several possible variants while capturing the frequencies
of their occurrences. Comprehensive results show that our proposed approach
outperforms existing state-of-the-art ambiguous segmentation networks in terms
of accuracy while preserving naturally occurring variation. We also propose a
new metric to evaluate both the diversity and the accuracy of segmentation
predictions, in line with the clinical practice of relying on collective
insights.
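The core idea, that repeated stochastic sampling from a single model yields an empirical distribution over plausible masks whose frequencies can be compared to annotator behavior, can be illustrated with a toy sampler (entirely hypothetical; it stands in for the paper's diffusion sampling process):

```python
import numpy as np
from collections import Counter

# Hypothetical stand-in for a stochastic segmentation sampler: it returns
# one of two plausible masks with fixed probabilities, mimicking a model
# that has learned a distribution over expert opinions.
def sample_mask(rng, p_variant=0.7):
    base = np.array([[1, 1], [0, 0]])   # smaller contour
    alt = np.array([[1, 1], [1, 0]])    # an equally valid larger contour
    return base if rng.random() < p_variant else alt

rng = np.random.default_rng(0)
samples = [sample_mask(rng) for _ in range(1000)]

# Count how often each distinct mask appears across repeated draws.
freqs = Counter(m.tobytes() for m in samples)
empirical = max(freqs.values()) / 1000
print(round(empirical, 2))  # close to the underlying 0.7
```

A diffusion model plays the same role at scale: each reverse-diffusion run starts from fresh noise, so repeated runs produce distinct plausible masks whose empirical frequencies can be checked against how often expert groups produce each variant.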