An ε-Uniform Numerical Method for a System of Convection-Diffusion Equations with Discontinuous Convection Coefficients and Source Terms
In this paper, a parameter-uniform numerical method is suggested to solve a system of singularly perturbed convection-diffusion equations with discontinuous convection coefficients and source terms, subject to Dirichlet boundary conditions. The second derivative of each equation is multiplied by a distinct small parameter, which leads to overlapping and interacting interior layers. A numerical method based on a piecewise-uniform Shishkin mesh is constructed. Numerical results are presented to support the theoretical results.
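As a minimal sketch of the kind of piecewise-uniform Shishkin mesh mentioned above, the Python below builds the classical mesh for a single boundary layer at x = 1 on [0, 1]. This is a simplification and an assumption: the abstract does not give the mesh used for the interior-layer system, and the names `shishkin_mesh`, `alpha` (a lower bound on the convection coefficient), and `sigma_const` are illustrative, not taken from the paper.

```python
import math


def shishkin_mesh(n, eps, alpha=1.0, sigma_const=2.0):
    """Return n + 1 mesh points on [0, 1], refined near x = 1.

    Assumed setting (not from the paper): one boundary layer at x = 1,
    perturbation parameter eps, convection coefficient bounded below by alpha.
    """
    if n < 4 or n % 2 != 0:
        raise ValueError("n must be an even integer >= 4")
    half = n // 2
    # Transition point: layer width ~ eps * ln(n), capped at half the domain.
    tau = min(0.5, sigma_const * eps / alpha * math.log(n))
    coarse_h = (1.0 - tau) / half  # step size outside the layer
    fine_h = tau / half            # step size inside the layer
    coarse = [i * coarse_h for i in range(half)]
    fine = [(1.0 - tau) + i * fine_h for i in range(half)]
    # Append the right endpoint exactly to avoid floating-point drift.
    return coarse + fine + [1.0]


if __name__ == "__main__":
    mesh = shishkin_mesh(16, 1e-4)
    print("coarse step:", mesh[1] - mesh[0])
    print("fine step:  ", mesh[-1] - mesh[-2])
```

When eps is not small, the transition point is capped at 1/2 and the mesh reduces to a uniform one, which is the usual behaviour of this construction.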
CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition
Vision-Language models like CLIP have been widely adopted for various tasks
due to their impressive zero-shot capabilities. However, CLIP is not suitable
for extracting 3D geometric features as it was trained on only images and text
by natural language supervision. We work on addressing this limitation and
propose a new framework termed CG3D (CLIP Goes 3D) where a 3D encoder is
learned to exhibit zero-shot capabilities. CG3D is trained using triplets of
pointclouds, corresponding rendered 2D images, and texts using natural language
supervision. To align the features in a multimodal embedding space, we utilize
contrastive loss on 3D features obtained from the 3D encoder, as well as visual
and text features extracted from CLIP. We note that the natural images used to
train CLIP and the rendered 2D images in CG3D have a distribution shift.
Attempting to train the visual and text encoder to account for this shift
results in catastrophic forgetting and a notable decrease in performance. To
solve this, we employ prompt tuning and introduce trainable parameters in the
input space to shift CLIP towards the 3D pre-training dataset utilized in CG3D.
We extensively test our pre-trained CG3D framework and demonstrate its
impressive capabilities in zero-shot, open scene understanding, and retrieval
tasks. Further, it also serves as strong starting weights for fine-tuning in
downstream 3D recognition tasks. Comment: Website: https://jeya-maria-jose.github.io/cg3d-web
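The contrastive alignment described in this abstract can be illustrated with a small, dependency-free sketch. It assumes an InfoNCE-style loss with a temperature, which is a common choice but is not confirmed by the abstract; `info_nce` and its parameters are hypothetical names, and the toy vectors stand in for 3D-encoder features and CLIP image or text features.

```python
import math


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def info_nce(feats_a, feats_b, temperature=0.07):
    """Average InfoNCE loss where feats_a[i] should match feats_b[i].

    Assumption: the loss form and temperature are illustrative, not the
    exact objective used in CG3D.
    """
    a = [_normalize(v) for v in feats_a]
    b = [_normalize(v) for v in feats_b]
    total = 0.0
    for i in range(len(a)):
        logits = [_dot(a[i], b[j]) / temperature for j in range(len(b))]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(a)


if __name__ == "__main__":
    point_cloud_feats = [[1.0, 0.0], [0.0, 1.0]]  # toy "3D encoder" outputs
    image_feats = [[0.9, 0.1], [0.1, 0.9]]        # toy "CLIP" outputs
    print("aligned loss:   ", info_nce(point_cloud_feats, image_feats))
    print("misaligned loss:", info_nce(point_cloud_feats, image_feats[::-1]))
```

Matching pairs give a loss near zero, while swapping the pairs gives a much larger loss, which is the signal that pulls the modalities into a shared embedding space.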
TransWeather: Transformer-based Restoration of Images Degraded by Adverse Weather Conditions
Removing adverse weather conditions like rain, fog, and snow from images is
an important problem in many applications. Most methods proposed in the
literature are designed to remove only one type of
degradation. Recently, a CNN-based method using neural architecture search
(All-in-One) was proposed to remove all the weather conditions at once.
However, it has a large number of parameters as it uses multiple encoders to
cater to each weather removal task and still has scope for improvement in its
performance. In this work, we focus on developing an efficient solution for the
all adverse weather removal problem. To this end, we propose TransWeather, a
transformer-based end-to-end model with just a single encoder and a decoder
that can restore an image degraded by any weather condition. Specifically, we
utilize a novel transformer encoder using intra-patch transformer blocks to
enhance attention inside the patches to effectively remove smaller weather
degradations. We also introduce a transformer decoder with learnable weather
type embeddings to adjust to the weather degradation at hand. TransWeather
achieves improvements across multiple test datasets over both the All-in-One
network and methods fine-tuned for specific tasks. TransWeather is also
validated on real-world test images and found to be more effective than
previous methods. Implementation code can be accessed at
https://github.com/jeya-maria-jose/TransWeather. Comment: CVPR 202
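One way to picture a decoder with learnable weather type embeddings is to treat the embeddings as queries that attend over encoder features. The sketch below shows single-head scaled dot-product cross-attention in plain Python. It is an assumption about the mechanism, not the TransWeather implementation: the abstract does not say how the embeddings enter the decoder, the function name `cross_attention` is hypothetical, and the query vectors here are fixed numbers rather than trained parameters.

```python
import math


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention.

    queries: list of query vectors (stand-ins for weather type embeddings)
    keys, values: lists of vectors (stand-ins for encoder features)
    Returns one output vector per query.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        # Softmax with max-subtraction for numerical stability.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of value vectors.
        out = [
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


if __name__ == "__main__":
    weather_queries = [[10.0, 0.0]]            # hypothetical embedding
    encoder_keys = [[10.0, 0.0], [0.0, 10.0]]  # toy encoder features
    encoder_values = [[1.0, 0.0], [0.0, 1.0]]
    print(cross_attention(weather_queries, encoder_keys, encoder_values))
```

In a trained model the query vectors would be updated by gradient descent, so that each one learns to pull out the encoder features relevant to the degradation present in the image.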
Ambiguous Medical Image Segmentation using Diffusion Models
Collective insights from a group of experts have always proven to outperform
an individual's best diagnostic for clinical tasks. For the task of medical
image segmentation, existing research on AI-based alternatives focuses more on
developing models that can imitate the best individual rather than harnessing
the power of expert groups. In this paper, we introduce a single diffusion
model-based approach that produces multiple plausible outputs by learning a
distribution over group insights. Our proposed model generates a distribution
of segmentation masks by leveraging the inherent stochastic sampling process of
diffusion using only minimal additional learning. We demonstrate on three
different medical image modalities (CT, ultrasound, and MRI) that our model is
capable of producing several possible variants while capturing the frequencies
of their occurrences. Comprehensive results show that our proposed approach
outperforms existing state-of-the-art ambiguous segmentation networks in terms
of accuracy while preserving naturally occurring variation. We also propose a
new metric to evaluate the diversity as well as the accuracy of segmentation
predictions that aligns with clinical practice's interest in collective
insights.
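The idea of drawing several plausible masks and capturing how often each variant occurs can be sketched as repeated sampling followed by counting. The code below is only a toy: `toy_sampler` is a hypothetical stand-in that picks between two fixed 4-pixel masks with assumed probabilities, not a diffusion model, and `empirical_frequencies` is an illustrative helper rather than the metric proposed in the paper.

```python
import random
from collections import Counter

# Two hypothetical mask variants for a 4-pixel "image" (1 = foreground).
VARIANTS = [(0, 1, 1, 0), (0, 1, 1, 1)]
WEIGHTS = [0.7, 0.3]  # assumed rates at which experts would draw each variant


def toy_sampler(rng):
    """Stand-in for one stochastic sampling run of a generative model."""
    return rng.choices(VARIANTS, weights=WEIGHTS, k=1)[0]


def empirical_frequencies(sampler, num_samples, seed=0):
    """Draw num_samples masks and return the fraction of times each appears."""
    rng = random.Random(seed)
    counts = Counter(sampler(rng) for _ in range(num_samples))
    return {mask: count / num_samples for mask, count in counts.items()}


if __name__ == "__main__":
    for mask, freq in empirical_frequencies(toy_sampler, 2000).items():
        print(mask, round(freq, 3))
```

With a real model, the sampler would run the stochastic reverse process once per call, and the resulting frequencies could be compared against how often annotators produce each variant.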