An Exploration into the Benefits of the CLIP model for Lifelog Retrieval
In this paper, we fine-tune the CLIP (Contrastive Language-Image Pre-Training) model on the Lifelog Question Answering dataset (LLQA) to investigate the retrieval performance of the fine-tuned model over the zero-shot baseline. We train the model using a weight-space ensembling approach and a modified loss function that accounts for the differences between our dataset (LLQA) and the dataset on which CLIP was originally pretrained. We further evaluate our fine-tuned model with visual as well as multimodal queries on multiple retrieval tasks, demonstrating improved performance over the zero-shot baseline model.
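The abstract does not spell out the ensembling details, but weight-space ensembling is commonly done by linearly interpolating the zero-shot and fine-tuned parameters. The sketch below is a minimal illustration of that idea in PyTorch; the function name, the mixing coefficient `alpha`, and the assumption of matching `state_dict` keys are all illustrative and not taken from the paper.

```python
import copy
import torch

def weight_space_ensemble(zero_shot_model, finetuned_model, alpha=0.5):
    """Interpolate the parameters of a zero-shot and a fine-tuned model.

    alpha = 0.0 returns the zero-shot weights, alpha = 1.0 the fine-tuned ones.
    The value used in the paper is not stated; 0.5 is only a placeholder.
    """
    zs_state = zero_shot_model.state_dict()
    ft_state = finetuned_model.state_dict()

    merged_state = {}
    for key, zs_param in zs_state.items():
        ft_param = ft_state[key]
        if torch.is_floating_point(zs_param):
            merged_state[key] = (1.0 - alpha) * zs_param + alpha * ft_param
        else:
            merged_state[key] = zs_param  # keep integer buffers unchanged

    ensembled = copy.deepcopy(zero_shot_model)
    ensembled.load_state_dict(merged_state)
    return ensembled
```

In practice, `alpha` would be swept on a validation split to trade off in-distribution accuracy against the robustness retained from the zero-shot weights.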
On the Out of Distribution Robustness of Foundation Models in Medical Image Segmentation
Constructing a robust model that can effectively generalize to test samples
under distribution shifts remains a significant challenge in the field of
medical imaging. Foundation models for vision and language, pre-trained on
extensive sets of natural image and text data, have emerged as a promising
approach: they show impressive learning abilities across different tasks while
requiring only a limited number of annotated samples. While numerous
techniques have focused on developing better fine-tuning strategies to adapt
these models for specific domains, we instead examine their robustness to
domain shifts in the medical image segmentation task. To this end, we compare
the generalization performance to unseen domains of various pre-trained models
after being fine-tuned on the same in-distribution dataset and show that
foundation-based models enjoy better robustness than other architectures. From
here, we further develop a new Bayesian uncertainty estimation for frozen
models and use it as an indicator to characterize the model's performance on
out-of-distribution (OOD) data, which proves particularly beneficial for real-world
applications. Our experiments not only reveal the limitations of current
indicators such as accuracy-on-the-line or agreement-on-the-line, commonly used in
natural image applications, but also emphasize the promise of the introduced
Bayesian uncertainty. Specifically, predictions with lower uncertainty usually
correspond to higher out-of-distribution (OOD) performance.
Comment: Advances in Neural Information Processing Systems (NeurIPS) 2023, Workshop on robustness of zero/few-shot learning in foundation models
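The abstract does not detail how the Bayesian uncertainty is computed for the frozen backbone. One common approximation is Monte-Carlo dropout over the trainable segmentation head, and the sketch below uses the mean predictive entropy as a per-image uncertainty score. All names here are hypothetical, and the paper's actual formulation may differ; this is only meant to illustrate the general idea of scoring OOD performance with an uncertainty indicator.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def mc_dropout_uncertainty(seg_head, frozen_features, n_samples=20):
    """Approximate Bayesian uncertainty for a frozen encoder + dropout head.

    `frozen_features` are features from the frozen foundation-model encoder;
    `seg_head` is a segmentation head that contains dropout layers.
    """
    seg_head.train()  # keep dropout active at inference time
    probs = []
    for _ in range(n_samples):
        logits = seg_head(frozen_features)       # (B, C, H, W)
        probs.append(F.softmax(logits, dim=1))
    mean_probs = torch.stack(probs).mean(dim=0)  # predictive distribution

    # Per-pixel predictive entropy, averaged over the image: higher values mean
    # higher uncertainty, which the paper links to lower OOD performance.
    entropy = -(mean_probs * torch.log(mean_probs.clamp_min(1e-8))).sum(dim=1)
    return entropy.mean(dim=(1, 2))              # one uncertainty score per image
```

Such a score could then be correlated with segmentation quality on shifted test sets to check whether low uncertainty indeed indicates high OOD performance.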
TextANIMAR: Text-based 3D Animal Fine-Grained Retrieval
3D object retrieval is an important yet challenging task, which has drawn
more and more attention in recent years. While existing approaches have made
strides in addressing this issue, they are often limited to restricted settings
such as image and sketch queries, which are often unintuitive for common users.
To overcome these limitations, this paper presents a
novel SHREC challenge track focusing on text-based fine-grained retrieval of 3D
animal models. Unlike previous SHREC challenge tracks, the proposed task is
considerably more challenging, requiring participants to develop innovative
approaches to tackle the problem of text-based retrieval. Despite the increased
difficulty, we believe that this task has the potential to drive useful
applications in practice and facilitate more intuitive interactions with 3D
objects. Five groups participated in our competition, submitting a total of 114
runs. While the results obtained in our competition are satisfactory, we note
that the challenges presented by this task are far from being fully solved. As
such, we provide insights into potential areas for future research and
improvements. We believe that we can help push the boundaries of 3D object
retrieval and facilitate more user-friendly interactions via vision-language
technologies.
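The track itself does not prescribe a retrieval method, but a typical text-based baseline embeds the query and precomputed descriptors of the 3D models in a shared space and ranks by cosine similarity. The sketch below assumes such descriptors already exist (for example, averaged CLIP embeddings of multi-view renders) and uses hypothetical names throughout; it is an illustration of the retrieval setup, not a participant's solution.

```python
import torch
import torch.nn.functional as F

def rank_models_by_text(text_embedding, model_embeddings, model_ids):
    """Rank 3D models for a text query by cosine similarity.

    text_embedding   : (D,) embedding of the text query.
    model_embeddings : (N, D) descriptors of the 3D models, e.g. averaged
                       image embeddings of multi-view renders (one plausible
                       baseline, not the track's prescribed method).
    model_ids        : list of N identifiers for the gallery models.
    """
    text = F.normalize(text_embedding.unsqueeze(0), dim=-1)  # (1, D)
    gallery = F.normalize(model_embeddings, dim=-1)          # (N, D)
    scores = (gallery @ text.T).squeeze(1)                   # (N,) cosine scores
    order = torch.argsort(scores, descending=True)
    return [(model_ids[i], scores[i].item()) for i in order.tolist()]
```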