Classification of Colorectal Cancer Polyps via Transfer Learning and Vision-Based Tactile Sensing
In this study, to address the current high early-detection miss rate of
colorectal cancer (CRC) polyps, we explore the potential of transfer
learning and machine learning (ML) classifiers to precisely and sensitively
classify the type of CRC polyps. Instead of using common colonoscopic
images, we applied three different ML algorithms to the 3D textural image
outputs of a unique vision-based surface tactile sensor (VS-TS). To collect
realistic textural images of CRC polyps for training the utilized ML
classifiers and evaluating their performance, we first designed and additively
manufactured 48 types of realistic polyp phantoms with varying hardness,
type, and texture. Next, the performance of the three ML algorithms in
classifying the type of fabricated polyps was quantitatively evaluated using
various statistical metrics.
Comment: Accepted to the IEEE Sensors 2022 Conference
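The abstract evaluates the classifiers with "various statistical metrics" without naming them. As a minimal sketch (the class names and data below are hypothetical, not the paper's), per-class sensitivity and precision can be derived from paired true/predicted labels:

```python
def per_class_metrics(y_true, y_pred):
    """Per-class sensitivity (recall) and precision from paired
    true/predicted labels, via per-class TP/FN/FP counts."""
    metrics = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        metrics[c] = {
            "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
            "precision": tp / (tp + fp) if tp + fp else 0.0,
        }
    return metrics

# Toy example with two hypothetical polyp types
y_true = ["adenoma", "adenoma", "hyperplastic", "hyperplastic"]
y_pred = ["adenoma", "hyperplastic", "hyperplastic", "hyperplastic"]
m = per_class_metrics(y_true, y_pred)
```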
The Performance of Transferability Metrics does not Translate to Medical Tasks
Transfer learning boosts the performance of medical image analysis by
enabling deep learning (DL) on small datasets through the knowledge acquired
from large ones. As the number of DL architectures explodes, exhaustively
attempting all candidates becomes unfeasible, motivating cheaper alternatives
for choosing among them. Transferability scoring methods emerge as an enticing
solution, allowing one to efficiently compute a score that correlates with an
architecture's accuracy on any target dataset. However, since transferability
scores have not been evaluated on medical datasets, their use in this context
remains uncertain, preventing them from benefiting practitioners. We fill that
gap in this work, thoroughly evaluating seven transferability scores in three
medical applications, including out-of-distribution scenarios. Despite
promising results on general-purpose datasets, we find that no transferability
score can reliably and consistently estimate target performance in medical
contexts, inviting further work in that direction.
Comment: 10 pages, 3 figures. Accepted at the DART workshop @ MICCAI 202
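The abstract does not list the seven transferability scores it evaluates. As a hedged illustration of what such a score computes, here is a minimal sketch of LEEP (Log Expected Empirical Prediction), one commonly benchmarked transferability metric; the toy probabilities and labels below are made up:

```python
import math

def leep(probs, labels, num_target_classes):
    """LEEP score: average log-likelihood of target labels under the
    'expected empirical predictor' built from a source model's
    soft predictions on the target data. Higher is better."""
    n = len(probs)
    num_source = len(probs[0])
    # Empirical joint P(y, z) over target label y and source class z
    joint = [[0.0] * num_source for _ in range(num_target_classes)]
    for p, y in zip(probs, labels):
        for z, pz in enumerate(p):
            joint[y][z] += pz / n
    # Marginal P(z) and conditional P(y | z)
    pz = [sum(joint[y][z] for y in range(num_target_classes))
          for z in range(num_source)]
    cond = [[joint[y][z] / pz[z] if pz[z] > 0 else 0.0
             for z in range(num_source)]
            for y in range(num_target_classes)]
    return sum(math.log(sum(cond[y][z] * p[z] for z in range(num_source)))
               for p, y in zip(probs, labels)) / n

# Toy example: 3 target samples, source model with 2 output classes
probs = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8]]
labels = [0, 0, 1]
score = leep(probs, labels, num_target_classes=2)
```

Being a log-likelihood, the score is negative; architectures are ranked by it without any fine-tuning.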
Tangent Transformers for Composition, Privacy and Removal
We introduce Tangent Attention Fine-Tuning (TAFT), a method for fine-tuning
linearized transformers obtained by computing a First-order Taylor Expansion
around a pre-trained initialization. We show that the Jacobian-Vector Product
resulting from linearization can be computed efficiently in a single forward
pass, reducing training and inference cost to the same order of magnitude as
its original non-linear counterpart, while using the same number of parameters.
Furthermore, we show that, when applied to various downstream visual
classification tasks, the resulting Tangent Transformer fine-tuned with TAFT
can perform comparably with fine-tuning the original non-linear network. Since
Tangent Transformers are linear with respect to the new set of weights, and the
resulting fine-tuning loss is convex, we show that TAFT enjoys several
advantages compared to non-linear fine-tuning when it comes to model
composition, parallel training, machine unlearning, and differential privacy.
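A minimal sketch of the linearization idea, using a toy non-linear "network" and a numerical Jacobian-vector product in place of forward-mode autodiff (all names are hypothetical; this is not the paper's implementation):

```python
import math

def model(w, x):
    # Tiny non-linear "network": tanh of a dot product
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)))

def jvp(f, w0, v, eps=1e-6):
    """Numerical Jacobian-vector product (directional derivative) of f
    at w0 along v; forward-mode autodiff computes this in one pass."""
    wp = [w + eps * d for w, d in zip(w0, v)]
    wm = [w - eps * d for w, d in zip(w0, v)]
    return (f(wp) - f(wm)) / (2 * eps)

def tangent_model(w, w0, x):
    """First-order Taylor expansion of `model` around w0:
    f(w0) + J(w0)(w - w0), which is linear in the new weights w."""
    delta = [wi - w0i for wi, w0i in zip(w, w0)]
    return model(w0, x) + jvp(lambda ww: model(ww, x), w0, delta)

w0 = [0.5, -0.3]   # pre-trained initialization
x = [1.0, 2.0]     # one input
y_lin = tangent_model([0.6, -0.2], w0, x)
```

Because the output is affine in w, the fine-tuning loss with a convex criterion is convex, which is the property TAFT exploits for composition, unlearning, and privacy.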
What Matters For Meta-Learning Vision Regression Tasks?
Meta-learning is widely used in few-shot classification and function regression due to its ability to adapt quickly to unseen tasks. However, it remains under-explored for regression tasks with high-dimensional inputs such as images. This paper makes two main contributions toward understanding this barely explored area. First, we design two new types of cross-category vision regression tasks, namely object discovery and pose estimation, of unprecedented complexity in the meta-learning domain for computer vision. To this end, we (i) exhaustively evaluate common meta-learning techniques on these tasks, (ii) quantitatively analyze the effect of deep learning techniques commonly used in recent meta-learning algorithms to strengthen generalization, namely data augmentation, domain randomization, task augmentation, and meta-regularization, and (iii) provide insights and practical recommendations for training meta-learning algorithms on vision regression tasks. Second, we propose adding functional contrastive learning (FCL) over the task representations in Conditional Neural Processes (CNPs), trained in an end-to-end fashion. The experimental results show that the results of prior work are misleading as a consequence of a poor choice of loss function as well as too-small meta-training sets. Specifically, we find that CNPs outperform MAML on most tasks without fine-tuning, and that naive task augmentation without a tailored design results in underfitting.
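As a hedged sketch of the few-shot regression setup such methods train on (a toy 1D function rather than the paper's image-based tasks; all names are illustrative), each episode splits sampled points into a context set the model conditions on and a target set it must predict:

```python
import random

def make_regression_episode(fn, num_context, num_target,
                            x_range=(-1.0, 1.0), seed=None):
    """Build one few-shot regression episode: (context, target) sets of
    (x, y) pairs sampled from a single underlying task function, as
    consumed by CNPs and other meta-learning regressors."""
    rng = random.Random(seed)
    xs = [rng.uniform(*x_range) for _ in range(num_context + num_target)]
    pairs = [(x, fn(x)) for x in xs]
    return pairs[:num_context], pairs[num_context:]

# Toy task: regress y = x^2 from 5 context points onto 10 target points
context, target = make_regression_episode(lambda x: x * x, 5, 10, seed=0)
```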
Revisiting Fine-Tuning Strategies for Self-supervised Medical Imaging Analysis
Despite rapid progress in self-supervised learning (SSL), end-to-end
fine-tuning remains the dominant strategy for medical imaging analysis.
However, it is unclear whether this approach is truly optimal
for effectively utilizing the pre-trained knowledge, especially considering the
diverse categories of SSL that capture different types of features. In this
paper, we first establish strong contrastive and restorative SSL baselines that
outperform SOTA methods across four diverse downstream tasks. Building upon
these strong baselines, we conduct an extensive fine-tuning analysis across
multiple pre-training and fine-tuning datasets, as well as various fine-tuning
dataset sizes. Contrary to the conventional wisdom of fine-tuning only the last
few layers of a pre-trained network, we show that fine-tuning intermediate
layers is more effective: fine-tuning the second quarter (25-50%) of the
network is optimal for contrastive SSL, whereas fine-tuning the third quarter
(50-75%) is optimal for restorative SSL. Compared to the
de-facto standard of end-to-end fine-tuning, our best fine-tuning strategy,
which fine-tunes a shallower network consisting of the first three quarters
(0-75%) of the pre-trained network, yields improvements of as much as 5.48%.
Additionally, using these insights, we propose a simple yet effective method to
leverage the complementary strengths of multiple SSL models, resulting in
enhancements of up to 3.57% compared to using the best model alone. Hence, our
fine-tuning strategies not only enhance the performance of individual SSL
models, but also enable effective utilization of the complementary strengths
offered by multiple SSL models, leading to significant improvements in
self-supervised medical imaging analysis.
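A minimal sketch of the fractional layer-selection idea, assuming a network exposed as an ordered list of layer names (the function and names are illustrative, not the paper's code):

```python
def layers_to_finetune(layer_names, start_frac, end_frac):
    """Select the slice of layers in [start_frac, end_frac) of network
    depth to fine-tune; the rest stay frozen. E.g. (0.25, 0.50) picks
    the second quarter, reported optimal for contrastive SSL."""
    n = len(layer_names)
    return layer_names[int(n * start_frac):int(n * end_frac)]

# Hypothetical 12-block backbone
layers = [f"block{i}" for i in range(12)]
second_quarter = layers_to_finetune(layers, 0.25, 0.50)   # contrastive SSL
shallow_net = layers_to_finetune(layers, 0.0, 0.75)       # best overall
```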
Study of augmentations on historical manuscripts using TrOCR
Historical manuscripts are an essential source of original content. For many reasons, automatically recognizing the text in these manuscripts is difficult. This thesis used a state-of-the-art handwritten text recognizer, TrOCR, to recognize a 16th-century manuscript. TrOCR uses a vision transformer to encode the input images and a language transformer to decode them back to text. We showed that carefully preprocessed images and well-designed augmentations can improve the performance of TrOCR, and we suggest an ensemble of augmented models to achieve even better performance.
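The thesis abstract does not specify how the ensemble combines outputs. One simple illustrative scheme (not necessarily the one used) is a character-level majority vote over equal-length transcriptions from differently augmented models:

```python
from collections import Counter

def ensemble_transcription(predictions):
    """Character-level majority vote across transcriptions from several
    augmented models. Assumes equal-length outputs; a real ensemble
    would first align sequences (e.g. by edit distance)."""
    length = min(len(p) for p in predictions)
    return "".join(
        Counter(p[i] for p in predictions).most_common(1)[0][0]
        for i in range(length)
    )

# Toy outputs from three hypothetical augmented models
preds = ["handwriting", "handwritten", "handwriting"]
merged = ensemble_transcription(preds)
```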