Despite the rapid progress in self-supervised learning (SSL), end-to-end
fine-tuning is still the dominant strategy for medical imaging analysis.
However, it remains unclear whether this approach is truly optimal for
effectively utilizing pre-trained knowledge, especially considering the
diverse categories of SSL that capture different types of features. In this
paper, we first establish strong contrastive and restorative SSL baselines that
outperform state-of-the-art methods across four diverse downstream tasks. Building upon
these strong baselines, we conduct an extensive fine-tuning analysis across
multiple pre-training and fine-tuning datasets, as well as various fine-tuning
dataset sizes. Contrary to the conventional wisdom of fine-tuning only the last
few layers of a pre-trained network, we show that fine-tuning intermediate
layers is more effective: fine-tuning the second quarter (25-50%) of the
network is optimal for contrastive SSL, whereas fine-tuning the third quarter
(50-75%) is optimal for restorative SSL. Compared to the
de facto standard of end-to-end fine-tuning, our best fine-tuning strategy,
which fine-tunes a shallower network consisting of the first three quarters
(0-75%) of the pre-trained network, yields improvements of as much as 5.48%.
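To make the quarter-wise strategy concrete, here is a minimal sketch (not the
authors' released code): it assumes a pre-trained backbone whose top-level
blocks are enumerated shallow to deep, freezes everything, and unfreezes only
the blocks whose depth fraction falls in a chosen range. The
`set_trainable_quarter` helper and the ResNet-50 backbone are illustrative
assumptions.

```python
# Minimal sketch (assumed helper, not the authors' code) of quarter-wise
# partial fine-tuning of a pre-trained backbone.
import torch.nn as nn
import torchvision

def set_trainable_quarter(backbone: nn.Module, start: float, end: float) -> None:
    """Freeze every top-level block, then unfreeze those whose depth
    fraction falls in [start, end)."""
    blocks = list(backbone.children())
    for i, block in enumerate(blocks):
        trainable = start <= i / len(blocks) < end
        for p in block.parameters():
            p.requires_grad = trainable

# E.g., for a contrastive-SSL-pretrained ResNet-50 (illustrative choice),
# fine-tune only the second quarter (25-50%) of the network:
backbone = torchvision.models.resnet50(weights=None)  # load SSL weights here
set_trainable_quarter(backbone, start=0.25, end=0.50)

# The best overall strategy instead keeps only the first three quarters
# (0-75%) of the backbone and fine-tunes that shallower network:
blocks = list(backbone.children())
shallow = nn.Sequential(*blocks[: int(0.75 * len(blocks))])
for p in shallow.parameters():
    p.requires_grad = True
```

Exactly which modules land in each quarter depends on how the backbone's
blocks are enumerated; the fraction-based indexing above is one simple
convention.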
Additionally, using these insights, we propose a simple yet effective method to
leverage the complementary strengths of multiple SSL models (one plausible
instantiation is sketched below), resulting in enhancements of up to 3.57%
compared to using the best model alone. Hence, our
fine-tuning strategies not only enhance the performance of individual SSL
models, but also enable effective utilization of the complementary strengths
offered by multiple SSL models, leading to significant improvements in
self-supervised medical imaging analysis.
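The abstract does not spell out the combination mechanism, so the following is
purely a hypothetical illustration: a logit-level average over two partially
fine-tuned models (one contrastive, one restorative). The `ensemble_logits`
helper is an assumption, not the paper's method.

```python
# Hypothetical illustration only: the combination method is not specified
# here, so this simply averages class logits across SSL models at inference.
import torch
import torch.nn as nn

@torch.no_grad()
def ensemble_logits(models: list[nn.Module], x: torch.Tensor) -> torch.Tensor:
    """Average the logits of several fine-tuned models on a batch x."""
    return torch.stack([m(x) for m in models]).mean(dim=0)

# Usage with two hypothetical fine-tuned classifiers:
# preds = ensemble_logits([contrastive_model, restorative_model], batch).argmax(dim=1)
```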