Estimating the transferability of publicly available pre-trained models to a
target task has become increasingly important for transfer learning in recent
years. Existing efforts propose metrics that allow a user to choose one model
from a pool of pre-trained models without having to fine-tune each candidate
individually. With the growth in the number of
available pre-trained models and the popularity of model ensembles, it also
becomes essential to study the transferability of multiple-source models for a
given target task. The few existing efforts study transferability in such
multi-source ensemble settings using just the outputs of the classification
layer, neglecting possible domain or task mismatch. Moreover, they overlook
the most important factor when selecting source models, namely the
cohesiveness among them, which can impact both the performance and the
confidence of the ensemble's predictions. To address these gaps, we propose a novel Optimal
tranSport-based suBmOdular tRaNsferability metric (OSBORN) to estimate the
transferability of an ensemble of models to a downstream task. OSBORN
collectively accounts for image domain difference, task difference, and
cohesiveness of models in the ensemble to provide reliable estimates of
transferability. We gauge the performance of OSBORN on both image
classification and semantic segmentation tasks. Our setup includes 28 source
datasets, 11 target datasets, 5 model architectures, and 2 pre-training
methods. We benchmark our method against current state-of-the-art metrics
MS-LEEP and E-LEEP, and consistently outperform them with the proposed
approach.

Comment: To appear at ICCV 202