In this work, we introduce a ``score-based assessment'' framework for
estimating the transferability of pre-trained speech models (PSMs) for
fine-tuning target tasks. We leverage upon two representation theories,
Bayesian likelihood estimation and optimal transport, to generate rank scores
for the PSM candidates using the extracted representations. Our framework
efficiently computes transferability scores without actual fine-tuning of
candidate models or layers by making a temporal independent hypothesis. We
evaluate some popular supervised speech models (e.g., Conformer RNN-Transducer)
and self-supervised speech models (e.g., HuBERT) in cross-layer and cross-model
settings using public data. Experimental results show a high Spearman's rank
correlation and low p-value between our estimation framework and fine-tuning
ground truth. Our proposed transferability framework requires less
computational time and resources, making it a resource-saving and
time-efficient approach for tuning speech foundation models.Comment: Accepted to Interspeech. Code will be release