Quick-Tune: Quickly Learning Which Pretrained Model to Finetune and How
With the ever-increasing number of pretrained models, machine learning
practitioners are continuously faced with the decision of which pretrained
model to use, and how to finetune it for a new dataset. In this paper, we
propose a methodology that jointly searches for the optimal pretrained
model and the hyperparameters for finetuning it. Our method transfers
knowledge about the performance of many pretrained models with multiple
hyperparameter configurations on a series of datasets. To this end, we
evaluated over 20k hyperparameter configurations for finetuning 24
pretrained image classification models on 87 datasets to generate a
large-scale meta-dataset. We meta-learn a multi-fidelity performance
predictor on the learning curves of this meta-dataset and use it for fast
hyperparameter optimization on new datasets. We empirically demonstrate
that our resulting approach can quickly select an accurate pretrained
model for a new dataset together with its optimal hyperparameters.
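The selection loop the abstract describes can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: the predictor below is a hand-written stand-in for the meta-learned multi-fidelity performance predictor, and the model names, configurations, and scoring rule are all hypothetical.

```python
# Toy sketch of joint model + hyperparameter selection driven by a
# (hypothetical) meta-learned performance predictor. All names and the
# scoring function are illustrative stand-ins, not the paper's method.

def predict_performance(model, config, fidelity):
    """Stand-in for a meta-learned multi-fidelity predictor.

    A real predictor would be trained on learning curves from the
    meta-dataset; here we use a toy deterministic score instead."""
    base = {"resnet50": 0.80, "vit_small": 0.84, "efficientnet": 0.82}[model]
    lr_penalty = abs(config["lr"] - 1e-3) * 10.0  # toy: prefer lr near 1e-3
    return base - lr_penalty + 0.01 * fidelity    # more budget, better score

def quick_select(models, configs, fidelities=(1, 2, 4)):
    """Score every (model, config) pair, evaluating cheap fidelities
    first, and keep the pair with the best predicted performance."""
    best, best_score = None, float("-inf")
    for fidelity in fidelities:
        for model in models:
            for config in configs:
                score = predict_performance(model, config, fidelity)
                if score > best_score:
                    best, best_score = (model, config), score
    return best, best_score

models = ["resnet50", "vit_small", "efficientnet"]
configs = [{"lr": 1e-3}, {"lr": 1e-2}]
(best_model, best_config), score = quick_select(models, configs)
```

In practice the predictor would also be queried during optimization to decide which candidate to evaluate next at a higher fidelity, rather than exhaustively scoring every pair as this toy loop does.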
How stable are Transferability Metrics evaluations?
Transferability estimation is a maturing field with increasing interest,
which aims to provide heuristics for selecting the most suitable source
models to transfer to a given target dataset, without fine-tuning them
all. However, existing works rely on custom experimental setups which
differ across papers, leading to inconsistent conclusions about which
transferability metrics work best. In this paper we conduct a large-scale
study by systematically constructing a broad range of 715k experimental
setup variations. We discover that even small variations to an
experimental setup lead to different conclusions about the superiority of
one transferability metric over another. Then we propose better
evaluations by aggregating across many experiments, enabling more stable
conclusions. As a result, we reveal the superiority of LogME at selecting
good source datasets to transfer from in a semantic segmentation
scenario, NLEEP at selecting good source architectures in an image
classification scenario, and GBC at determining which target task
benefits most from a given source model. Yet, no single transferability
metric works best in all scenarios.
Hub-Pathway: Transfer Learning from A Hub of Pre-trained Models
Transfer learning aims to leverage knowledge from pre-trained models to
benefit the target task. Prior transfer learning work mainly transfers from a
single model. However, with the emergence of deep models pre-trained from
different resources, model hubs consisting of diverse models with various
architectures, pre-trained datasets and learning paradigms are available.
Directly applying single-model transfer learning methods to each model wastes
the abundant knowledge of the model hub and suffers from high computational
cost. In this paper, we propose a Hub-Pathway framework to enable knowledge
transfer from a model hub. The framework generates data-dependent pathway
weights: at the input level, these weights set the pathway routes,
deciding which pre-trained models are activated and passed through; at
the output level, they set the pathway aggregation, combining the
knowledge from the activated models to make predictions. The proposed
framework can be trained end-to-end with the target task-specific loss,
where it learns to explore better pathway configurations and exploit the
knowledge in pre-trained models for each target datum. We utilize a noisy
pathway generator and design an exploration loss to further explore
different pathways throughout the model hub. To fully exploit the
knowledge in pre-trained models, each model is further trained on the
specific data that activate it, which ensures its performance and
enhances knowledge transfer. Experimental results on computer vision and
reinforcement learning tasks demonstrate that the proposed Hub-Pathway
framework achieves state-of-the-art performance for model hub transfer
learning.
Comment: Accepted by NeurIPS 202
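The routing-then-aggregation idea can be sketched as follows. This is a minimal illustration under stated assumptions: the gate, the "hub" of models, and top-k routing are toy stand-ins, not the paper's actual networks or training procedure.

```python
import math

# Illustrative sketch of the Hub-Pathway idea: a data-dependent gate
# decides which pre-trained models in a hub process each input, and the
# activated models' outputs are aggregated with the same (renormalized)
# gate weights. Everything here is a simplified stand-in.

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def hub_pathway_predict(x, hub, gate, k=2):
    """Route input x through the top-k models chosen by gate(x), then
    aggregate their outputs with the renormalized pathway weights."""
    weights = softmax(gate(x))                       # data-dependent weights
    topk = sorted(range(len(hub)), key=lambda i: -weights[i])[:k]
    total = sum(weights[i] for i in topk)
    out = 0.0
    for i in topk:                                   # only activated models run
        out += (weights[i] / total) * hub[i](x)
    return out

# Toy hub of three "pre-trained models" and a toy gate.
hub = [lambda x: 2 * x, lambda x: x + 1, lambda x: -x]
gate = lambda x: [x, 1.0, -1.0]                      # favors model 0 for large x

y = hub_pathway_predict(3.0, hub, gate, k=2)
```

The computational saving comes from only running the top-k models per input; the paper's exploration loss and per-model finetuning on activating data are omitted here.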
Frustratingly Easy Transferability Estimation
Transferability estimation has been an essential tool in selecting a
pre-trained model, and the layers of it, to transfer, so as to maximize
performance on a target task and prevent negative transfer. Existing
estimation algorithms either require intensive training on target tasks
or have difficulties in evaluating the transferability between layers. We
propose a simple, efficient, and effective transferability measure named
TransRate. With a single pass through the target data, TransRate measures
transferability as the mutual information between the features of target
examples extracted by a pre-trained model and their labels. We overcome
the challenge of efficient mutual information estimation by resorting to
the coding rate, which serves as an effective alternative to entropy.
TransRate is theoretically shown to be closely related to performance
after transfer learning. Despite its extraordinary simplicity,
implementable in 10 lines of code, TransRate performs remarkably well in
extensive evaluations on 22 pre-trained models and 16 downstream tasks.
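A rough sketch of the idea, total coding rate minus a label-conditioned coding rate, is below. The exact normalization, the distortion parameter `eps`, and the per-class centering are assumptions of this sketch, not the paper's exact formula.

```python
import numpy as np

# Sketch of a TransRate-style measure: mutual information between
# extracted features Z and labels y, estimated via the coding rate.
# eps, normalization, and per-class centering are assumptions here.

def coding_rate(Z, eps=1e-2):
    """R(Z, eps) = 1/2 * logdet(I + d/(n*eps^2) * Z^T Z) for an
    n-by-d (centered) feature matrix Z."""
    n, d = Z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + d / (n * eps**2) * Z.T @ Z)
    return logdet / 2.0

def transrate(Z, y, eps=1e-2):
    """Total rate minus the label-conditioned rate: higher means the
    features separate the classes better (better transferability)."""
    Z = Z - Z.mean(axis=0, keepdims=True)
    rate = coding_rate(Z, eps)
    for c in np.unique(y):
        Zc = Z[y == c]
        Zc = Zc - Zc.mean(axis=0, keepdims=True)   # within-class spread only
        rate -= len(Zc) / len(Z) * coding_rate(Zc, eps)
    return rate

# Toy check: class-separated features vs. uninformative features.
rng = np.random.default_rng(0)
labels = np.array([0] * 20 + [1] * 20)
separated = np.vstack([rng.normal(2.0, 0.1, (20, 5)),
                       rng.normal(-2.0, 0.1, (20, 5))])  # tight clusters
mixed = rng.normal(0.0, 1.0, (40, 5))                    # no class structure
```

Because only one forward pass over the target data is needed to collect `Z`, the measure is cheap relative to finetuning every candidate model.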
A Survey on Negative Transfer
Transfer learning (TL) tries to utilize data or knowledge from one or
more source domains to facilitate learning in a target domain. It is
particularly useful when the target domain has few or no labeled data,
due to annotation expense, privacy concerns, etc. Unfortunately, the
effectiveness of TL is not always guaranteed. Negative transfer (NT),
i.e., source domain data/knowledge causing reduced learning performance
in the target domain, has been a long-standing and challenging problem in
TL. Various approaches to handle NT have been proposed in the literature.
However, this field lacks a systematic survey on the formalization of NT,
its factors, and the algorithms that handle NT. This paper proposes to
fill this gap. First, the definition of negative transfer is considered
and a taxonomy of its factors is discussed. Then, nearly fifty
representative approaches for handling NT are categorized and reviewed
from four perspectives: secure transfer, domain similarity estimation,
distant transfer, and negative transfer mitigation. NT in related fields,
e.g., multi-task learning, lifelong learning, and adversarial attacks, is
also discussed.
Learning universal representations across tasks and domains
A longstanding goal in computer vision research is to produce broad and general-purpose systems that work well on a wide range of vision problems and are capable of learning concepts from only a few labelled samples. In contrast, existing models are limited to specific tasks or domains (datasets), e.g., a semantic segmentation model for indoor images (Silberman et al., 2012). In addition, they are data inefficient and require a large labelled dataset for each task or domain. While there have been works proposing domain/task-agnostic representations through either loss balancing strategies or architecture design, optimizing such a universal representation network remains a challenging problem. This thesis focuses on addressing the challenges of learning universal representations that generalize well over multiple tasks (e.g. segmentation, depth estimation) or various visual domains (e.g. image object classification, image action classification). In addition, the thesis shows that these representations can be learned from partial supervision and transferred and adapted to previously unseen tasks/domains in a data-efficient manner.
The first part of the dissertation focuses on learning universal representations, i.e. a single universal network for multi-task learning (e.g. learning a single network jointly for different dense prediction tasks like segmentation and depth estimation) and multi-domain learning (e.g. image classification for various vision datasets, each collected for a different problem like texture, flower or action classification). Learning such universal representations by jointly minimizing the sum of all task-specific losses is challenging because of the interference between tasks, and it leads to unbalanced results (i.e. some tasks dominate or interfere with other tasks, and the universal network performs worse than task/domain-specific networks, each of which is trained for a single task/domain independently). Hence a new solution is proposed to regularize the optimization of the universal network by encouraging it to produce the same features as those of the task-specific networks. The experimental results demonstrate that the proposed method learns a single universal network that performs well for multiple tasks or various visual domains.
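The regularized objective described above can be sketched as a task-loss sum plus a feature-matching penalty against frozen task-specific networks. The function names and the L2 matching form are illustrative assumptions, not the thesis's exact loss.

```python
import numpy as np

# Sketch of the regularizer described above: the universal network is
# encouraged to reproduce the features of frozen task-specific networks,
# in addition to minimizing each task's own loss. The L2 form and names
# are illustrative assumptions.

def universal_loss(task_losses, universal_feats, specific_feats, lam=1.0):
    """Total loss = sum of task losses + lam * feature-matching terms.

    universal_feats[t] and specific_feats[t] hold the universal and the
    frozen task-specific features for task t on the same batch."""
    loss = float(sum(task_losses))
    for u, s in zip(universal_feats, specific_feats):
        loss += lam * float(np.mean((u - s) ** 2))  # match specific features
    return loss

# Toy batch: two tasks, features of shape (2 examples, 3 dims).
uni = [np.zeros((2, 3)), np.zeros((2, 3))]
spec_same = [np.zeros((2, 3)), np.zeros((2, 3))]
spec_diff = [np.ones((2, 3)), np.ones((2, 3))]
loss_matched = universal_loss([1.0, 2.0], uni, spec_same)
loss_mismatched = universal_loss([1.0, 2.0], uni, spec_diff)
```

When the universal features already match the task-specific ones, the penalty vanishes and only the task losses remain, which is the balancing behaviour the paragraph motivates.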
Despite the recent advances in multi-task learning of dense prediction problems, most methods rely on expensive labelled datasets. Relaxing this assumption gives rise to a new multi-task learning setting, called multi-task partially-supervised learning in this thesis, in which the goal is to jointly learn multiple dense prediction tasks on partially annotated data (i.e. not all the task labels are available for each training image). In the thesis, a label-efficient approach is proposed that successfully leverages task relations to supervise multi-task learning when data is partially annotated. In particular, the proposed method learns to map each task pair to a joint pairwise task-space, which enables sharing information between the tasks in a computationally efficient way through another network conditioned on task pairs, and avoids learning trivial cross-task relations by retaining high-level information about the input image.
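One way to picture the partial-supervision idea is a loss where a missing task label is replaced by a surrogate target derived from another task through a pairwise mapping. The linear map and the loss shape below are toy stand-ins for the conditioned network the paragraph describes.

```python
import numpy as np

# Toy sketch of cross-task supervision under partial annotation: task A
# has a label, task B does not, so a pairwise mapping provides a
# surrogate target for task B. map_ab and the losses are stand-ins.

def partial_supervision_loss(pred_a, pred_b, label_a, map_ab):
    """Direct loss on the labelled task A plus a cross-task consistency
    loss on task B against the mapped surrogate target."""
    loss_a = float(np.mean((pred_a - label_a) ** 2))   # direct supervision
    surrogate_b = map_ab(label_a)                      # cross-task target
    loss_b = float(np.mean((pred_b - surrogate_b) ** 2))
    return loss_a + loss_b

map_ab = lambda z: 2.0 * z      # toy pairwise mapping "network"
label_a = np.ones(3)
loss_consistent = partial_supervision_loss(np.ones(3), 2.0 * np.ones(3),
                                           label_a, map_ab)
loss_inconsistent = partial_supervision_loss(np.ones(3), np.ones(3),
                                             label_a, map_ab)
```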
The final part of the dissertation studies the problem of adapting a model to previously unseen tasks (from seen or unseen domains) with very few labelled training samples of the new tasks, i.e. cross-domain few-shot learning. Recent methods have focused on using various adaptation strategies for aligning their visual representations to new domains, or on selecting the relevant ones from multiple domain-specific feature extractors. In this dissertation, new methods are formulated to learn a single task-agnostic network from multiple domains during meta-training and to attach light-weight task-specific parameters that are learned from limited training samples and adapt the task-agnostic network to accommodate previously unseen tasks. A systematic analysis is performed to study various task adaptation strategies for few-shot learning. Extensive experimental evidence demonstrates that the proposed methods, which learn a single set of task-agnostic representations and adapt them via residual adapters in matrix form attached to the task-agnostic model, significantly benefit cross-domain few-shot learning.
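The residual-adapter-in-matrix-form idea can be illustrated as a small matrix added in parallel to a frozen layer, so that only the adapter is trained on the few labelled samples. The shapes and the parallel placement below are assumptions of this sketch, not the thesis's exact architecture.

```python
import numpy as np

# Sketch of few-shot adaptation with a residual adapter in matrix form:
# a frozen task-agnostic layer W is adapted by a small learned matrix A
# added in parallel, so only A is trained per task. Shapes and the
# parallel form are illustrative assumptions.

def adapted_layer(x, W, A):
    """Residual adapter: W @ x + A @ x, i.e. (W + A) @ x, with W frozen
    (task-agnostic) and A task-specific."""
    return W @ x + A @ x

d = 4
W = np.eye(d)                       # toy frozen task-agnostic weights
A = np.zeros((d, d))                # adapter starts at zero: no change
x = np.arange(d, dtype=float)

out_unadapted = adapted_layer(x, W, A)       # A = 0 reproduces the base model
A_trained = 0.1 * np.eye(d)                  # pretend few-shot training of A
out_adapted = adapted_layer(x, W, A_trained)
```

Initializing the adapter at zero means the meta-trained representations are preserved exactly before any few-shot updates, which is why such parallel residual forms adapt safely from limited data.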