
    Quick-Tune: Quickly Learning Which Pretrained Model to Finetune and How

    With the ever-increasing number of pretrained models, machine learning practitioners are continuously faced with deciding which pretrained model to use and how to finetune it for a new dataset. In this paper, we propose a methodology that jointly searches for the optimal pretrained model and the hyperparameters for finetuning it. Our method transfers knowledge about the performance of many pretrained models with multiple hyperparameter configurations across a series of datasets. To this end, we evaluated over 20k hyperparameter configurations for finetuning 24 pretrained image classification models on 87 datasets to generate a large-scale meta-dataset. We meta-learn a multi-fidelity performance predictor on the learning curves of this meta-dataset and use it for fast hyperparameter optimization on new datasets. We empirically demonstrate that our resulting approach can quickly select an accurate pretrained model for a new dataset together with its optimal hyperparameters.
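    The selection loop the abstract describes, ranking every (pretrained model, hyperparameter configuration) pair with a performance predictor, can be sketched as follows. The predictor here is a hypothetical stand-in lookup with invented numbers; Quick-Tune instead meta-learns it from the learning curves in its meta-dataset.

```python
# Hypothetical meta-learned predictor: maps (model, config, budget) to a
# predicted validation score. All names and numbers below are illustrative
# stand-ins, not values from the paper.
def predicted_score(model, config, budget):
    base = {"resnet50": 0.78, "vit_small": 0.82, "efficientnet": 0.80}[model]
    lr_penalty = abs(config["lr"] - 1e-3) * 10   # toy effect of the learning rate
    return base - lr_penalty + 0.01 * budget     # more budget -> better estimate

def quick_tune(models, configs, budget):
    """Rank every (model, config) pair by the predictor and return the best."""
    candidates = [(m, c) for m in models for c in configs]
    return max(candidates, key=lambda mc: predicted_score(mc[0], mc[1], budget))

models = ["resnet50", "vit_small", "efficientnet"]
configs = [{"lr": 1e-3}, {"lr": 1e-2}]
best_model, best_config = quick_tune(models, configs, budget=5)
```

    In the real method the argmax is replaced by multi-fidelity Bayesian optimization, so cheap low-budget evaluations prune most candidates before any expensive finetuning runs.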

    How stable are Transferability Metrics evaluations?

    Transferability metrics are a maturing field with increasing interest, aiming to provide heuristics for selecting the most suitable source models to transfer to a given target dataset without fine-tuning them all. However, existing works rely on custom experimental setups that differ across papers, leading to inconsistent conclusions about which transferability metrics work best. In this paper we conduct a large-scale study by systematically constructing a broad range of 715k experimental setup variations. We discover that even small variations to an experimental setup lead to different conclusions about the superiority of one transferability metric over another. We then propose better evaluations that aggregate across many experiments, enabling more stable conclusions. As a result, we reveal the superiority of LogME at selecting good source datasets to transfer from in a semantic segmentation scenario, NLEEP at selecting good source architectures in an image classification scenario, and GBC at determining which target task benefits most from a given source model. Yet, no single transferability metric works best in all scenarios.
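    The aggregation idea can be illustrated with a toy example (the metric names come from the abstract; every score below is invented): a metric that wins in one individual setup can still lose on average once many setups are pooled, which is why single-setup comparisons are unstable.

```python
from statistics import mean

# Invented correlation scores of three transferability metrics
# across four hypothetical experimental setups.
scores = {
    "setup_A": {"LogME": 0.62, "NLEEP": 0.58, "GBC": 0.55},
    "setup_B": {"LogME": 0.48, "NLEEP": 0.66, "GBC": 0.51},
    "setup_C": {"LogME": 0.70, "NLEEP": 0.60, "GBC": 0.64},
    "setup_D": {"LogME": 0.55, "NLEEP": 0.63, "GBC": 0.59},
}

def mean_rank(scores):
    """Average each metric's rank (1 = best) over all setups."""
    metrics = next(iter(scores.values())).keys()
    ranks = {m: [] for m in metrics}
    for setup in scores.values():
        ordered = sorted(setup, key=setup.get, reverse=True)
        for r, m in enumerate(ordered, start=1):
            ranks[m].append(r)
    return {m: mean(rs) for m, rs in ranks.items()}

agg = mean_rank(scores)
```

    Looking only at setup_A would crown LogME, but the aggregated mean rank favors NLEEP, mirroring the paper's observation that conclusions flip with the setup unless results are pooled.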

    Hub-Pathway: Transfer Learning from A Hub of Pre-trained Models

    Transfer learning aims to leverage knowledge from pre-trained models to benefit the target task. Prior transfer learning work mainly transfers from a single model. However, with the emergence of deep models pre-trained on different resources, model hubs consisting of diverse models with various architectures, pre-training datasets, and learning paradigms have become available. Directly applying single-model transfer learning methods to each model wastes the abundant knowledge of the model hub and suffers from high computational cost. In this paper, we propose a Hub-Pathway framework to enable knowledge transfer from a model hub. The framework generates data-dependent pathway weights, based on which we assign pathway routes at the input level to decide which pre-trained models are activated and passed through, and then set the pathway aggregation at the output level to aggregate the knowledge from different models to make predictions. The proposed framework can be trained end-to-end with the target task-specific loss, where it learns to explore better pathway configurations and to exploit the knowledge in pre-trained models for each target datum. We utilize a noisy pathway generator and design an exploration loss to further explore different pathways throughout the model hub. To fully exploit the knowledge in pre-trained models, each model is further trained on the specific data that activate it, which ensures its performance and enhances knowledge transfer. Experimental results on computer vision and reinforcement learning tasks demonstrate that the proposed Hub-Pathway framework achieves state-of-the-art performance for model hub transfer learning.
    Comment: Accepted by NeurIPS 202
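    The two-level mechanism described above, input-level routing that activates only some hub models, then output-level aggregation of their predictions, can be sketched with toy stand-ins. The models and the gating score here are invented; the real framework learns the pathway generator end-to-end with the target loss.

```python
import math

# Toy stand-ins for pre-trained hub models: each maps an input to class logits.
hub = {
    "model_a": lambda x: [x, -x],
    "model_b": lambda x: [0.5 * x, x],
    "model_c": lambda x: [-x, 2 * x],
}

def pathway_weights(x):
    """Hypothetical data-dependent gate: softmax over invented per-model scores."""
    scores = {"model_a": x, "model_b": 0.2, "model_c": -x}
    z = max(scores.values())                      # subtract max for stability
    exps = {m: math.exp(s - z) for m, s in scores.items()}
    total = sum(exps.values())
    return {m: e / total for m, e in exps.items()}

def hub_pathway_predict(x, top_k=2):
    """Input-level routing: only the top-k models are activated and run.
    Output-level aggregation: their logits are mixed with the gate weights."""
    w = pathway_weights(x)
    active = sorted(w, key=w.get, reverse=True)[:top_k]
    norm = sum(w[m] for m in active)              # renormalize over active models
    out = [0.0, 0.0]
    for m in active:
        logits = hub[m](x)
        out = [o + (w[m] / norm) * l for o, l in zip(out, logits)]
    return out
```

    Because only `top_k` models run per input, the cost stays below that of evaluating the whole hub, which is the computational argument the abstract makes.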

    Frustratingly Easy Transferability Estimation

    Transferability estimation has become an essential tool for selecting a pre-trained model, and which of its layers to transfer, so as to maximize performance on a target task and prevent negative transfer. Existing estimation algorithms either require intensive training on target tasks or have difficulty evaluating the transferability between layers. We propose a simple, efficient, and effective transferability measure named TransRate. With a single pass through the target data, TransRate measures transferability as the mutual information between the features of target examples extracted by a pre-trained model and their labels. We overcome the challenge of efficient mutual information estimation by resorting to the coding rate, which serves as an effective alternative to entropy. TransRate is theoretically shown to be closely related to performance after transfer learning. Despite its extraordinary simplicity of 10 lines of code, TransRate performs remarkably well in extensive evaluations on 22 pre-trained models and 16 downstream tasks.
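    The coding-rate idea can be sketched in pure Python for 2-D features: score the features by the overall coding rate minus the class-conditional coding rate, so that features whose classes form tight, separated clusters score high. This is a simplified illustration; the paper's exact formulation (its constants, centering, and regularization details) may differ.

```python
import math

def coding_rate(Z, eps=1.0):
    """Coding rate of centered 2-D features:
    R(Z) = 1/2 * log det(I + d/(n*eps^2) * Z^T Z), with d = 2."""
    n, d = len(Z), 2
    mx = sum(z[0] for z in Z) / n                 # center the features
    my = sum(z[1] for z in Z) / n
    g00 = sum((z[0] - mx) ** 2 for z in Z)        # Gram matrix Z^T Z (2x2)
    g01 = sum((z[0] - mx) * (z[1] - my) for z in Z)
    g11 = sum((z[1] - my) ** 2 for z in Z)
    a = d / (n * eps ** 2)
    m00, m01, m11 = 1 + a * g00, a * g01, 1 + a * g11
    return 0.5 * math.log(m00 * m11 - m01 * m01)  # closed-form 2x2 log-det

def transrate(Z, y, eps=1.0):
    """TransRate-style score: overall rate minus class-conditional rate."""
    cond = 0.0
    for c in set(y):
        Zc = [z for z, label in zip(Z, y) if label == c]
        cond += (len(Zc) / len(Z)) * coding_rate(Zc, eps)
    return coding_rate(Z, eps) - cond

# Same four feature points, two labelings of them.
Z = [(2.0, 0.0), (2.1, 0.1), (-2.0, 0.0), (-2.1, -0.1)]
score_separated = transrate(Z, [0, 0, 1, 1])   # labels follow the two clusters
score_shuffled = transrate(Z, [0, 1, 0, 1])    # labels ignore the clusters
```

    When labels align with the clusters the conditional rate collapses and the score is large; when they don't, the two rates nearly cancel, matching the mutual-information reading of the measure.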

    A Survey on Negative Transfer

    Transfer learning (TL) tries to utilize data or knowledge from one or more source domains to facilitate learning in a target domain. It is particularly useful when the target domain has little or no labeled data, due to annotation expense, privacy concerns, etc. Unfortunately, the effectiveness of TL is not always guaranteed. Negative transfer (NT), i.e., when source domain data/knowledge causes reduced learning performance in the target domain, has been a long-standing and challenging problem in TL. Various approaches to handle NT have been proposed in the literature. However, the field lacks a systematic survey on the formalization of NT, its factors, and the algorithms that handle it. This paper proposes to fill this gap. First, the definition of negative transfer is considered and a taxonomy of its factors is discussed. Then, nearly fifty representative approaches for handling NT are categorized and reviewed from four perspectives: secure transfer, domain similarity estimation, distant transfer, and negative transfer mitigation. NT in related fields, e.g., multi-task learning, lifelong learning, and adversarial attacks, is also discussed.

    Learning universal representations across tasks and domains

    A longstanding goal in computer vision research is to produce broad, general-purpose systems that work well on a wide range of vision problems and are capable of learning concepts from only a few labelled samples. In contrast, existing models are limited to specific tasks or domains (datasets), e.g., a semantic segmentation model for indoor images (Silberman et al., 2012). In addition, they are data inefficient and require a large labelled dataset for each task or domain. While works have been proposed for domain/task-agnostic representations, via either loss balancing strategies or architecture design, optimizing such a universal representation network remains a challenging problem. This thesis focuses on addressing the challenges of learning universal representations that generalize well over multiple tasks (e.g. segmentation, depth estimation) or various visual domains (e.g. image object classification, image action classification). In addition, the thesis shows that these representations can be learned from partial supervision and transferred and adapted to previously unseen tasks/domains in a data-efficient manner.
    The first part of the dissertation focuses on learning universal representations, i.e. a single universal network for multi-task learning (e.g., learning a single network jointly for different dense prediction tasks like segmentation and depth estimation) and multi-domain learning (e.g. image classification for various vision datasets, each collected for a different problem like texture, flower or action classification). Learning such universal representations by jointly minimizing the sum of all task-specific losses is challenging because of the interference between tasks, and it leads to unbalanced results (i.e. some tasks dominate or interfere with other tasks, and the universal network performs worse than task/domain-specific networks, each of which is trained for a task/domain independently). Hence a new solution is proposed that regularizes the optimization of the universal network by encouraging it to produce the same features as the task-specific networks. The experimental results demonstrate that the proposed method learns a single universal network that performs well for multiple tasks or various visual domains.
    Despite the recent advances in multi-task learning of dense prediction problems, most methods rely on expensive labelled datasets. Relaxing this assumption gives rise to a new multi-task learning setting, called multi-task partially-supervised learning in this thesis, in which the goal is to jointly learn multiple dense prediction tasks on partially annotated data (i.e. not all task labels are available for each training image). A label-efficient approach is proposed that successfully leverages task relations to supervise multi-task learning when data is partially annotated. In particular, the proposed method learns to map each task pair to a joint pairwise task-space, which enables sharing information between the tasks in a computationally efficient way through another network conditioned on task pairs, and avoids learning trivial cross-task relations by retaining high-level information about the input image.
    The final part of the dissertation studies the problem of adapting a model to previously unseen tasks (from seen or unseen domains) with very few labelled training samples of the new tasks, i.e. cross-domain few-shot learning. Recent methods have focused on using various adaptation strategies for aligning their visual representations to new domains, or on selecting the relevant ones from multiple domain-specific feature extractors. In this dissertation, new methods are formulated to learn a single task-agnostic network from multiple domains during meta-training and to attach lightweight task-specific parameters, learned from limited training samples, that adapt the task-agnostic network to accommodate previously unseen tasks. A systematic analysis is performed to study various task adaptation strategies for few-shot learning. Extensive experimental evidence demonstrates that the proposed methods, which learn a single set of task-agnostic representations and adapt them via residual adapters in matrix form attached to the task-agnostic model, significantly benefit cross-domain few-shot learning.
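    The residual-adapter idea from the final part, h' = h + Ah with a small per-task matrix A attached to a frozen task-agnostic backbone, can be sketched as follows; the backbone and the adapter values are toy stand-ins, and in practice A would be trained on the few labelled samples of the new task.

```python
# Hypothetical frozen task-agnostic feature extractor producing 2-D features.
def backbone(x):
    return [x[0] + x[1], x[0] - x[1]]

def adapt(h, A):
    """Residual adapter in matrix form: h' = h + A @ h.
    Only A (here 2x2) holds the few task-specific parameters;
    the backbone stays frozen across tasks."""
    return [h[0] + A[0][0] * h[0] + A[0][1] * h[1],
            h[1] + A[1][0] * h[0] + A[1][1] * h[1]]

A_task = [[0.5, 0.0], [0.0, -0.5]]   # toy per-task parameters
h = backbone([1.0, 2.0])             # -> [3.0, -1.0]
h_adapted = adapt(h, A_task)         # -> [4.5, -0.5]
```

    With A = 0 the adapter is the identity, so a new task starts from the unmodified task-agnostic features and only drifts away as far as its few samples warrant, which is the data-efficiency argument behind the residual form.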