12,154 research outputs found

    Dataset Distillation with Convexified Implicit Gradients

    Full text link
    We propose a new dataset distillation algorithm using reparameterization and convexification of implicit gradients (RCIG), that substantially improves the state-of-the-art. To this end, we first formulate dataset distillation as a bi-level optimization problem. Then, we show how implicit gradients can be effectively used to compute meta-gradient updates. We further equip the algorithm with a convexified approximation that corresponds to learning on top of a frozen finite-width neural tangent kernel. Finally, we improve bias in implicit gradients by parameterizing the neural network to enable analytical computation of final-layer parameters given the body parameters. RCIG establishes the new state-of-the-art on a diverse series of dataset distillation tasks. Notably, with one image per class, on resized ImageNet, RCIG sees on average a 108% improvement over the previous state-of-the-art distillation algorithm. Similarly, we observed a 66% gain over SOTA on Tiny-ImageNet and 37% on CIFAR-100

    Making Scalable Meta Learning Practical

    Full text link
    Despite its flexibility to learn diverse inductive biases in machine learning programs, meta learning (i.e., learning to learn) has long been recognized to suffer from poor scalability due to its tremendous compute/memory costs, training instability, and a lack of efficient distributed training support. In this work, we focus on making scalable meta learning practical by introducing SAMA, which combines advances in both implicit differentiation algorithms and systems. Specifically, SAMA is designed to flexibly support a broad range of adaptive optimizers in the base level of meta learning programs, while reducing computational burden by avoiding explicit computation of second-order gradient information, and exploiting efficient distributed training techniques implemented for first-order gradients. Evaluated on multiple large-scale meta learning benchmarks, SAMA showcases up to 1.7/4.8x increase in throughput and 2.0/3.8x decrease in memory consumption respectively on single-/multi-GPU setups compared to other baseline meta learning algorithms. Furthermore, we show that SAMA-based data optimization leads to consistent improvements in text classification accuracy with BERT and RoBERTa large language models, and achieves state-of-the-art results in both small- and large-scale data pruning on image classification tasks, demonstrating the practical applicability of scalable meta learning across language and vision domains

    Meta-learning with implicit gradients in a few-shot setting for medical image segmentation

    Get PDF
    Widely used traditional supervised deep learning methods require a large number of training samples but often fail to generalize on unseen datasets. Therefore, a more general application of any trained model is quite limited for medical imaging for clinical practice. Using separately trained models for each unique lesion category or a unique patient population will require sufficiently large curated datasets, which is not practical to use in a real-world clinical set-up. Few-shot learning approaches can not only minimize the need for an enormous number of reliable ground truth labels that are labour-intensive and expensive, but can also be used to model on a dataset coming from a new population. To this end, we propose to exploit an optimization-based implicit model agnostic meta-learning (iMAML) algorithm under few-shot settings for medical image segmentation. Our approach can leverage the learned weights from diverse but small training samples to perform analysis on unseen datasets with high accuracy. We show that, unlike classical few-shot learning approaches, our method improves generalization capability. To our knowledge, this is the first work that exploits iMAML for medical image segmentation and explores the strength of the model on scenarios such as meta-training on unique and mixed instances of lesion datasets. Our quantitative results on publicly available skin and polyp datasets show that the proposed method outperforms the naive supervised baseline model and two recent few-shot segmentation approaches by large margins. In addition, our iMAML approach shows an improvement of 2%–4% in dice score compared to its counterpart MAML for most experiments

    PersA-FL: Personalized Asynchronous Federated Learning

    Full text link
    We study the personalized federated learning problem under asynchronous updates. In this problem, each client seeks to obtain a personalized model that simultaneously outperforms local and global models. We consider two optimization-based frameworks for personalization: (i) Model-Agnostic Meta-Learning (MAML) and (ii) Moreau Envelope (ME). MAML involves learning a joint model adapted for each client through fine-tuning, whereas ME requires a bi-level optimization problem with implicit gradients to enforce personalization via regularized losses. We focus on improving the scalability of personalized federated learning by removing the synchronous communication assumption. Moreover, we extend the studied function class by removing boundedness assumptions on the gradient norm. Our main technical contribution is a unified proof for asynchronous federated learning with bounded staleness that we apply to MAML and ME personalization frameworks. For the smooth and non-convex functions class, we show the convergence of our method to a first-order stationary point. We illustrate the performance of our method and its tolerance to staleness through experiments for classification tasks over heterogeneous datasets

    LambdaOpt: Learn to Regularize Recommender Models in Finer Levels

    Full text link
    Recommendation models mainly deal with categorical variables, such as user/item ID and attributes. Besides the high-cardinality issue, the interactions among such categorical variables are usually long-tailed, with the head made up of highly frequent values and a long tail of rare ones. This phenomenon results in the data sparsity issue, making it essential to regularize the models to ensure generalization. The common practice is to employ grid search to manually tune regularization hyperparameters based on the validation data. However, it requires non-trivial efforts and large computation resources to search the whole candidate space; even so, it may not lead to the optimal choice, for which different parameters should have different regularization strengths. In this paper, we propose a hyperparameter optimization method, LambdaOpt, which automatically and adaptively enforces regularization during training. Specifically, it updates the regularization coefficients based on the performance of validation data. With LambdaOpt, the notorious tuning of regularization hyperparameters can be avoided; more importantly, it allows fine-grained regularization (i.e. each parameter can have an individualized regularization coefficient), leading to better generalized models. We show how to employ LambdaOpt on matrix factorization, a classical model that is representative of a large family of recommender models. Extensive experiments on two public benchmarks demonstrate the superiority of our method in boosting the performance of top-K recommendation.Comment: Accepted by KDD 201

    Transfer Learning via Contextual Invariants for One-to-Many Cross-Domain Recommendation

    Full text link
    The rapid proliferation of new users and items on the social web has aggravated the gray-sheep user/long-tail item challenge in recommender systems. Historically, cross-domain co-clustering methods have successfully leveraged shared users and items across dense and sparse domains to improve inference quality. However, they rely on shared rating data and cannot scale to multiple sparse target domains (i.e., the one-to-many transfer setting). This, combined with the increasing adoption of neural recommender architectures, motivates us to develop scalable neural layer-transfer approaches for cross-domain learning. Our key intuition is to guide neural collaborative filtering with domain-invariant components shared across the dense and sparse domains, improving the user and item representations learned in the sparse domains. We leverage contextual invariances across domains to develop these shared modules, and demonstrate that with user-item interaction context, we can learn-to-learn informative representation spaces even with sparse interaction data. We show the effectiveness and scalability of our approach on two public datasets and a massive transaction dataset from Visa, a global payments technology company (19% Item Recall, 3x faster vs. training separate models for each domain). Our approach is applicable to both implicit and explicit feedback settings.Comment: SIGIR 202

    Deep Bilevel Learning

    Full text link
    We present a novel regularization approach to train neural networks that enjoys better generalization and test error than standard stochastic gradient descent. Our approach is based on the principles of cross-validation, where a validation set is used to limit the model overfitting. We formulate such principles as a bilevel optimization problem. This formulation allows us to define the optimization of a cost on the validation set subject to another optimization on the training set. The overfitting is controlled by introducing weights on each mini-batch in the training set and by choosing their values so that they minimize the error on the validation set. In practice, these weights define mini-batch learning rates in a gradient descent update equation that favor gradients with better generalization capabilities. Because of its simplicity, this approach can be integrated with other regularization methods and training schemes. We evaluate extensively our proposed algorithm on several neural network architectures and datasets, and find that it consistently improves the generalization of the model, especially when labels are noisy.Comment: ECCV 201
    • …
    corecore