7 research outputs found

    Alpha MAML: Adaptive Model-Agnostic Meta-Learning

    Model-agnostic meta-learning (MAML) is a meta-learning technique to train a model on a multitude of learning tasks in a way that primes the model for few-shot learning of new tasks. The MAML algorithm performs well on few-shot learning problems in classification, regression, and fine-tuning of policy gradients in reinforcement learning, but requires costly hyperparameter tuning to keep training stable. We address this shortcoming by introducing an extension to MAML, called Alpha MAML, which incorporates an online hyperparameter adaptation scheme that eliminates the need to tune the learning rate and the meta-learning rate. Our results on the Omniglot database demonstrate a substantial reduction in the need to tune MAML training hyperparameters and improved training stability with less sensitivity to hyperparameter choice. Comment: 6th ICML Workshop on Automated Machine Learning (2019).
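    The online adaptation scheme follows the hypergradient-descent idea of nudging a learning rate by the agreement between consecutive gradients. As a rough, hedged illustration of that kind of update (not the paper's implementation), a single adaptive SGD step in PyTorch might look like this; the function name, the `state` dictionary, and the constants are illustrative assumptions:

        import torch

        def hypergradient_sgd_step(params, loss_fn, state, alpha0=0.01, beta=1e-4):
            # One SGD step whose learning rate `alpha` is itself adapted online:
            # alpha is nudged by the dot product of the current and previous
            # gradients (hypergradient descent). Illustrative sketch only.
            loss = loss_fn(params)
            grads = torch.autograd.grad(loss, params)
            alpha = state.get("alpha", alpha0)
            prev = state.get("prev_grads")
            if prev is not None:
                h = sum((g * p).sum() for g, p in zip(grads, prev))
                alpha = alpha + beta * h.item()          # adapt the learning rate
            new_params = [(p - alpha * g).detach().requires_grad_()
                          for p, g in zip(params, grads)]
            state["alpha"] = alpha
            state["prev_grads"] = [g.detach() for g in grads]
            return new_params, loss.item()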

    Efficiently Robustify Pre-trained Models

    A recent trend in deep learning has been towards training large-scale models with high parameter counts on big datasets. However, the robustness of such large-scale models in real-world settings remains a less-explored topic. In this work, we first benchmark the performance of these models under different perturbations and datasets that represent real-world shifts, and highlight their degrading performance under these shifts. We then discuss how existing robustification schemes based on full model fine-tuning might not scale to very large networks and can also cause them to forget some of their desired characteristics. Finally, we propose a simple and cost-effective method to solve this problem, inspired by the knowledge transfer literature. It involves robustifying smaller models at a lower computational cost and then using them as teachers to tune a fraction of these large-scale networks, reducing the overall computational overhead. We evaluate the proposed method under various vision perturbations, including the ImageNet-C, R, S, and A datasets, as well as in transfer learning and zero-shot evaluation setups on different datasets. Benchmark results show that our method efficiently induces robustness in these large-scale models, requires significantly less time, and preserves the transfer learning and zero-shot properties of the original model, which none of the existing methods achieve.
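    The abstract only sketches the recipe: robustify a small model, then use it as a teacher to tune a small fraction of the large model. A generic version of one such distillation step is sketched below in PyTorch; the KL objective, the temperature, and the choice of which parameters to tune are assumptions for illustration, not the paper's exact method.

        import torch
        import torch.nn.functional as F

        def distill_robustness_step(large_model, robust_teacher, optimizer, x_perturbed, T=2.0):
            # `optimizer` is assumed to have been built over only a small
            # fraction of large_model's parameters; everything else stays frozen.
            with torch.no_grad():
                t_logits = robust_teacher(x_perturbed)   # small, already-robust teacher
            s_logits = large_model(x_perturbed)
            loss = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                            F.softmax(t_logits / T, dim=-1),
                            reduction="batchmean") * (T * T)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            return loss.item()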

    STEER: Simple Temporal Regularization For Neural ODEs

    Training Neural Ordinary Differential Equations (ODEs) is often computationally expensive. Indeed, computing the forward pass of such models involves solving an ODE which can become arbitrarily complex during training. Recent works have shown that regularizing the dynamics of the ODE can partially alleviate this. In this paper we propose a new regularization technique: randomly sampling the end time of the ODE during training. The proposed regularization is simple to implement, has negligible overhead and is effective across a wide variety of tasks. Further, the technique is orthogonal to several other methods proposed to regularize the dynamics of ODEs and as such can be used in conjunction with them. We show through experiments on normalizing flows, time series models and image recognition that the proposed regularization can significantly decrease training time and even improve performance over baseline models. Comment: NeurIPS 2020.
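    The regularizer itself amounts to a small change in the forward pass: during training, draw the integration end time from an interval around the nominal end time instead of fixing it. A minimal sketch, assuming the torchdiffeq odeint solver and a nominal end time of 1:

        import torch
        from torchdiffeq import odeint   # assumed solver; any ODE solver would do

        def neural_ode_forward(ode_func, y0, t1=1.0, b=0.5, training=True):
            # During training, integrate to a randomly sampled end time in
            # [t1 - b, t1 + b]; at evaluation, integrate to the fixed t1.
            # b is assumed smaller than t1 so the end time stays positive.
            end = float(torch.empty(1).uniform_(t1 - b, t1 + b)) if training else t1
            t = torch.tensor([0.0, end])
            return odeint(ode_func, y0, t)[-1]   # state at the end time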

    Incremental tube construction for human action detection

    Current state-of-the-art action detection systems are tailored for offline batch-processing applications. However, for online applications like human-robot interaction, current systems fall short. In this work, we introduce a real-time, online joint-labelling and association algorithm for action detection that can incrementally construct space-time action tubes on the most challenging untrimmed action videos, in which different action categories occur concurrently. In contrast to previous methods, we solve the linking, action labelling and temporal localization problems jointly in a single pass. We demonstrate superior online association accuracy and speed (1.8 ms per frame) compared to current state-of-the-art offline and online systems.
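    The paper's contribution is the joint labelling-and-association formulation, which the abstract does not spell out. For orientation only, the generic skeleton of online tube extension (greedy IoU matching of each frame's detections to existing tubes) looks like the sketch below; it is not the paper's algorithm, and all names are illustrative.

        def box_iou(a, b):
            # IoU of two boxes given as (x1, y1, x2, y2).
            ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
            ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
            inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
            area_a = (a[2] - a[0]) * (a[3] - a[1])
            area_b = (b[2] - b[0]) * (b[3] - b[1])
            return inter / (area_a + area_b - inter + 1e-9)

        def extend_tubes(tubes, detections, iou_thr=0.3):
            # Greedily extend each tube with the best-overlapping unused detection
            # from the current frame; unmatched detections start new tubes.
            used = set()
            for tube in tubes:
                candidates = [(box_iou(tube[-1], d), i)
                              for i, d in enumerate(detections) if i not in used]
                if candidates:
                    score, i = max(candidates)
                    if score >= iou_thr:
                        tube.append(detections[i])
                        used.add(i)
            tubes.extend([d] for i, d in enumerate(detections) if i not in used)
            return tubes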

    Automated and verified deep learning

    In the last decade, deep learning has enabled remarkable progress in various fields such as image recognition, machine translation, and speech recognition, and we are witnessing an explosion in the range of applications. However, many challenges stand in the way of the widespread deployment of deep learning. In this thesis, we focus on two of the key challenges, namely neural network verification and automated machine learning. Firstly, deep neural networks are infamous for being 'black boxes' and making unexpected mistakes. For reliable AI, we want systems that are consistent with specifications like fairness, unbiasedness and robustness. We focus on verifying the adversarial robustness of neural networks, which aims at proving the existence or non-existence of an adversarial example. This non-convex problem is commonly approximated with a convex relaxation. We make two important contributions in this direction. First, we propose a specialised dual solver for a new convex relaxation; this was essential because, although the relaxation is tighter than previous relaxations, it has an exponential number of constraints that make existing dual solvers inapplicable. Second, we design a tighter relaxation for the problem of verifying robustness to input perturbations within the probability simplex. The size of our relaxation is linear in the number of neurons, which enables us to design simpler and more efficient algorithms. Empirically, we demonstrate their performance by verifying the respective specifications on common verification benchmarks. Secondly, building and training deep neural networks requires extensive human effort and expertise. We consider automated machine learning, or meta-learning, which aims to automate the process of applying machine learning. We make three contributions in this context. First, we propose efficient approximations for the bi-level formulation of meta-learning. We show their efficiency in the context of learning to generate synthetic data for training neural networks by optimizing state-of-the-art photorealistic renderers. Second, we propose a technique to automatically optimize the learning rate of gradient-based meta-learning algorithms, and demonstrate a substantial reduction in the need to tune training hyperparameters. Third, we show an application by tackling video segmentation as a meta-learning problem and demonstrating state-of-the-art results on common benchmarks.
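    Several of the meta-learning contributions above revolve around the bi-level formulation: an outer objective evaluated at the result of an inner optimization. One common, generic way to approximate it, shown here purely as a hedged illustration and not as the thesis's specific schemes, is to unroll a few inner gradient steps and differentiate the outer loss through them; all names and constants below are assumptions.

        import torch

        def truncated_unrolled_step(phi, theta, train_loss, val_loss,
                                    inner_lr=0.1, inner_steps=3):
            # Inner problem: a few SGD steps on the training loss, keeping the
            # graph so the adapted parameters remain a differentiable
            # function of the outer variables `phi`.
            for _ in range(inner_steps):
                grads = torch.autograd.grad(train_loss(theta, phi), theta,
                                            create_graph=True)
                theta = [w - inner_lr * g for w, g in zip(theta, grads)]
            # Outer problem: differentiate the validation loss of the adapted
            # parameters with respect to `phi`.
            outer = val_loss(theta)
            return torch.autograd.grad(outer, phi), outer.item()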