Meta-Learning Initializations for Image Segmentation
We extend first-order model agnostic meta-learning algorithms (including
FOMAML and Reptile) to image segmentation, present a novel neural network
architecture built for fast learning which we call EfficientLab, and leverage a
formal definition of the test error of meta-learning algorithms to decrease
error on out-of-distribution tasks. We show state-of-the-art results on the
FSS-1000 dataset by meta-training EfficientLab with FOMAML and using Bayesian
optimization to infer the optimal test-time adaptation routine hyperparameters.
We also construct a small benchmark dataset, FP-k, for the empirical study of
how meta-learning systems perform in both few- and many-shot settings. On the
FP-k dataset, we show that meta-learned initializations provide value for
canonical few-shot image segmentation, but their performance is quickly
matched by conventional transfer learning, with the two being equal beyond 10
labeled examples. Our code, meta-learned model, and the FP-k dataset are
available at https://github.com/ml4ai/mliis
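The first-order meta-learning updates the abstract builds on (FOMAML, Reptile) admit a compact sketch. The one-parameter toy regression task, learning rates, and loop sizes below are illustrative assumptions, not the paper's segmentation setup:

```python
# Minimal sketch of the Reptile meta-update: adapt to a sampled task with a
# few SGD steps, then move the meta-initialization toward the adapted weights.
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    """Toy regression task: fit y = a*x with a task-specific slope a."""
    a = rng.uniform(-2.0, 2.0)
    x = rng.uniform(-1.0, 1.0, size=20)
    return x, a * x

def inner_sgd(theta, x, y, lr=0.1, steps=5):
    """A few SGD steps on one task, starting from the meta-initialization."""
    for _ in range(steps):
        grad = 2.0 * np.mean((theta * x - y) * x)   # d/d(theta) of the MSE
        theta = theta - lr * grad
    return theta

theta = 0.0                      # meta-learned initialization (one weight here)
meta_lr = 0.1
for _ in range(300):             # meta-training loop over sampled tasks
    x, y = sample_task()
    adapted = inner_sgd(theta, x, y)
    theta = theta + meta_lr * (adapted - theta)   # Reptile: step toward adapted weights
```

Because the task slopes are drawn symmetrically around zero, the meta-initialization settles near the center of the task distribution, which is the behavior a meta-learned initialization is meant to exhibit.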
Meta Architecture Search
Neural Architecture Search (NAS) has been quite successful in constructing
state-of-the-art models on a variety of tasks. Unfortunately, the computational
cost can make it difficult to scale. In this paper, we make the first attempt
to study Meta Architecture Search which aims at learning a task-agnostic
representation that can be used to speed up the process of architecture search
on a large number of tasks. We propose the Bayesian Meta Architecture SEarch
(BASE) framework which takes advantage of a Bayesian formulation of the
architecture search problem to learn over an entire set of tasks
simultaneously. We show that on ImageNet classification, we can find a model
that achieves 25.7% top-1 error and 8.1% top-5 error by adapting the
architecture in less than an hour, starting from a meta-network pretrained for
8 GPU-days.
By learning a good prior for NAS, our method dramatically decreases the
required computation cost while achieving comparable performance to current
state-of-the-art methods - even finding competitive models for unseen datasets
with very quick adaptation. We believe our framework will open up new
possibilities for efficient and massively scalable architecture search research
across multiple tasks. Comment: 11 pages, 4 figures, 4 tables, 4 pages of appendix; NeurIPS 201
Accelerating Neural Architecture Search using Performance Prediction
Methods for neural network hyperparameter optimization and meta-modeling are
computationally expensive due to the need to train a large number of model
configurations. In this paper, we show that standard frequentist regression
models can predict the final performance of partially trained model
configurations using features based on network architectures, hyperparameters,
and time-series validation performance data. We empirically show that our
performance prediction models are much more effective than prominent Bayesian
counterparts, are simpler to implement, and are faster to train. Our models can
predict final performance in both visual classification and language modeling
domains, are effective for predicting performance of drastically varying model
architectures, and can even generalize between model classes. Using these
prediction models, we also propose an early stopping method for hyperparameter
optimization and meta-modeling, which obtains a speedup of up to 6x in
both hyperparameter optimization and meta-modeling. Finally, we empirically
show that our early stopping method can be seamlessly incorporated into both
reinforcement learning-based architecture selection algorithms and bandit based
search methods. Through extensive experimentation, we empirically show our
performance prediction models and early stopping algorithm are state-of-the-art
in terms of prediction accuracy and speedup achieved while still identifying
the optimal model configurations. Comment: Submitted to International Conference on Learning Representations (2018).
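The core idea above is simple enough to demonstrate end to end: fit a plain frequentist regressor from early-epoch validation scores to final performance, then stop configurations predicted to underperform. The saturating synthetic learning curves and the 0.8 stopping threshold below are assumptions for the demo, not the paper's setup:

```python
# Least-squares regression from a partial validation curve (first 5 epochs)
# to final accuracy, used to decide whether to stop a configuration early.
import numpy as np

rng = np.random.default_rng(1)

def learning_curve(final_acc, epochs=20):
    """Toy saturating validation-accuracy curve approaching `final_acc`."""
    t = np.arange(1, epochs + 1)
    return final_acc * (1 - np.exp(-t / 5.0)) + rng.normal(0, 0.005, epochs)

# Training data: first 5 epochs of each curve -> observed final accuracy.
finals = rng.uniform(0.5, 0.9, size=100)
X = np.stack([learning_curve(f)[:5] for f in finals])
X1 = np.hstack([X, np.ones((len(X), 1))])          # add a bias column
w, *_ = np.linalg.lstsq(X1, finals, rcond=None)    # frequentist least squares

# Predict the final accuracy of a new, partially trained configuration.
true_final = 0.85
partial = learning_curve(true_final)[:5]
pred = float(np.append(partial, 1.0) @ w)
stop_early = pred < 0.8   # stop if predicted to miss the best-so-far threshold
```

Even this linear model recovers the final accuracy closely here, which mirrors the abstract's point that simple regressors suffice for curve extrapolation.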
Taking Human out of Learning Applications: A Survey on Automated Machine Learning
Machine learning techniques have deeply rooted in our everyday life. However,
since it is knowledge- and labor-intensive to pursue good learning performance,
human experts are heavily involved in every aspect of machine learning. In
order to make machine learning techniques easier to apply and reduce the demand
for experienced human experts, automated machine learning (AutoML) has emerged
as a hot topic with both industrial and academic interest. In this paper, we
provide an up-to-date survey on AutoML. First, we introduce and define the
AutoML problem, with inspiration from both realms of automation and machine
learning. Then, we propose a general AutoML framework that not only covers most
existing approaches to date but also can guide the design for new methods.
Subsequently, we categorize and review the existing works from two aspects,
i.e., the problem setup and the employed techniques. Finally, we provide a
detailed analysis of AutoML approaches and explain the reasons behind their
successful applications. We hope this survey can serve as not only an
insightful guideline for AutoML beginners but also an inspiration for future
research. Comment: This is a preliminary version and will be kept updated.
Adaptive Bayesian Linear Regression for Automated Machine Learning
To solve a machine learning problem, one typically needs to perform data
preprocessing, modeling, and hyperparameter tuning, which is known as model
selection and hyperparameter optimization. The goal of automated machine
learning (AutoML) is to design methods that can automatically perform model
selection and hyperparameter optimization without human interventions for a
given dataset. In this paper, we propose a meta-learning method that can search
for a high-performance machine learning pipeline from the predefined set of
candidate pipelines for supervised classification datasets in an efficient way
by leveraging meta-data collected from previous experiments. More specifically,
our method combines an adaptive Bayesian regression model with a neural network
basis function and the acquisition function from Bayesian optimization. The
adaptive Bayesian regression model is able to capture knowledge from previous
meta-data and thus make predictions of the performances of machine learning
pipelines on a new dataset. The acquisition function is then used to guide the
search of possible pipelines based on the predictions. The experiments
demonstrate that our approach can quickly identify high-performance pipelines
for a range of test datasets and outperforms the baseline methods. Comment: Added references; corrected typos; revised argument, results unchanged.
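The two ingredients the abstract combines, a Bayesian regression surrogate over pipeline features and a Bayesian-optimization acquisition function, can be sketched together. The feature vectors, observed accuracies, and precision hyperparameters below are invented for illustration; the paper's model additionally learns neural-network basis functions:

```python
# Bayesian linear regression over pipeline features plus an expected-
# improvement (EI) acquisition to pick the next pipeline to evaluate.
import math
import numpy as np

alpha, beta = 1.0, 25.0          # prior precision, observation-noise precision

def posterior(Phi, y):
    """Posterior mean and covariance of the weights."""
    S_inv = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi
    S = np.linalg.inv(S_inv)
    m = beta * S @ Phi.T @ y
    return m, S

def predict(phi, m, S):
    """Predictive mean and standard deviation for one feature vector."""
    mean = float(phi @ m)
    var = 1.0 / beta + float(phi @ S @ phi)
    return mean, math.sqrt(var)

def expected_improvement(mean, std, best):
    """EI acquisition: expected gain over the best score seen so far."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf

# Toy meta-data: basis features of tried pipelines -> validation accuracies.
Phi = np.array([[1.0, 0.2], [1.0, 0.5], [1.0, 0.8]])
y = np.array([0.60, 0.72, 0.81])
m, S = posterior(Phi, y)

# Score two unseen candidate pipelines; the acquisition guides the search.
candidates = np.array([[1.0, 0.9], [1.0, 0.1]])
ei = [expected_improvement(*predict(c, m, S), y.max()) for c in candidates]
chosen = int(np.argmax(ei))
```

EI prefers the candidate whose predictive mean extrapolates above the best observed score, which is how the acquisition "guides the search of possible pipelines based on the predictions."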
A Review of Meta-Reinforcement Learning for Deep Neural Networks Architecture Search
Deep neural networks are efficient and flexible models that perform well on
a variety of tasks, such as image recognition, speech recognition, and natural
language understanding. In particular, convolutional neural networks (CNNs)
have generated keen interest among researchers in computer vision, especially
for classification tasks. A CNN's architecture and related hyperparameters are
generally tied to the nature of the task at hand, as the network must extract
the complex, relevant features that allow optimal convergence. Designing such
architectures requires significant human expertise and substantial computation
time, and does not always yield an optimal network. Model configuration has
been studied extensively in machine learning without converging on a standard
automatic method. This survey focuses on reviewing and discussing the current
progress in automating CNN architecture search.
Random Search and Reproducibility for Neural Architecture Search
Neural architecture search (NAS) is a promising research direction that has
the potential to replace expert-designed networks with learned, task-specific
architectures. In this work, in order to help ground the empirical results in
this field, we propose new NAS baselines that build off the following
observations: (i) NAS is a specialized hyperparameter optimization problem; and
(ii) random search is a competitive baseline for hyperparameter optimization.
Leveraging these observations, we evaluate both random search with
early-stopping and a novel random search with weight-sharing algorithm on two
standard NAS benchmarks---PTB and CIFAR-10. Our results show that random search
with early-stopping is a competitive NAS baseline, e.g., it performs at least
as well as ENAS, a leading NAS method, on both benchmarks. Additionally, random
search with weight-sharing outperforms random search with early-stopping,
achieving a state-of-the-art NAS result on PTB and a highly competitive result
on CIFAR-10. Finally, we explore the existing reproducibility issues of
published NAS results. We note the lack of source material needed to exactly
reproduce these results, and further discuss the robustness of published
results given the various sources of variability in NAS experimental setups.
Relatedly, we provide all information (code, random seeds, documentation)
needed to exactly reproduce our results, and report our random search with
weight-sharing results for each benchmark on multiple runs. Comment: V2
changelog: modified footnote 2 for ENAS; expanded broad reproducibility study
for random search with WS for CNN to 6 sets of random seeds. V3 changelog:
added journal reference; updated acknowledgements.
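The first baseline the abstract proposes, random search with early stopping, is short enough to sketch directly. The synthetic objective, the 50-trial budget, and the 0.25 stopping factor below are illustrative assumptions standing in for actually training networks:

```python
# Random search over a hyperparameter space, abandoning configurations whose
# cheap partial evaluation trails well behind the best score seen so far.
import random

random.seed(0)

def partial_score(config, budget):
    """Toy stand-in for validation accuracy after `budget` epochs."""
    quality = 1.0 - (config["lr"] - 0.1) ** 2     # objective peaks at lr = 0.1
    return quality * budget / 10.0                # score improves with budget

best_config, best_score = None, -1.0
for _ in range(50):
    config = {"lr": random.uniform(0.0, 0.5)}     # sample a random configuration
    score = partial_score(config, budget=3)       # cheap partial evaluation
    if score < 0.25 * best_score:                 # early-stop clearly bad configs
        continue
    score = partial_score(config, budget=10)      # full evaluation
    if score > best_score:
        best_config, best_score = config, score
```

Framing NAS as exactly this kind of hyperparameter search, with the "configuration" being an architecture encoding, is the observation that makes random search such a strong baseline.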
Reconciling meta-learning and continual learning with online mixtures of tasks
Learning-to-learn or meta-learning leverages data-driven inductive bias to
increase the efficiency of learning on a novel task. This approach encounters
difficulty when transfer is not advantageous, for instance, when tasks are
considerably dissimilar or change over time. We use the connection between
gradient-based meta-learning and hierarchical Bayes to propose a Dirichlet
process mixture of hierarchical Bayesian models over the parameters of an
arbitrary parametric model such as a neural network. In contrast to
consolidating inductive biases into a single set of hyperparameters, our
approach of task-dependent hyperparameter selection better handles latent
distribution shift, as demonstrated on a set of evolving, image-based, few-shot
learning benchmarks. Comment: Updated experimental results.
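The high-level idea of a growing mixture over task-specific parameters can be caricatured in a few lines. The distance-threshold assignment rule below is a crude stand-in for the Dirichlet-process posterior, and all names and constants are illustrative:

```python
# Maintain a growing set of parameter initializations ("components") and
# route each incoming task to the closest one, spawning a new component when
# no existing one fits -- a threshold-based sketch of DP-mixture assignment.
import numpy as np

components = []          # each entry: a parameter initialization (a vector here)
THRESHOLD = 1.0          # distance beyond which a new component is created

def assign(task_params):
    """Return the component index for this task, adding a component if needed."""
    if components:
        dists = [float(np.linalg.norm(task_params - c)) for c in components]
        i = int(np.argmin(dists))
        if dists[i] < THRESHOLD:
            # Nudge the matched initialization toward this task's parameters.
            components[i] = 0.9 * components[i] + 0.1 * task_params
            return i
    components.append(task_params.copy())
    return len(components) - 1

# Two dissimilar task families end up in separate components.
a = assign(np.array([0.0, 0.0]))
b = assign(np.array([5.0, 5.0]))
c = assign(np.array([0.1, 0.1]))
```

Keeping separate components per task family is what lets the mixture avoid negative transfer when tasks are considerably dissimilar or drift over time.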
Bayesian Optimized Continual Learning with Attention Mechanism
Though neural networks have achieved much progress in various applications,
it is still highly challenging for them to learn from a continuous stream of
tasks without forgetting. Continual learning, a new learning paradigm, aims to
solve this issue. In this work, we propose a new model for continual learning,
called Bayesian Optimized Continual Learning with Attention Mechanism (BOCL)
that dynamically expands the network capacity upon the arrival of new tasks by
Bayesian optimization and selectively utilizes previous knowledge (e.g., feature
maps of previous tasks) via an attention mechanism. Our experiments on variants of
MNIST and CIFAR-100 demonstrate that our methods outperform the
state-of-the-art in preventing catastrophic forgetting and fitting new tasks
better. Comment: 8 pages.
HyperSTAR: Task-Aware Hyperparameters for Deep Networks
While deep neural networks excel in solving visual recognition tasks, they
require significant effort to find hyperparameters that make them work
optimally. Hyperparameter Optimization (HPO) approaches have automated the
process of finding good hyperparameters but they do not adapt to a given task
(task-agnostic), making them computationally inefficient. To reduce HPO time,
we present HyperSTAR (System for Task Aware Hyperparameter Recommendation), a
task-aware method to warm-start HPO for deep neural networks. HyperSTAR ranks
and recommends hyperparameters by predicting their performance conditioned on a
joint dataset-hyperparameter space. It learns a dataset (task) representation
along with the performance predictor directly from raw images in an end-to-end
fashion. The recommendations, when integrated with an existing HPO method, make
it task-aware and significantly reduce the time to achieve optimal performance.
We conduct extensive experiments on 10 publicly available large-scale image
classification datasets over two different network architectures, validating
that HyperSTAR evaluates 50% fewer configurations to achieve the best
performance compared to existing methods. We further demonstrate that HyperSTAR
makes Hyperband (HB) task-aware, achieving the optimal accuracy in just 25% of
the budget required by both vanilla HB and Bayesian Optimized HB (BOHB). Comment: Published at CVPR 2020 (Oral).
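The warm-start mechanism described above reduces to ranking candidate configurations by predicted, task-conditioned performance and evaluating the top ones first. The embedding and scoring function below are illustrative assumptions, not HyperSTAR's learned end-to-end components:

```python
# Rank hyperparameter configurations by a predictor conditioned on a dataset
# representation, then hand the top-ranked ones to an HPO method first.
import numpy as np

def predicted_performance(dataset_embedding, config):
    """Stand-in for a learned joint dataset-hyperparameter predictor."""
    lr, depth = config
    features = np.array([-abs(lr - 0.01) * 50, depth / 100])
    return float(dataset_embedding @ features)

dataset_embedding = np.array([1.0, 1.0])    # learned from raw images in the paper
candidates = [(0.1, 18), (0.01, 50), (0.001, 34)]   # (learning rate, depth)

# Rank candidates by predicted performance; evaluate the best-looking first.
ranked = sorted(candidates,
                key=lambda c: predicted_performance(dataset_embedding, c),
                reverse=True)
warm_start = ranked[:2]   # e.g., seed Hyperband / BOHB with these first
```

Because the ranking depends on the dataset embedding, a different task would reorder the same candidate list, which is what makes the recommendation task-aware rather than task-agnostic.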