Alpha MAML: Adaptive Model-Agnostic Meta-Learning
Model-agnostic meta-learning (MAML) is a meta-learning technique to train a
model on a multitude of learning tasks in a way that primes the model for
few-shot learning of new tasks. The MAML algorithm performs well on few-shot
learning problems in classification, regression, and fine-tuning of policy
gradients in reinforcement learning, but comes with the need for costly
hyperparameter tuning for training stability. We address this shortcoming by
introducing an extension to MAML, called Alpha MAML, to incorporate an online
hyperparameter adaptation scheme that eliminates the need to tune meta-learning
and learning rates. Our results on the Omniglot database demonstrate a
substantial reduction in the need to tune MAML training hyperparameters and
improved training stability, with less sensitivity to hyperparameter choice.
Comment: 6th ICML Workshop on Automated Machine Learning (2019)
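A minimal sketch of the idea, assuming a first-order approximation on a toy
linear-regression task distribution (the paper uses the full second-order
MAML update and evaluates on Omniglot; the model, task sampler, and step
sizes below are illustrative assumptions): both the inner learning rate
alpha and the meta learning rate beta are adapted online by hypergradient
descent, so neither needs manual tuning.

import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    # Toy task distribution: random 1-D linear regression problems,
    # standing in for the paper's few-shot benchmarks.
    a, b = rng.uniform(-2.0, 2.0, size=2)
    x_tr, x_val = rng.uniform(-1.0, 1.0, size=(2, 10))
    return (x_tr, a * x_tr + b), (x_val, a * x_val + b)

def mse_grad(theta, x, y):
    # Gradient of mean squared error for the model
    # y_hat = theta[0] * x + theta[1].
    err = theta[0] * x + theta[1] - y
    return np.array([np.mean(2.0 * err * x), np.mean(2.0 * err)])

theta = np.zeros(2)                # meta-parameters
alpha, beta = 0.01, 0.01           # inner / meta learning rates, adapted online
eta_alpha, eta_beta = 1e-4, 1e-4   # hypergradient step sizes (assumed values)

prev_meta_grad = np.zeros(2)
for step in range(2000):
    (x_tr, y_tr), (x_val, y_val) = sample_task()
    g_inner = mse_grad(theta, x_tr, y_tr)      # inner-loop gradient
    theta_adapted = theta - alpha * g_inner    # one MAML adaptation step
    # First-order meta-gradient, evaluated at the adapted parameters.
    meta_grad = mse_grad(theta_adapted, x_val, y_val)
    # Hypergradient descent on the learning rates themselves:
    # d(val loss)/d(alpha) = -<meta_grad, g_inner>; the beta update uses
    # the meta-gradient from the previous outer step.
    alpha += eta_alpha * np.dot(meta_grad, g_inner)
    beta += eta_beta * np.dot(meta_grad, prev_meta_grad)
    prev_meta_grad = meta_grad
    theta -= beta * meta_grad                  # meta-update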
Learning What to Learn for Video Object Segmentation
Video object segmentation (VOS) is a highly challenging problem, since the
target object is only defined during inference with a given first-frame
reference mask. The problem of how to capture and utilize this limited target
information remains a fundamental research question. We address this by
introducing an end-to-end trainable VOS architecture that integrates a
differentiable few-shot learning module. This internal learner is designed to
predict a powerful parametric model of the target by minimizing a segmentation
error in the first frame. We further go beyond standard few-shot learning
techniques by learning what the few-shot learner should learn. This allows us
to achieve a rich internal representation of the target in the current frame,
significantly increasing the segmentation accuracy of our approach. We perform
extensive experiments on multiple benchmarks. Our approach sets a new
state-of-the-art on the large-scale YouTube-VOS 2018 dataset by achieving an
overall score of 81.5, corresponding to a 2.6% relative improvement over the
previous best result.
Comment: First two authors contributed equally
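A rough sketch of the internal learner, assuming a 1x1-convolution target
model and a plain MSE first-frame loss (the paper additionally meta-learns
the mask encoding and uses steepest-descent iterations; all names and
shapes below are illustrative): the target model is fitted by a few
unrolled, differentiable gradient steps, so gradients can flow through the
fitting procedure during end-to-end training.

import torch
import torch.nn.functional as F

def fit_target_model(feats, label, num_steps=5, step_size=0.1):
    # Differentiable internal learner: fit target-model weights w by
    # unrolled gradient descent on the reference-frame segmentation error.
    # feats: (1, C, H, W) backbone features of the first frame
    # label: (1, D, H, W) encoded ground-truth mask
    C, D = feats.shape[1], label.shape[1]
    w = torch.zeros(D, C, 1, 1, requires_grad=True)
    for _ in range(num_steps):
        pred = F.conv2d(feats, w)          # apply the target model
        loss = F.mse_loss(pred, label)     # first-frame segmentation error
        (g,) = torch.autograd.grad(loss, w, create_graph=True)
        w = w - step_size * g              # unrolled, differentiable step
    return w

# Fit on the reference frame, then apply to later frames.
feats0 = torch.randn(1, 64, 30, 52)        # reference-frame features
label0 = torch.randn(1, 16, 30, 52)        # encoded reference mask
w = fit_target_model(feats0, label0)
feats_t = torch.randn(1, 64, 30, 52)       # features of a later frame
target_scores = F.conv2d(feats_t, w)       # target representation, frame t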
Meta-learning deep visual words for fast video object segmentation
Accurate video object segmentation methods finetune a model using the first
annotated frame, and/or use additional inputs such as optical flow and
complex post-processing. In contrast, we develop a fast algorithm that
requires no finetuning, auxiliary inputs or post-processing, and segments a
variable number of objects in a single forward pass. We represent an object
with clusters, or “visual words”, in the embedding space, which correspond
to object parts in the image space. This allows us to robustly match to the
reference objects throughout the video, because although the global
appearance of an object changes as it undergoes occlusions and
deformations, the appearance of more local parts may stay consistent. We
learn these visual words in an unsupervised manner, using meta-learning to
ensure that our training objective matches our inference procedure. We
achieve accuracy comparable to finetuning-based methods, and
state-of-the-art speed/accuracy trade-offs on four video object
segmentation datasets.
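To make the part-based matching concrete, here is a minimal sketch with
plain k-means standing in for the meta-learned clustering (tensor shapes,
the number of words, and the nearest-center decision rule are illustrative
assumptions, not the paper's exact procedure):

import torch

def build_visual_words(ref_embed, ref_mask, num_words=8, iters=10):
    # Cluster one object's pixel embeddings into "visual words".
    # ref_embed: (C, H, W) embedding of the reference frame
    # ref_mask:  (H, W) boolean mask of the object
    vecs = ref_embed[:, ref_mask].T                  # (N, C) object pixels
    words = vecs[torch.randperm(len(vecs))[:num_words]].clone()
    for _ in range(iters):                           # standard k-means
        assign = torch.cdist(vecs, words).argmin(dim=1)
        for k in range(num_words):
            members = vecs[assign == k]
            if len(members):
                words[k] = members.mean(dim=0)
    return words                                     # (K, C), one row per part

def segment(frame_embed, words_per_object):
    # Label each query pixel with the object whose nearest visual word is
    # closest in embedding space (background gets its own word set).
    C, H, W = frame_embed.shape
    pix = frame_embed.reshape(C, -1).T               # (H*W, C)
    dists = torch.stack([torch.cdist(pix, w).min(dim=1).values
                         for w in words_per_object]) # (num_objects, H*W)
    return dists.argmin(dim=0).reshape(H, W)         # per-pixel object id

Because matching is a single nearest-neighbour pass against a handful of
centers per object, inference needs only one forward pass and no
finetuning, which is where the speed advantage comes from.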