Raiders of the Lost Architecture: Kernels for Bayesian Optimization in Conditional Parameter Spaces
In practical Bayesian optimization, we must often search over structures with
differing numbers of parameters. For instance, we may wish to search over
neural network architectures with an unknown number of layers. To relate
performance data gathered for different architectures, we define a new kernel
for conditional parameter spaces that explicitly includes information about
which parameters are relevant in a given structure. We show that this kernel
improves model quality and Bayesian optimization results over several simpler
baseline kernels.
Comment: 6 pages, 3 figures. Appeared in the NIPS 2013 workshop on Bayesian optimization.
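As a rough illustration of the idea, the sketch below builds a kernel in which every configuration where a parameter is irrelevant maps to the same point in an embedded space, after which a standard RBF kernel is applied. The embedding, hyperparameter names, and values are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def embed(x, active, omega=1.0, rho=0.5):
    """Embed one conditional parameter on an arc.

    Active values land on an arc parameterized by x; inactive values all
    collapse to the same fixed point, so two configurations in which the
    parameter is irrelevant agree exactly along this dimension.
    (Illustrative embedding, not the paper's exact form.)
    """
    if not active:
        return np.array([omega, 0.0])        # common point for "irrelevant"
    angle = np.pi * rho * x                  # x assumed rescaled to [0, 1]
    return omega * np.array([np.cos(angle), np.sin(angle)])

def conditional_rbf(x1, a1, x2, a2, lengthscale=1.0):
    """RBF kernel between two configurations with relevance masks a1, a2."""
    e1 = np.concatenate([embed(v, f) for v, f in zip(x1, a1)])
    e2 = np.concatenate([embed(v, f) for v, f in zip(x2, a2)])
    sq = np.sum((e1 - e2) ** 2)
    return np.exp(-0.5 * sq / lengthscale ** 2)

# Compare a 3-layer-style config against one where the third parameter
# (say, the width of an absent layer) is irrelevant.
print(conditional_rbf([0.2, 0.7, 0.5], [True, True, True],
                      [0.2, 0.7, 0.9], [True, True, True]))
print(conditional_rbf([0.2, 0.7, 0.5], [True, True, True],
                      [0.2, 0.7, 0.0], [True, True, False]))
```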
Efficient Feature Learning Using Perturb-and-MAP
Perturb-and-MAP [1] is a technique for efficiently drawing approximate samples from discrete probabilistic graphical models. These samples are useful both for characterizing the uncertainty in the model and for learning its parameters. In this work, we show that this same technique is effective at learning features from images using graphical models with complex dependencies between variables. In particular, we apply this technique in order to learn the parameters of a latent-variable model, the restricted Boltzmann machine, with additional higher-order potentials. We also use it in a bipartite matching model to learn features that are specifically tailored to tracking image patches in video sequences. Our final contribution is the proposal of a novel method for generating perturbations.
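The core recipe is easy to sketch: perturb the model's log-potentials with independent Gumbel noise and return the MAP assignment of the perturbed model. The toy sketch below substitutes a per-variable argmax for a real structured MAP solver, so it reduces to exact Gumbel-max sampling; all names and numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def perturb_and_map(unary, map_solver):
    """Draw one approximate sample by adding Gumbel noise to the unary
    log-potentials and returning the MAP assignment of the perturbed model.

    unary: (n_vars, n_states) array of log-potentials.
    map_solver: maps perturbed potentials to an assignment; here a
    per-variable argmax stands in for a real structured MAP routine
    (graph cuts, LP relaxation, ...) that would handle pairwise or
    higher-order terms.
    """
    gumbel = -np.log(-np.log(rng.uniform(size=unary.shape)))
    return map_solver(unary + gumbel)

# With independent variables the argmax "solver" makes this exact
# Gumbel-max sampling, so the empirical frequency matches the model.
unary = np.log(np.array([[0.7, 0.3],
                         [0.1, 0.9]]))
samples = [perturb_and_map(unary, lambda u: u.argmax(axis=1)) for _ in range(1000)]
print(np.mean([s[0] for s in samples]))   # roughly 0.3, i.e. P(x0 = 1)
```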
Learning unbiased features
A key element in transfer learning is representation learning; if
representations can be developed that expose the relevant factors underlying
the data, then new tasks and domains can be learned readily based on mappings
of these salient factors. We propose that an important aim for these
representations is to be unbiased. Different forms of representation learning
can be derived from alternative definitions of unwanted bias, e.g., bias to
particular tasks, domains, or irrelevant underlying data dimensions. One very
useful approach to estimating the amount of bias in a representation comes from
maximum mean discrepancy (MMD) [5], a measure of distance between probability
distributions. We are not the first to suggest that MMD can be a useful
criterion in developing representations that apply across multiple domains or
tasks [1]. However, in this paper we describe a number of novel applications of
this criterion that we have devised, all based on the idea of developing
unbiased representations. These formulations include: a standard domain
adaptation framework; a method of learning invariant representations; an
approach based on noise-insensitive autoencoders; and a novel form of
generative model.
Comment: Published in NIPS 2014 Workshop on Transfer and Multitask Learning, see http://nips.cc/Conferences/2014/Program/event.php?ID=428
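For concreteness, the sketch below computes the (biased) squared-MMD estimate between two sets of features using an RBF kernel; in the formulations above this quantity would be computed on learned representations from different domains or tasks and penalized during training. The kernel choice, bandwidth, and data are illustrative assumptions.

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    """RBF kernel matrix between the rows of a and the rows of b."""
    sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
    return np.exp(-gamma * sq)

def mmd2(x, y, gamma=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples x and y."""
    return rbf(x, x, gamma).mean() + rbf(y, y, gamma).mean() - 2 * rbf(x, y, gamma).mean()

rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, size=(200, 5))   # e.g. features from domain A
target = rng.normal(0.5, 1.0, size=(200, 5))   # e.g. features from domain B
print(mmd2(source, target))                                # larger when distributions differ
print(mmd2(source, rng.normal(0.0, 1.0, size=(200, 5))))   # near zero for matched distributions
```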
Multi-Task Bayesian Optimization
Bayesian optimization has recently been proposed as a framework for automatically tuning the hyperparameters of machine learning models and has been shown to yield state-of-the-art performance with impressive ease and efficiency. In this paper, we explore whether it is possible to transfer the knowledge gained from previous optimizations to new tasks in order to find optimal hyperparameter settings more efficiently. Our approach is based on extending multi-task Gaussian processes to the framework of Bayesian optimization. We show that this method significantly speeds up the optimization process when compared to the standard single-task approach. We further propose a straightforward extension of our algorithm in order to jointly minimize the average error across multiple tasks and demonstrate how this can be used to greatly speed up k-fold cross-validation. Lastly, our most significant contribution is an adaptation of a recently proposed acquisition function, entropy search, to the cost-sensitive and multi-task settings. We demonstrate the utility of this new acquisition function by utilizing a small dataset in order to explore hyperparameter settings for a large dataset. Our algorithm dynamically chooses which dataset to query in order to yield the most information per unit cost.
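One simple way to realize this kind of knowledge transfer is a multi-task Gaussian process whose covariance factorizes into an input kernel times a task-similarity matrix; the sketch below shows that covariance. The kernel form, task matrix, and values are illustrative assumptions rather than the paper's exact model.

```python
import numpy as np

def rbf(x1, x2, lengthscale=1.0):
    """RBF kernel over hyperparameter inputs."""
    return np.exp(-0.5 * np.sum((x1 - x2) ** 2) / lengthscale ** 2)

def multitask_kernel(x1, t1, x2, t2, B, lengthscale=1.0):
    """Covariance between an observation of task t1 at x1 and task t2 at x2.

    B is a positive semi-definite task-similarity matrix; when tasks are
    correlated, cheap observations on one task inform the posterior on
    another. (Illustrative intrinsic-coregionalization form.)
    """
    return B[t1, t2] * rbf(x1, x2, lengthscale)

# Task 0: a small, cheap dataset; task 1: the large, expensive dataset.
B = np.array([[1.0, 0.8],
              [0.8, 1.0]])          # strong positive correlation between tasks
x_cheap = np.array([0.3, 0.7])      # hyperparameters already evaluated on the small set
x_query = np.array([0.3, 0.7])      # candidate for the large set
print(multitask_kernel(x_cheap, 0, x_query, 1, B))   # high covariance: cheap run is informative
```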
Learning Hard Alignments with Variational Inference
There has recently been significant interest in hard attention models for
tasks such as object recognition, visual captioning and speech recognition.
Hard attention can offer benefits over soft attention such as decreased
computational cost, but training hard attention models can be difficult because
of the discrete latent variables they introduce. Previous work used REINFORCE
and Q-learning to approach these issues, but those methods can provide
high-variance gradient estimates and be slow to train. In this paper, we tackle
the problem of learning hard attention for a sequential task using variational
inference methods, specifically the recently introduced VIMCO and NVIL.
Furthermore, we propose a novel baseline that adapts VIMCO to this setting. We
demonstrate our method on a phoneme recognition task in clean and noisy
environments and show that our method outperforms REINFORCE, with the
difference being greater for a more complicated task.
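For reference, the sketch below computes the multi-sample variational bound and the leave-one-out VIMCO learning signals for a single datapoint; the novel baseline proposed in the paper is not reproduced here, and the log-weights are illustrative.

```python
import numpy as np

def logsumexp(a):
    """Numerically stable log-sum-exp."""
    m = a.max()
    return m + np.log(np.sum(np.exp(a - m)))

def vimco_learning_signals(log_w):
    """Per-sample learning signals with the VIMCO leave-one-out baseline.

    log_w: (K,) log importance weights log p(x, z_k) - log q(z_k | x) for K
    samples from the inference network. Returns (bound, signals), where
    signals[k] multiplies grad log q(z_k | x) in the score-function gradient.
    (Simplified single-datapoint sketch of the estimator.)
    """
    K = log_w.shape[0]
    bound = logsumexp(log_w) - np.log(K)        # log (1/K) sum_k w_k
    signals = np.empty(K)
    for k in range(K):
        others = np.delete(log_w, k)
        log_w_hat = log_w.copy()
        log_w_hat[k] = others.mean()            # geometric mean of the other weights
        baseline_k = logsumexp(log_w_hat) - np.log(K)
        signals[k] = bound - baseline_k
    return bound, signals

log_w = np.array([-1.2, -0.4, -2.0, -0.9, -1.5])
bound, signals = vimco_learning_signals(log_w)
print(bound, signals)
```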