Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets
Bayesian optimization has become a successful tool for hyperparameter
optimization of machine learning algorithms, such as support vector machines or
deep neural networks. Despite its success, for large datasets, training and
validating a single configuration often takes hours, days, or even weeks, which
limits the achievable performance. To accelerate hyperparameter optimization,
we propose a generative model for the validation error as a function of
training set size, which is learned during the optimization process and allows
exploration of preliminary configurations on small subsets, by extrapolating to
the full dataset. We construct a Bayesian optimization procedure, dubbed
Fabolas, which models loss and training time as a function of dataset size and
automatically trades off high information gain about the global optimum against
computational cost. Experiments optimizing support vector machines and deep
neural networks show that Fabolas often finds high-quality solutions 10 to 100
times faster than other state-of-the-art Bayesian optimization methods or the
recently proposed bandit strategy Hyperband.
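The idea of evaluating configurations on small subsets and extrapolating to the full dataset can be illustrated with a minimal sketch. It is not the Fabolas model: the abstract describes a learned generative model, whereas this sketch swaps in a simple parametric power-law fit. The function names, the subset sizes, and the synthetic error values are all hypothetical.

```python
import numpy as np

def fit_power_law(sizes, errors):
    """Fit errors ~ a * sizes**(-b) by least squares in log-log space."""
    slope, intercept = np.polyfit(np.log(sizes), np.log(errors), 1)
    return float(np.exp(intercept)), float(-slope)

def extrapolate_error(sizes, errors, full_size):
    """Predict the validation error at full_size from subset measurements."""
    a, b = fit_power_law(sizes, errors)
    return a * full_size ** (-b)

# Hypothetical, noise-free measurements of one configuration on small subsets.
subset_sizes = np.array([500.0, 1000.0, 2000.0, 4000.0])
val_errors = 2.0 * subset_sizes ** -0.3

# Extrapolate to an assumed full dataset of 50,000 training points.
predicted = extrapolate_error(subset_sizes, val_errors, 50_000.0)
print(f"predicted full-data validation error: {predicted:.4f}")
```

A power law is only one plausible shape for a learning curve; with noisy measurements the extrapolation carries uncertainty that a point estimate like this one does not expose.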
Sample Efficient Optimization for Learning Controllers for Bipedal Locomotion
Learning policies for bipedal locomotion can be difficult, as experiments are
expensive and simulation does not usually transfer well to hardware. To counter
this, we need algorithms that are sample efficient and inherently safe.
Bayesian Optimization is a powerful sample-efficient tool for optimizing
non-convex black-box functions. However, its performance can degrade in higher
dimensions. We develop a distance metric for bipedal locomotion that enhances
the sample-efficiency of Bayesian Optimization and use it to train a 16
dimensional neuromuscular model for planar walking. This distance metric
reflects some basic gait features of healthy walking and helps us quickly
eliminate a majority of unstable controllers. With our approach we can learn
policies for walking in less than 100 trials for a range of challenging
settings. In simulation, we show results on two different costs and on various
terrains including rough ground and ramps, sloping upwards and downwards. We
also perturb our models with unknown inertial disturbances analogous to
differences between simulation and hardware. These results are promising, as
they indicate that this method can potentially be used to learn control
policies on hardware.
Comment: To appear in International Conference on Humanoid Robots (Humanoids
'2016), IEEE-RAS. (Rika Antonova and Akshara Rai contributed equally)
A multi-resolution, non-parametric, Bayesian framework for identification of spatially-varying model parameters
This paper proposes a hierarchical, multi-resolution framework for the
identification of model parameters and their spatial variability from noisy
measurements of the response or output. Such parameters are frequently
encountered in PDE-based models and correspond to quantities such as density or
pressure fields, elasto-plastic moduli and internal variables in solid
mechanics, conductivity fields in heat diffusion problems, permeability fields
in fluid flow through porous media, etc. The proposed model has all the
advantages of traditional Bayesian formulations, such as the ability to produce
measures of confidence for the inferences made and to provide not only
predictive estimates but also quantitative measures of the predictive
uncertainty. In contrast to existing approaches, it utilizes a parsimonious,
non-parametric formulation that favors sparse representations and whose
complexity can be determined from the data. The proposed framework is
non-intrusive and makes use of a sequence of forward solvers operating at
various resolutions. As a result, inexpensive, coarse solvers are used to
identify the most salient features of the unknown field(s) which are
subsequently enriched by invoking solvers operating at finer resolutions. This
leads to significant computational savings, particularly in problems involving
computationally demanding forward models, as well as improvements in accuracy.
The framework relies on a novel, adaptive scheme based on Sequential Monte Carlo
sampling, which is embarrassingly parallelizable and circumvents issues with slow
mixing encountered in Markov Chain Monte Carlo schemes.
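The coarse-to-fine use of forward solvers inside a Sequential Monte Carlo sampler can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's adaptive scheme: it infers one scalar parameter instead of a spatially varying field, it omits the move (rejuvenation) step that a full SMC sampler would normally include, and the forward solver, prior, noise level, and all names are hypothetical.

```python
import numpy as np

def forward(theta, n_cells):
    """Hypothetical forward solver: midpoint-rule integral of exp(theta*x) on [0, 1].

    n_cells plays the role of solver resolution (more cells = finer, costlier).
    """
    x = (np.arange(n_cells) + 0.5) / n_cells
    return np.exp(np.outer(theta, x)).mean(axis=1)

def log_likelihood(theta, y_obs, n_cells, noise_std):
    """Gaussian log-likelihood (up to a constant) at a given resolution."""
    resid = y_obs - forward(theta, n_cells)
    return -0.5 * (resid / noise_std) ** 2

def normalize(log_w):
    """Turn log-weights into normalized weights in a numerically stable way."""
    w = np.exp(log_w - log_w.max())
    return w / w.sum()

def smc_multiresolution(y_obs, resolutions, n_particles=2000,
                        noise_std=0.05, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.normal(0.0, 2.0, size=n_particles)  # assumed prior N(0, 2^2)
    log_w = np.zeros(n_particles)
    prev_ll = np.zeros(n_particles)
    for n_cells in resolutions:  # coarse solvers first, then finer ones
        ll = log_likelihood(theta, y_obs, n_cells, noise_std)
        log_w += ll - prev_ll  # incremental weight between successive posteriors
        prev_ll = ll
        w = normalize(log_w)
        ess = 1.0 / np.sum(w ** 2)  # effective sample size
        if ess < n_particles / 2:
            # Multinomial resampling; weights reset to uniform afterwards.
            idx = rng.choice(n_particles, size=n_particles, p=w)
            theta, prev_ll = theta[idx], prev_ll[idx]
            log_w = np.zeros(n_particles)
    return theta, normalize(log_w)

# Synthetic observation: the exact integral for theta = 1 is e - 1.
y_obs = np.e - 1.0
particles, weights = smc_multiresolution(y_obs, resolutions=[2, 8, 64])
posterior_mean = float(np.sum(weights * particles))
print(f"posterior mean of theta: {posterior_mean:.3f}")
```

Because each particle is weighted independently, the likelihood evaluations at every level can run in parallel, which is the property the abstract refers to as embarrassingly parallelizable.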
The Deep Weight Prior
Bayesian inference is known to provide a general framework for incorporating
prior knowledge or specific properties into machine learning models by
carefully choosing a prior distribution. In this work, we propose a new type of
prior distribution for convolutional neural networks, the deep weight prior (DWP),
which exploits generative models to encourage a specific structure of trained
convolutional filters, e.g., spatial correlations of weights. We define DWP in
the form of an implicit distribution and propose a method for variational
inference with this type of implicit prior. In experiments, we show that DWP
improves the performance of Bayesian neural networks when training data are
limited, and initialization of weights with samples from DWP accelerates
training of conventional convolutional neural networks.
Comment: TL;DR: The deep weight prior learns a generative model for kernels of
convolutional neural networks that acts as a prior distribution while
training on a new dataset.
Communication Theoretic Data Analytics
Widespread use of the Internet and social networks invokes the generation of
big data, which is proving to be useful in a number of applications. To deal
with explosively growing amounts of data, data analytics has emerged as a
critical technology related to computing, signal processing, and information
networking. In this paper, a formalism is considered in which data is modeled
as a generalized social network and communication theory and information theory
are thereby extended to data analytics. First, the creation of an equalizer to
optimize information transfer between two data variables is considered, and
financial data is used to demonstrate the advantages. Then, an information
coupling approach based on information geometry is applied for dimensionality
reduction, with a pattern recognition example to illustrate the effectiveness.
These initial trials suggest the potential of communication theoretic data
analytics for a wide range of applications.
Comment: Published in IEEE Journal on Selected Areas in Communications, Jan.
201