12,331 research outputs found

Fast Bayesian Optimization of Machine Learning Hyperparameters on Large Datasets

Author: Bartels Simon
Falkner Stefan
Hennig Philipp
Hutter Frank
Klein Aaron
Publication date: 01/01/2017

Bayesian optimization has become a successful tool for hyperparameter optimization of machine learning algorithms, such as support vector machines or deep neural networks. Despite its success, for large datasets, training and validating a single configuration often takes hours, days, or even weeks, which limits the achievable performance. To accelerate hyperparameter optimization, we propose a generative model for the validation error as a function of training set size, which is learned during the optimization process and allows exploration of preliminary configurations on small subsets by extrapolating to the full dataset. We construct a Bayesian optimization procedure, dubbed Fabolas, which models loss and training time as a function of dataset size and automatically trades off high information gain about the global optimum against computational cost. Experiments optimizing support vector machines and deep neural networks show that Fabolas often finds high-quality solutions 10 to 100 times faster than other state-of-the-art Bayesian optimization methods or the recently proposed bandit strategy Hyperband.

arXiv.org e-Print Archive

MPG.PuRe

Sample Efficient Optimization for Learning Controllers for Bipedal Locomotion

Author: Antonova Rika
Atkeson Christopher G.
Rai Akshara
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016

Learning policies for bipedal locomotion can be difficult, as experiments are expensive and simulation does not usually transfer well to hardware. To counter this, we need algorithms that are sample-efficient and inherently safe. Bayesian Optimization is a powerful sample-efficient tool for optimizing non-convex black-box functions. However, its performance can degrade in higher dimensions. We develop a distance metric for bipedal locomotion that enhances the sample efficiency of Bayesian Optimization and use it to train a 16-dimensional neuromuscular model for planar walking. This distance metric reflects some basic gait features of healthy walking and helps us quickly eliminate a majority of unstable controllers. With our approach we can learn policies for walking in less than 100 trials for a range of challenging settings. In simulation, we show results on two different costs and on various terrains, including rough ground and ramps sloping upwards and downwards. We also perturb our models with unknown inertial disturbances analogous to differences between simulation and hardware. These results are promising, as they indicate that this method can potentially be used to learn control policies on hardware. Comment: To appear in International Conference on Humanoid Robots (Humanoids '2016), IEEE-RAS. (Rika Antonova and Akshara Rai contributed equally)

arXiv.org e-Print Archive

Publikationer från KTH

Crossref

Digitala Vetenskapliga Arkivet - Academic Archive On-line

MPG.PuRe

A multi-resolution, non-parametric, Bayesian framework for identification of spatially-varying model parameters

Author: P.S. Koutsourelakis
Publication venue: 'Elsevier BV'
Publication date: 03/10/2008

This paper proposes a hierarchical, multi-resolution framework for the identification of model parameters and their spatial variability from noisy measurements of the response or output. Such parameters are frequently encountered in PDE-based models and correspond to quantities such as density or pressure fields, elasto-plastic moduli and internal variables in solid mechanics, conductivity fields in heat diffusion problems, permeability fields in fluid flow through porous media, etc. The proposed model has all the advantages of traditional Bayesian formulations, such as the ability to produce measures of confidence for the inferences made and to provide not only predictive estimates but also quantitative measures of the predictive uncertainty. In contrast to existing approaches, it utilizes a parsimonious, non-parametric formulation that favors sparse representations and whose complexity can be determined from the data. The proposed framework is non-intrusive and makes use of a sequence of forward solvers operating at various resolutions. As a result, inexpensive, coarse solvers are used to identify the most salient features of the unknown field(s), which are subsequently enriched by invoking solvers operating at finer resolutions. This leads to significant computational savings, particularly in problems involving computationally demanding forward models, as well as improvements in accuracy. The framework relies on a novel, adaptive scheme based on Sequential Monte Carlo sampling, which is embarrassingly parallelizable and circumvents the slow-mixing issues encountered in Markov Chain Monte Carlo schemes.

arXiv.org e-Print Archive

Heriot Watt Pure

Crossref

The Deep Weight Prior

Author: Ashukha Arsenii
Atanov Andrei
Struminsky Kirill
Vetrov Dmitry
Welling Max
Publication date: 18/02/2019

Bayesian inference is known to provide a general framework for incorporating prior knowledge or specific properties into machine learning models by carefully choosing a prior distribution. In this work, we propose a new type of prior distribution for convolutional neural networks, the deep weight prior (DWP), that exploits generative models to encourage a specific structure of trained convolutional filters, e.g., spatial correlations of weights. We define DWP in the form of an implicit distribution and propose a method for variational inference with this type of implicit prior. In experiments, we show that DWP improves the performance of Bayesian neural networks when training data are limited, and that initializing weights with samples from DWP accelerates training of conventional convolutional neural networks. Comment: TL;DR: The deep weight prior learns a generative model for kernels of convolutional neural networks, that acts as a prior distribution while training on new dataset

arXiv.org e-Print Archive

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Communication Theoretic Data Analytics

Author: Chen Kwang-Cheng
Huang Shao-Lun
Poor H. Vincent
Zheng Lizhong
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 21/01/2015

Widespread use of the Internet and social networks drives the generation of big data, which is proving to be useful in a number of applications. To deal with explosively growing amounts of data, data analytics has emerged as a critical technology related to computing, signal processing, and information networking. In this paper, a formalism is considered in which data is modeled as a generalized social network, and communication theory and information theory are thereby extended to data analytics. First, the creation of an equalizer to optimize information transfer between two data variables is considered, and financial data is used to demonstrate the advantages. Then, an information coupling approach based on information geometry is applied for dimensionality reduction, with a pattern recognition example to illustrate the effectiveness. These initial trials suggest the potential of communication theoretic data analytics for a wide range of applications. Comment: Published in IEEE Journal on Selected Areas in Communications, Jan. 201

arXiv.org e-Print Archive

Princeton University Open Access Repository

Crossref