Practical Bayesian optimization in the presence of outliers
Inference in the presence of outliers is an important field of research as
outliers are ubiquitous and may arise across a variety of problems and domains.
Bayesian optimization is a method that relies heavily on probabilistic inference.
This allows outstanding sample efficiency because the probabilistic machinery
provides a memory of the whole optimization process. However, that virtue
becomes a disadvantage when the memory is populated with outliers, inducing
bias in the estimation. In this paper, we present an empirical evaluation of
Bayesian optimization methods in the presence of outliers. The empirical
evidence shows that Bayesian optimization with robust regression often produces
suboptimal results. We then propose a new algorithm which combines robust
regression (a Gaussian process with Student-t likelihood) with outlier
diagnostics to classify data points as outliers or inliers. By using a
scheduler for the classification of outliers, our method is more efficient and
has better convergence than standard robust regression. Furthermore, we
show that even in controlled situations with no expected outliers, our method
is able to produce better results.
Comment: 10 pages (2 of references), 6 figures, 1 algorithm
Laplace approximation and natural gradient for Gaussian process regression with heteroscedastic Student-t model
We propose the Laplace method to derive approximate inference for Gaussian process (GP) regression in the location and scale parameters of the Student-t probabilistic model. This allows both the mean and the variance of the data to vary as a function of covariates, with the attractive feature that the Student-t model has been widely used as a tool for robustifying data analysis. The challenge in approximate inference for the model lies in the analytical intractability of the posterior distribution and the lack of concavity of the log-likelihood function. We present a natural gradient adaptation for the estimation process, which relies primarily on the property that the Student-t model naturally has an orthogonal parametrization. Due to this property of the model, the Laplace approximation becomes significantly more robust than the traditional approach using Newton’s method. We also introduce an alternative Laplace approximation that uses the model’s Fisher information matrix. According to experiments, this alternative approximation provides posterior approximations and predictive performance very similar to those of the traditional Laplace approximation with the model’s Hessian matrix. However, the proposed Laplace–Fisher approximation is faster and more stable to compute than the traditional Laplace approximation. We also compare both of these Laplace approximations with the Markov chain Monte Carlo (MCMC) method. We discuss how our approach can, in general, improve the inference algorithm in cases where the probabilistic model assumed for the data is not log-concave.
Peer reviewed
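The contrast between the Hessian-based and Fisher-based Laplace approximations can be illustrated with a minimal sketch. It is not the paper's heteroscedastic model: it assumes a fixed scale `sigma` and looks only at the location parameter, and the function names are hypothetical. For a residual r = y - f, the negative second derivative of the Student-t log-likelihood turns negative once r² exceeds νσ² (the lack of log-concavity mentioned in the abstract), whereas the Fisher information for the location is the constant (ν + 1) / ((ν + 3) σ²), which is always positive.

```python
def neg_hessian_location(r, nu, sigma):
    """Negative second derivative of the Student-t log-likelihood with
    respect to the location, evaluated at residual r = y - f.
    Simplification: the scale sigma is held fixed (assumption)."""
    s2 = nu * sigma ** 2
    return (nu + 1.0) * (s2 - r ** 2) / (s2 + r ** 2) ** 2


def fisher_information_location(nu, sigma):
    """Expected value of the quantity above under the Student-t model.
    It does not depend on the residual and is always positive."""
    return (nu + 1.0) / ((nu + 3.0) * sigma ** 2)


if __name__ == "__main__":
    nu, sigma = 4.0, 1.0
    for r in (0.0, 1.0, 2.0, 5.0):
        print(r, neg_hessian_location(r, nu, sigma),
              fisher_information_location(nu, sigma))
```

In this simplified setting, a Newton step that uses the Hessian curvature can see negative curvature at points far from the current fit, while the Fisher-based curvature stays positive, which is one plausible reading of why the abstract reports the Laplace–Fisher variant as more stable.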
Student-t Processes as Alternatives to Gaussian Processes
We investigate the Student-t process as an alternative to the Gaussian
process as a nonparametric prior over functions. We derive closed form
expressions for the marginal likelihood and predictive distribution of a
Student-t process, by integrating away an inverse Wishart process prior over
the covariance kernel of a Gaussian process model. We show surprising
equivalences between different hierarchical Gaussian process models leading to
Student-t processes, and derive a new sampling scheme for the inverse Wishart
process, which helps elucidate these equivalences. Overall, we show that a
Student-t process can retain the attractive properties of a Gaussian process --
a nonparametric representation, analytic marginal and predictive distributions,
and easy model selection through covariance kernels -- but has enhanced
flexibility, and predictive covariances that, unlike a Gaussian process,
explicitly depend on the values of training observations. We verify empirically
that a Student-t process is especially useful in situations where there are
changes in covariance structure, or in applications like Bayesian optimization,
where accurate predictive covariances are critical for good performance. These
advantages come at no additional computational cost over Gaussian processes.
Comment: 13 pages, 6 figures, 1 table. To appear in The Seventeenth
International Conference on Artificial Intelligence and Statistics (AISTATS),
2014.
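The claim that Student-t process predictive covariances depend on the training observations can be made concrete with a small sketch. The abstract does not state the formula; the version below is recalled from the standard conditional of a multivariate Student-t with ν > 2 degrees of freedom and zero mean, reduced to a single training point and a single test point. The RBF kernel, the function names, and the default `nu` are illustrative assumptions.

```python
import math


def rbf(x1, x2, lengthscale=1.0):
    """Squared-exponential kernel (an illustrative choice, not from the source)."""
    return math.exp(-0.5 * ((x1 - x2) / lengthscale) ** 2)


def tp_predict_one(x_train, y_train, x_test, nu=5.0):
    """Predictive mean and variance at x_test given ONE training pair.

    Assumes a zero-mean Student-t process with nu > 2, parametrized so
    that the kernel is the prior covariance. Returns the mean, the
    usual GP variance, and the Student-t process variance.
    """
    k11 = rbf(x_train, x_train)
    k21 = rbf(x_test, x_train)
    k22 = rbf(x_test, x_test)

    mean = k21 / k11 * y_train          # same as the GP predictive mean
    gp_var = k22 - k21 ** 2 / k11       # GP variance, independent of y_train

    n_train = 1
    beta = y_train ** 2 / k11           # squared Mahalanobis distance of the data
    scale = (nu + beta - 2.0) / (nu + n_train - 2.0)
    return mean, gp_var, scale * gp_var


if __name__ == "__main__":
    for y in (1.0, 3.0):
        print(y, tp_predict_one(0.0, y, 1.0))
```

With an observation of typical size (`beta` equal to the number of training points) the two variances coincide; a more surprising observation inflates the Student-t process variance, while the Gaussian process variance is unchanged.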