In this paper, we use Prior-data Fitted Networks (PFNs) as a flexible
surrogate for Bayesian Optimization (BO). PFNs are neural processes that are
trained to approximate the posterior predictive distribution (PPD) for any
prior distribution from which one can sample efficiently. We describe how this
flexibility can be exploited for surrogate modeling in BO. We use PFNs to mimic
a naive Gaussian process (GP), an advanced GP, and a Bayesian neural network
(BNN). In addition, we show how to incorporate further information into the
prior, such as allowing hints about the position of optima (user priors),
ignoring irrelevant dimensions, and performing non-myopic BO by learning the
acquisition function. The flexibility underlying these extensions opens up vast
possibilities for using PFNs for BO. We demonstrate the usefulness of PFNs for
BO in a large-scale evaluation on artificial GP samples and three different
hyperparameter optimization testbeds: HPO-B, Bayesmark, and PD1. We publish
code alongside trained models at http://github.com/automl/PFNs4BO.
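To make the surrogate-modeling idea concrete, the following is a minimal sketch, not the released PFNs4BO implementation, of how a PFN-style surrogate slots into a standard expected-improvement BO loop. The function `pfn_predict` is a hypothetical stand-in for a single in-context forward pass of a pretrained PFN that conditions on the observed points and returns a predictive mean and standard deviation; here it is filled with a crude placeholder so the sketch runs end to end.

```python
# Minimal sketch, not the PFNs4BO implementation: a PFN-style surrogate
# inside a standard expected-improvement BO loop. `pfn_predict` is a
# hypothetical placeholder for one in-context forward pass of a trained PFN.
import numpy as np
from scipy.stats import norm

def pfn_predict(X_obs, y_obs, X_query):
    """Placeholder for a pretrained PFN: condition on (X_obs, y_obs)
    in-context and return a predictive mean/std per query point.
    Here: a crude distance-weighted estimate so the sketch executes."""
    d = np.linalg.norm(X_query[:, None, :] - X_obs[None, :, :], axis=-1)
    w = np.exp(-d) / np.exp(-d).sum(axis=1, keepdims=True)
    mu = w @ y_obs
    sigma = np.sqrt((w @ (y_obs - y_obs.mean()) ** 2) + 1e-3)
    return mu, sigma

def expected_improvement(mu, sigma, y_best):
    # Standard (myopic) EI for minimization.
    z = (y_best - mu) / np.clip(sigma, 1e-9, None)
    return sigma * (z * norm.cdf(z) + norm.pdf(z))

def bo_loop(objective, X_init, y_init, n_steps, rng):
    X, y = X_init, y_init
    for _ in range(n_steps):
        cand = rng.uniform(size=(256, X.shape[1]))   # random candidate set
        mu, sigma = pfn_predict(X, y, cand)          # one PFN forward pass
        x_next = cand[np.argmax(expected_improvement(mu, sigma, y.min()))]
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next))
    return X, y

# Example usage on a toy 2-D objective.
rng = np.random.default_rng(0)
f = lambda x: np.sum((x - 0.3) ** 2)
X0 = rng.uniform(size=(5, 2))
X, y = bo_loop(f, X0, np.array([f(x) for x in X0]), n_steps=20, rng=rng)
print("best value found:", y.min())
```

Because the PFN call is a single forward pass with no per-task refitting, swapping the prior the surrogate mimics (naive GP, advanced GP, BNN, or a prior with user hints) amounts to loading a different pretrained model, while the surrounding loop stays unchanged.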