We study the optimisation problem associated with Gaussian process regression
using squared loss. The most common approach to this problem is to apply an
exact solver, such as the method of conjugate gradients, either directly or to a
reduced-order version of the problem. Recently, driven by successes in deep
learning, stochastic gradient descent has gained traction as an alternative. In
this paper, we show that when done right, by which we mean using specific
insights from the optimisation and kernel communities, this approach is highly
effective. We thus
introduce a particular stochastic dual gradient descent algorithm that can be
implemented in a few lines of code using any deep learning framework. We
justify our design decisions with ablation studies that illustrate their
advantages over the alternatives, and show that the new method is highly
competitive. Our evaluations on standard regression benchmarks and a Bayesian
optimisation task set our approach apart from preconditioned conjugate
gradients, variational Gaussian process approximations, and a previous version
of stochastic gradient descent for Gaussian processes. On a molecular binding
affinity prediction task, our method places Gaussian process regression on par,
in terms of performance, with state-of-the-art graph neural networks.