Black-box α-divergence minimization

Abstract

Black-box alpha (BB-α) is a new approximate inference method based on the minimization of α-divergences. BB-α scales to large datasets because it can be implemented using stochastic gradient descent. BB-α can be applied to complex probabilistic models with little effort since it only requires as input the likelihood function and its gradients. These gradients can be easily obtained using automatic differentiation. By changing the divergence parameter α, the method is able to interpolate between variational Bayes (VB) (α → 0) and an algorithm similar to expectation propagation (EP) (α = 1). Experiments on probit regression and neural network regression and classification problems show that BB-α with non-standard settings of α, such as α = 0.5, usually produces better predictions than with α → 0 (VB) or α = 1 (EP).

Acknowledgements

JMHL acknowledges support from the Rafael del Pino Foundation. YL thanks the Schlumberger Foundation Faculty for the Future fellowship for supporting her PhD study. MR acknowledges support from the UK Engineering and Physical Sciences Research Council (EPSRC) grant EP/L016516/1 for the University of Cambridge Centre for Doctoral Training, the Cambridge Centre for Analysis. TDB thanks Google for funding his European Doctoral Fellowship. DHL acknowledges support from Plan Nacional I+D+i, grants TIN2013-42351-P and TIN2015-70308-REDT, and from Comunidad de Madrid, grant S2013/ICE-2845 CASI-CAM-CM. RET thanks EPSRC grants EP/L000776/1 and EP/M026957/1.
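For reference on the interpolation claim, the α-divergence family in Minka's parameterization is

\[
D_\alpha[p \,\|\, q] = \frac{1}{\alpha(1-\alpha)} \int \left( \alpha\, p(\theta) + (1-\alpha)\, q(\theta) - p(\theta)^\alpha q(\theta)^{1-\alpha} \right) d\theta,
\]

which tends to KL[q || p] (the VB objective) as α → 0 and to KL[p || q] (the local EP objective) as α → 1.

As a rough, non-authoritative sketch of how the black-box recipe can look in code, the snippet below Monte Carlo estimates a simplified α-energy of the form KL[q || p] − (1/α) Σ_n log E_q[p(y_n | θ)^α], a tied-factor simplification related to the BB-α energy. This exact form, the toy Gaussian model, and all function names are illustrative assumptions rather than the paper's algorithm; in a real implementation the gradients would come from automatic differentiation and be followed with stochastic gradient descent on minibatches, which is what gives the method its scalability. As α → 0 the estimate approaches the negative variational lower bound.

import numpy as np

def bb_alpha_energy(y, mu_q, log_sigma_q, alpha=0.5, n_samples=200,
                    prior_mu=0.0, prior_sigma=1.0, noise_sigma=1.0, rng=None):
    """Monte Carlo estimate of
    L_alpha(q) = KL[q || p] - (1/alpha) * sum_n log E_q[p(y_n | theta)^alpha]
    for a toy model y_n ~ N(theta, noise_sigma^2), theta ~ N(prior_mu, prior_sigma^2),
    with a Gaussian approximation q(theta) = N(mu_q, sigma_q^2). Requires alpha != 0."""
    rng = rng or np.random.default_rng(0)
    sigma_q = np.exp(log_sigma_q)
    theta = rng.normal(mu_q, sigma_q, size=n_samples)            # theta^(k) ~ q

    # Per-datapoint log-likelihoods log p(y_n | theta^(k)), shape (K, N).
    log_lik = (-0.5 * np.log(2.0 * np.pi * noise_sigma**2)
               - 0.5 * ((y[None, :] - theta[:, None]) / noise_sigma) ** 2)

    # (1/alpha) log E_q[p^alpha] via a numerically stable log-mean-exp over samples.
    a = alpha * log_lik
    m = a.max(axis=0)
    tilted = (m + np.log(np.mean(np.exp(a - m), axis=0))) / alpha

    # KL[q || prior] between two univariate Gaussians, in closed form.
    kl = (np.log(prior_sigma / sigma_q)
          + (sigma_q**2 + (mu_q - prior_mu) ** 2) / (2.0 * prior_sigma**2) - 0.5)

    return kl - tilted.sum()

# Toy usage: 50 observations; the energy would be minimized over (mu_q, log_sigma_q)
# by SGD, with gradients supplied by an autodiff framework in practice.
y = np.random.default_rng(1).normal(0.3, 1.0, size=50)
print(bb_alpha_energy(y, mu_q=0.2, log_sigma_q=-1.0, alpha=0.5))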
