Anderson acceleration (or Anderson mixing) is an efficient acceleration
method for fixed-point iterations $x_{t+1} = G(x_t)$; e.g., gradient descent can
be viewed as iteratively applying the operation $G(x) \triangleq x - \alpha \nabla f(x)$. It is known that Anderson acceleration is quite efficient in practice
and can be viewed as an extension of Krylov subspace methods for nonlinear
problems. In this paper, we show that Anderson acceleration with Chebyshev
polynomials can achieve the optimal convergence rate
$O(\sqrt{\kappa}\ln\frac{1}{\epsilon})$, which improves the previous result
$O(\kappa\ln\frac{1}{\epsilon})$ provided by (Toth and Kelley, 2015) for
quadratic functions. Moreover, we provide a convergence analysis for minimizing
general nonlinear problems. In addition, if the hyperparameters (e.g., the
Lipschitz smoothness parameter $L$) are not available, we propose a guessing
algorithm that estimates them dynamically and also prove a similar convergence
rate. Finally, the experimental results demonstrate that the proposed
Anderson-Chebyshev acceleration method converges significantly faster than
other algorithms, e.g., vanilla gradient descent (GD) and Nesterov's Accelerated
GD. Also, these algorithms, combined with the proposed guessing algorithm
(which estimates the hyperparameters dynamically), achieve much better performance.

Comment: To appear in AISTATS 2020
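As a concrete illustration of the fixed-point viewpoint above, the following Python sketch applies plain windowed Anderson acceleration (Anderson mixing) to the gradient-descent map $G(x) = x - \alpha \nabla f(x)$ on a strongly convex quadratic. This is a generic sketch under stated assumptions, not the Anderson-Chebyshev method or the guessing algorithm proposed in the paper; the test matrix, the window size m, and the step size alpha are hypothetical choices for illustration.

```python
import numpy as np


def anderson_accelerate(G, x0, m=5, max_iter=200, tol=1e-8):
    """Plain windowed Anderson acceleration (Anderson mixing) for x = G(x).

    Generic sketch for illustration only -- not the paper's
    Anderson-Chebyshev method; G, m, and tol are assumptions.
    """
    x = np.asarray(x0, dtype=float)
    X_hist, F_hist = [], []                  # recent iterates and residuals
    for _ in range(max_iter):
        gx = G(x)
        f = gx - x                           # fixed-point residual G(x) - x
        if np.linalg.norm(f) < tol:
            break
        X_hist.append(x)
        F_hist.append(f)
        if len(F_hist) > m + 1:              # keep at most m + 1 past pairs
            X_hist.pop(0)
            F_hist.pop(0)
        if len(F_hist) == 1:
            x = gx                           # first step: plain fixed-point update
            continue
        # Least squares on residual differences gives the mixing coefficients.
        dF = np.stack([F_hist[i + 1] - F_hist[i] for i in range(len(F_hist) - 1)], axis=1)
        dX = np.stack([X_hist[i + 1] - X_hist[i] for i in range(len(X_hist) - 1)], axis=1)
        gamma, *_ = np.linalg.lstsq(dF, f, rcond=None)
        x = x + f - (dX + dF) @ gamma        # Anderson mixing update
    return x


# Usage: accelerate gradient descent on a strongly convex quadratic
# f(x) = 0.5 * x^T A x - b^T x, viewed as the map G(x) = x - alpha * grad f(x).
A = np.diag(np.linspace(1.0, 100.0, 20))     # hypothetical test problem, kappa = 100
b = np.ones(20)
alpha = 1.0 / 100.0                          # step size 1/L for this A
G = lambda x: x - alpha * (A @ x - b)
x_hat = anderson_accelerate(G, np.zeros(20))
print(np.linalg.norm(A @ x_hat - b))         # residual of the linear system A x = b
```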