8 research outputs found

    An accelerated first-order method with complexity analysis for solving cubic regularization subproblems

    We propose a first-order method to solve the cubic regularization subproblem (CRS) based on a novel reformulation. The reformulation is a constrained convex optimization problem whose feasible region admits an easily computable projection. Our reformulation requires computing the minimum eigenvalue of the Hessian. To avoid this expensive exact computation, we develop a surrogate problem in which the exact minimum eigenvalue is replaced with an approximate one. We then apply first-order methods such as Nesterov's accelerated projected gradient method (APG) and the projected Barzilai-Borwein method to solve the surrogate problem. As our main theoretical contribution, we show that when an $\epsilon$-approximate minimum eigenvalue is computed by the Lanczos method and the surrogate problem is approximately solved by APG, our approach returns an $\epsilon$-approximate solution to CRS in $\tilde{O}(\epsilon^{-1/2})$ matrix-vector multiplications (where $\tilde{O}(\cdot)$ hides logarithmic factors). Numerical experiments show that our methods are comparable to the Krylov subspace method in the easy case and outperform it in the hard case. We further implement our methods as subproblem solvers of adaptive cubic regularization methods, and numerical results show that our algorithms are comparable to state-of-the-art algorithms.
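
    To make the two computational ingredients above concrete, here is a minimal Python sketch: it approximates the minimum Hessian eigenvalue with the Lanczos method (via SciPy's eigsh, using only matrix-vector products) and then runs a generic Nesterov APG loop on a stand-in convex surrogate, a shifted quadratic over a Euclidean ball. The problem data H and g, the radius R, and the surrogate itself are illustrative assumptions; the paper's actual reformulation differs and only requires that its feasible set admit a cheap projection.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

# Hypothetical problem data; H is accessed only through matrix-vector
# products, matching the paper's complexity model. (Assumptions.)
n = 200
rng = np.random.default_rng(0)
M = rng.standard_normal((n, n))
H = (M + M.T) / 2            # symmetric Hessian stand-in
g = rng.standard_normal(n)   # gradient stand-in

op = LinearOperator((n, n), matvec=lambda v: H @ v)

# Lanczos (eigsh) for approximate extreme eigenvalues; the tolerance
# plays the role of the eps-approximation in the paper's analysis.
lam_min = eigsh(op, k=1, which='SA', tol=1e-3, return_eigenvectors=False)[0]
lam_max = eigsh(op, k=1, which='LA', tol=1e-3, return_eigenvectors=False)[0]

# Generic Nesterov APG on a stand-in convex surrogate: the quadratic
# s -> g^T s + 0.5 s^T (H + shift*I) s over a Euclidean ball of radius R.
# This surrogate and R are illustrative, not the paper's reformulation.
shift = max(-lam_min, 0.0)         # convexify the quadratic
L = max(lam_max + shift, 1e-8)     # gradient Lipschitz constant
R = 10.0
grad = lambda s: H @ s + shift * s + g
proj = lambda s: s if np.linalg.norm(s) <= R else s * (R / np.linalg.norm(s))

s = y = np.zeros(n)
t = 1.0
for _ in range(500):
    s_next = proj(y - grad(y) / L)                     # projected gradient step
    t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
    y = s_next + ((t - 1.0) / t_next) * (s_next - s)   # momentum extrapolation
    s, t = s_next, t_next

print("surrogate objective:", g @ s + 0.5 * s @ (H @ s) + 0.5 * shift * s @ s)
```

    Each APG iteration costs one matrix-vector product with H, which is why the overall complexity is naturally measured in matrix-vector multiplications.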

    A Stochastic Tensor Method for Non-convex Optimization

    We present a stochastic optimization method that uses a fourth-order regularized model to find local minima of smooth and potentially non-convex objective functions with a finite-sum structure. The algorithm uses sub-sampled derivatives instead of exact quantities. The proposed approach is shown to find an $(\epsilon_1,\epsilon_2,\epsilon_3)$-third-order critical point in at most $O\left(\max\left(\epsilon_1^{-4/3}, \epsilon_2^{-2}, \epsilon_3^{-4}\right)\right)$ iterations, thereby matching the rate of deterministic approaches. In order to prove this result, we derive a novel tensor concentration inequality for sums of tensors of any order that makes explicit use of the finite-sum structure of the objective function.
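
    To illustrate the sub-sampling idea, here is a minimal Python sketch that estimates the gradient and Hessian of a finite-sum objective from independent random batches of components. The toy components $f_i(x) = \cos(a_i^\top x)$, the batch size, and the estimator names are illustrative assumptions; the paper's method builds a fourth-order regularized model from such sub-sampled estimates (including third-order derivative tensors), which this sketch does not reproduce.

```python
import numpy as np

# Finite-sum objective f(x) = (1/N) sum_i f_i(x) with toy non-convex
# components f_i(x) = cos(a_i^T x). All names here are illustrative.
rng = np.random.default_rng(0)
N, d = 1000, 20
Adata = rng.standard_normal((N, d))  # rows a_i

def sub_sampled_gradient(x, batch):
    # Average of component gradients -sin(a_i^T x) * a_i over the batch.
    t = Adata[batch] @ x
    return (-np.sin(t)[:, None] * Adata[batch]).mean(axis=0)

def sub_sampled_hessian(x, batch):
    # Average of component Hessians -cos(a_i^T x) * a_i a_i^T over the batch.
    t = Adata[batch] @ x
    return -(np.cos(t)[:, None, None]
             * Adata[batch][:, :, None] * Adata[batch][:, None, :]).mean(axis=0)

x = rng.standard_normal(d)
# Independent batches for each derivative order, as in sub-sampled methods.
g_hat = sub_sampled_gradient(x, rng.choice(N, size=64, replace=False))
H_hat = sub_sampled_hessian(x, rng.choice(N, size=64, replace=False))
print(g_hat.shape, H_hat.shape)  # (20,), (20, 20)
```

    The role of the tensor concentration inequality is to control how far each such sub-sampled derivative deviates from its exact finite-sum counterpart, uniformly across the derivative orders used by the model.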