8,723 research outputs found

    Stochastic smoothing accelerated gradient method for nonsmooth convex composite optimization

    Full text link
    We propose a novel stochastic smoothing accelerated gradient (SSAG) method for general constrained nonsmooth convex composite optimization, and analyze the convergence rates. The SSAG method allows various smoothing techniques, and can deal with the nonsmooth term that is not easy to compute its proximal term, or that does not own the linear max structure. To the best of our knowledge, it is the first stochastic approximation type method with solid convergence result to solve the convex composite optimization problem whose nonsmooth term is the maximization of numerous nonlinear convex functions. We prove that the SSAG method achieves the best-known complexity bounds in terms of the stochastic first-order oracle (SFO\mathcal{SFO}), using either diminishing smoothing parameters or a fixed smoothing parameter. We give two applications of our results to distributionally robust optimization problems. Numerical results on the two applications demonstrate the effectiveness and efficiency of the proposed SSAG method

    A zeroth-order proximal stochastic gradient method for weakly convex stochastic optimization

    Get PDF
    In this paper we analyze a zeroth-order proximal stochastic gradient method suitable for the minimization of weakly convex stochastic optimization problems. We consider nonsmooth and nonlinear stochastic composite problems, for which (sub)gradient information might be unavailable. The proposed algorithm utilizes the well-known Gaussian smoothing technique, which yields unbiased zeroth-order gradient estimators of a related partially smooth surrogate problem (in which one of the two nonsmooth terms in the original problem’s objective is replaced by a smooth approximation). This allows us to employ a standard proximal stochastic gradient scheme for the approximate solution of the surrogate problem, which is determined by a single smoothing parameter, and without the utilization of first-order information. We provide state-of-the-art convergence rates for the proposed zeroth-order method using minimal assumptions. The proposed scheme is numerically compared against alternative zeroth-order methods as well as a stochastic subgradient scheme on a standard phase retrieval problem. Further, we showcase the usefulness and effectiveness of our method in the unique setting of automated hyperparameter tuning. In particular, we focus on automatically tuning the parameters of optimization algorithms by minimizing a novel heuristic model. The proposed approach is tested on a proximal alternating direction method of multipliers for the solution of In this paper we analyze a zeroth-order proximal stochastic gradient method suitable for the minimization of weakly convex stochastic optimization problems. We consider nonsmooth and nonlinear stochastic composite problems, for which (sub)gradient information might be unavailable. The proposed algorithm utilizes the well-known Gaussian smoothing technique, which yields unbiased zeroth-order gradient estimators of a related partially smooth surrogate problem (in which one of the two nonsmooth terms in the original problem’s objective is replaced by a smooth approximation). This allows us to employ a standard proximal stochastic gradient scheme for the approximate solution of the surrogate problem, which is determined by a single smoothing parameter, and without the utilization of first-order information. We provide state-of-the-art convergence rates for the proposed zeroth-order method using minimal assumptions. The proposed scheme is numerically compared against alternative zeroth-order methods as well as a stochastic subgradient scheme on a standard phase retrieval problem. Further, we showcase the usefulness and effectiveness of our method in the unique setting of automated hyperparameter tuning. In particular, we focus on automatically tuning the parameters of optimization algorithms by minimizing a novel heuristic model. The proposed approach is tested on a proximal alternating direction method of multipliers for the solution of L1/L2-regularized PDE-constrained optimal control problems, with evident empirical success.<br/

    Fast Nonsmooth Regularized Risk Minimization with Continuation

    Full text link
    In regularized risk minimization, the associated optimization problem becomes particularly difficult when both the loss and regularizer are nonsmooth. Existing approaches either have slow or unclear convergence properties, are restricted to limited problem subclasses, or require careful setting of a smoothing parameter. In this paper, we propose a continuation algorithm that is applicable to a large class of nonsmooth regularized risk minimization problems, can be flexibly used with a number of existing solvers for the underlying smoothed subproblem, and with convergence results on the whole algorithm rather than just one of its subproblems. In particular, when accelerated solvers are used, the proposed algorithm achieves the fastest known rates of O(1/T2)O(1/T^2) on strongly convex problems, and O(1/T)O(1/T) on general convex problems. Experiments on nonsmooth classification and regression tasks demonstrate that the proposed algorithm outperforms the state-of-the-art.Comment: AAAI-201

    MAGMA: Multi-level accelerated gradient mirror descent algorithm for large-scale convex composite minimization

    Full text link
    Composite convex optimization models arise in several applications, and are especially prevalent in inverse problems with a sparsity inducing norm and in general convex optimization with simple constraints. The most widely used algorithms for convex composite models are accelerated first order methods, however they can take a large number of iterations to compute an acceptable solution for large-scale problems. In this paper we propose to speed up first order methods by taking advantage of the structure present in many applications and in image processing in particular. Our method is based on multi-level optimization methods and exploits the fact that many applications that give rise to large scale models can be modelled using varying degrees of fidelity. We use Nesterov's acceleration techniques together with the multi-level approach to achieve O(1/ϵ)\mathcal{O}(1/\sqrt{\epsilon}) convergence rate, where ϵ\epsilon denotes the desired accuracy. The proposed method has a better convergence rate than any other existing multi-level method for convex problems, and in addition has the same rate as accelerated methods, which is known to be optimal for first-order methods. Moreover, as our numerical experiments show, on large-scale face recognition problems our algorithm is several times faster than the state of the art
    • …
    corecore