
    Efficient Smooth Non-Convex Stochastic Compositional Optimization via Stochastic Recursive Gradient Descent

    Stochastic compositional optimization arises in many important machine learning applications. The objective function is the composition of two expectations of stochastic functions, and is more challenging to optimize than vanilla stochastic optimization problems. In this paper, we investigate stochastic compositional optimization in the general smooth non-convex setting. We employ the recently developed idea of Stochastic Recursive Gradient Descent to design a novel algorithm named SARAH-Compositional, and prove a sharp Incremental First-order Oracle (IFO) complexity upper bound for stochastic compositional optimization: $O((n+m)^{1/2}\epsilon^{-2})$ in the finite-sum case and $O(\epsilon^{-3})$ in the online case. This complexity is known to be the best among IFO complexity results for non-convex stochastic compositional optimization. Numerical experiments on risk-averse portfolio management validate the superiority of SARAH-Compositional over a few rival algorithms.
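
    The sketch below illustrates, under simplifying assumptions and not as the paper's exact SARAH-Compositional algorithm, the core idea: a SARAH-style recursive estimator that corrects stored estimates of the inner function value, its Jacobian, and the compositional gradient between periodic full-batch checkpoints. The quadratic toy problem and names such as A, C, y_est, and J_est are illustrative.

    import numpy as np

    # A minimal sketch of the SARAH-style recursive estimator idea for a two-level
    # compositional objective F(x) = f(g(x)) with g(x) = mean_j A_j x and
    # f(y) = mean_i 0.5*||y - c_i||^2.  Illustrative toy data, not the paper's algorithm.

    rng = np.random.default_rng(0)
    d, p, m, n = 5, 3, 200, 200              # variable dim, inner output dim, sample counts
    A = rng.normal(size=(m, p, d))           # inner components g_j(x) = A_j x
    C = rng.normal(size=(n, p))              # outer components f_i(y) = 0.5*||y - c_i||^2
    A_bar, c_bar = A.mean(axis=0), C.mean(axis=0)

    def full_grad(x):                        # exact gradient, used only to monitor progress
        return A_bar.T @ (A_bar @ x - c_bar)

    x = rng.normal(size=d)
    eta, epoch_len, batch = 0.5, 20, 8

    for t in range(200):
        if t % epoch_len == 0:
            # checkpoint: full-batch estimates of the inner value, Jacobian, and gradient
            y_est, J_est = A_bar @ x, A_bar.copy()
            v = J_est.T @ (y_est - c_bar)
        else:
            js = rng.integers(m, size=batch)     # mini-batch for the inner function
            ks = rng.integers(n, size=batch)     # mini-batch for the outer function
            # SARAH-style recursive correction: reuse the same samples at x and x_prev
            y_est = y_est + (A[js] @ x - A[js] @ x_prev).mean(axis=0)
            # (the per-sample Jacobian A_j does not depend on x here, so J_est stays A_bar;
            #  with a nonlinear inner map it would receive the same kind of correction)
            g_new = J_est.T @ (y_est - C[ks].mean(axis=0))
            g_old = J_prev.T @ (y_prev - C[ks].mean(axis=0))
            v = v + g_new - g_old
        x_prev, y_prev, J_prev = x.copy(), y_est.copy(), J_est.copy()
        x = x - eta * v

    print("gradient norm after training:", np.linalg.norm(full_grad(x)))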

    Stochastic Composition Optimization of Functions without Lipschitz Continuous Gradient

    In this paper, we study the stochastic optimization of a two-level composition of functions without Lipschitz continuous gradient. The smoothness property is generalized by the notion of relative smoothness, which motivates the Bregman gradient method. We propose three Stochastic Compositional Bregman Gradient algorithms for the three possible nonsmooth compositional scenarios and provide their sample complexities to achieve an $\epsilon$-approximate stationary point. For the smooth-of-relatively-smooth composition, the first algorithm requires $O(\epsilon^{-2})$ calls to the stochastic oracles of the inner function value and gradient as well as the outer function gradient. When both functions are relatively smooth, the second algorithm requires $O(\epsilon^{-3})$ calls to the inner function stochastic oracle and $O(\epsilon^{-2})$ calls to the inner and outer function stochastic gradient oracles. We further improve the second algorithm by variance reduction for the setting where just the inner function is smooth. The resulting algorithm requires $O(\epsilon^{-5/2})$ calls to the stochastic inner function value oracle, $O(\epsilon^{-3/2})$ calls to the inner stochastic gradient oracle, and $O(\epsilon^{-2})$ calls to the outer function stochastic gradient oracle. Finally, we numerically evaluate the performance of these algorithms over two examples.
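
    For concreteness, here is a minimal sketch of the Bregman (mirror) gradient step that underlies such methods: with the negative-entropy reference function on the probability simplex, the step has a closed-form multiplicative update. The linear objective, noise level, and the name bregman_entropy_step are illustrative assumptions, not the paper's algorithms or test problems.

    import numpy as np

    # One stochastic Bregman (mirror) gradient step with h(x) = sum_i x_i*log(x_i)
    # on the probability simplex; the step reduces to a multiplicative update.

    def bregman_entropy_step(x, grad, eta):
        """argmin_u <grad, u> + (1/eta) * D_h(u, x) over the simplex, h = negative entropy."""
        z = x * np.exp(-eta * grad)
        return z / z.sum()

    rng = np.random.default_rng(1)
    n = 10
    c = rng.normal(size=n)                          # illustrative objective <c, x> on the simplex
    x = np.full(n, 1.0 / n)

    for k in range(100):
        noisy_grad = c + 0.1 * rng.normal(size=n)   # stochastic gradient oracle
        x = bregman_entropy_step(x, noisy_grad, eta=0.5)

    print("mass concentrates on the best coordinate:", x.argmax() == c.argmin())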

    Riemannian Stochastic Gradient Method for Nested Composition Optimization

    This work considers optimization of compositions of functions in a nested form over Riemannian manifolds, where each function contains an expectation. This type of problem is gaining popularity in applications such as policy evaluation in reinforcement learning and model customization in meta-learning. Standard Riemannian stochastic gradient methods for non-compositional optimization cannot be applied directly, because stochastic approximation of the inner functions creates bias in the gradients of the outer functions. For two-level composition optimization, we present a Riemannian Stochastic Composition Gradient Descent (R-SCGD) method that finds an approximate stationary point, with expected squared Riemannian gradient smaller than $\epsilon$, in $O(\epsilon^{-2})$ calls to the stochastic gradient oracle of the outer function and to the stochastic function and gradient oracles of the inner function. Furthermore, we generalize R-SCGD to problems with multi-level nested compositional structures, with the same complexity of $O(\epsilon^{-2})$ for the first-order stochastic oracle. Finally, the performance of the R-SCGD method is numerically evaluated on a policy evaluation problem in reinforcement learning.
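
    A minimal sketch of the two ingredients such a method combines is given below: a running estimate of the inner expectation to de-bias the compositional gradient, and Riemannian updates on the unit sphere (project the gradient onto the tangent space, then retract by renormalizing). The toy composition, noise level, and all names are illustrative assumptions, not the paper's R-SCGD in full or its experiments.

    import numpy as np

    # Toy composition on the unit sphere: inner g(x) = E[M_xi] x = M x, outer f(y) = 0.5*||y||^2,
    # so F(x) = 0.5*||M x||^2, minimized on the sphere by the smallest singular direction of M.

    rng = np.random.default_rng(2)
    d = 8
    M = rng.normal(size=(d, d)); M = (M + M.T) / (2 * np.sqrt(d))

    x = rng.normal(size=d); x /= np.linalg.norm(x)
    y_est = np.zeros(d)                               # running estimate of the inner value g(x)
    eta, beta = 0.05, 0.3

    for t in range(1000):
        M1 = M + 0.1 * rng.normal(size=(d, d))        # stochastic sample for the inner value
        M2 = M + 0.1 * rng.normal(size=(d, d))        # independent sample for the Jacobian
        y_est = (1 - beta) * y_est + beta * (M1 @ x)  # track g(x) instead of using one sample
        egrad = M2.T @ y_est                          # chain rule with the tracked inner value
        rgrad = egrad - (egrad @ x) * x               # project onto the tangent space at x
        x = x - eta * rgrad                           # step in the tangent direction ...
        x /= np.linalg.norm(x)                        # ... then retract back to the sphere

    print("||M x|| vs. smallest singular value of M:",
          np.linalg.norm(M @ x), np.linalg.svd(M, compute_uv=False).min())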

    Stochastic Constrained DRO with a Complexity Independent of Sample Size

    Distributionally Robust Optimization (DRO), a popular method for training models that are robust to distribution shift between training and test sets, has received tremendous attention in recent years. In this paper, we propose and analyze stochastic algorithms that apply to both non-convex and convex losses for solving the Kullback-Leibler divergence constrained DRO problem. Compared with existing methods for this problem, our stochastic algorithms not only enjoy a competitive, if not better, complexity that is independent of the sample size, but also require only a constant batch size at every iteration, which is more practical for broad applications. We establish a nearly optimal complexity bound for finding an $\epsilon$-stationary solution for non-convex losses and an optimal complexity for finding an $\epsilon$-optimal solution for convex losses. Empirical studies demonstrate the effectiveness of the proposed algorithms for solving non-convex and convex constrained DRO problems.
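
    As a point of reference, the KL-constrained DRO problem is often handled through its dual: minimize over x and lambda > 0 the quantity lambda*rho + lambda*log((1/n) * sum_i exp(loss_i(x)/lambda)). The sketch below optimizes this dual with plain full-batch gradient steps on an assumed least-squares loss; the paper's algorithms instead work with constant-size mini-batches, where the log-sum-exp structure makes naive mini-batch gradients biased. All names and constants are illustrative.

    import numpy as np

    # Full-batch gradient descent on the KL-constrained DRO dual objective
    #     min_{x, lam > 0}  lam*rho + lam*log( (1/n) * sum_i exp(loss_i(x)/lam) ).

    rng = np.random.default_rng(3)
    n, d, rho = 200, 5, 0.1
    A = rng.normal(size=(n, d))
    b = A @ rng.normal(size=d) + 0.5 * rng.normal(size=n)

    def losses(x):                                   # per-sample losses loss_i(x)
        return 0.5 * (A @ x - b) ** 2

    x, lam, eta = np.zeros(d), 1.0, 0.05
    for t in range(500):
        l = losses(x)
        w = np.exp((l - l.max()) / lam); w /= w.sum()        # softmax weights, stabilized
        grad_x = A.T @ (w * (A @ x - b))                     # sum_i w_i * grad loss_i(x)
        log_s = np.log(np.mean(np.exp((l - l.max()) / lam))) + l.max() / lam
        grad_lam = rho + log_s - (w @ l) / lam               # d/d(lam) of the dual objective
        x = x - eta * grad_x
        lam = max(lam - eta * grad_lam, 1e-3)                # keep the dual variable positive

    l = losses(x)
    log_s = np.log(np.mean(np.exp((l - l.max()) / lam))) + l.max() / lam
    print("robust (KL-DRO dual) objective at the last iterate:", lam * rho + lam * log_s)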

    Multi-block Min-max Bilevel Optimization with Applications in Multi-task Deep AUC Maximization

    In this paper, we study multi-block min-max bilevel optimization problems, where the upper level is a non-convex strongly-concave minimax objective, the lower level is a strongly convex objective, and there are multiple blocks of dual variables and lower-level problems. Due to the intertwined multi-block min-max bilevel structure, the computational cost at each iteration could be prohibitively high, especially with a large number of blocks. To tackle this challenge, we present a single-loop randomized stochastic algorithm, which requires updates of only a constant number of blocks at each iteration. Under some mild assumptions on the problem, we establish its sample complexity of $O(1/\epsilon^4)$ for finding an $\epsilon$-stationary point. This matches the optimal complexity for solving stochastic non-convex optimization under a general unbiased stochastic oracle model. Moreover, we provide two applications of the proposed method: multi-task deep AUC (area under the ROC curve) maximization and multi-task deep partial AUC maximization. Experimental results validate our theory and demonstrate the effectiveness of our method on problems with hundreds of tasks.
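
    The sketch below illustrates only the randomized-block aspect described in the abstract, namely that a constant number of blocks is touched per iteration, on an assumed toy min-max bilevel problem; it is not the paper's algorithm or analysis, and all objectives and names are illustrative.

    import numpy as np

    # Toy multi-block min-max bilevel problem: block b has a lower-level problem
    # min_w 0.5*||w - A_b x||^2 (solution w_b(x) = A_b x) and contributes
    # y_b * 0.5*||w_b(x) - t_b||^2 - 0.5*y_b^2 to a min_x max_y upper-level objective.

    rng = np.random.default_rng(4)
    B, d, p = 50, 6, 4                         # number of blocks, upper and lower dimensions
    A = rng.normal(size=(B, p, d))
    T = rng.normal(size=(B, p))                # per-block targets t_b

    x = np.zeros(d)
    W = np.zeros((B, p))                       # one lower-level variable per block
    y = np.zeros(B)                            # one dual variable per block
    eta_x, eta_w, eta_y, blocks_per_iter = 0.01, 0.5, 0.5, 5

    for it in range(2000):
        S = rng.choice(B, size=blocks_per_iter, replace=False)   # constant number of blocks
        grad_x = np.zeros(d)
        for bb in S:
            W[bb] -= eta_w * (W[bb] - A[bb] @ x)                             # lower-level step
            y[bb] += eta_y * (0.5 * np.sum((W[bb] - T[bb]) ** 2) - y[bb])    # dual ascent step
            grad_x += y[bb] * (A[bb].T @ (W[bb] - T[bb]))                    # block's gradient
        x -= eta_x * grad_x / blocks_per_iter   # average the sampled blocks' contributions

    # crude check: each dual variable should roughly track its best response 0.5*||A_b x - t_b||^2
    resid = [abs(y[bb] - 0.5 * np.sum((A[bb] @ x - T[bb]) ** 2)) for bb in range(3)]
    print("dual-variable residuals on the first three blocks:", np.round(resid, 3))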