    Stochastic Nested Compositional Bi-level Optimization for Robust Feature Learning

    We develop and analyze stochastic approximation algorithms for solving nested compositional bi-level optimization problems. These problems involve a nested composition of $T$ potentially non-convex smooth functions in the upper level, and a smooth and strongly convex function in the lower level. Our proposed algorithm does not rely on matrix inversions or mini-batches and can achieve an $\epsilon$-stationary solution with an oracle complexity of approximately $\tilde{O}_T(1/\epsilon^{2})$, assuming the availability of stochastic first-order oracles for the individual functions in the composition and the lower level, which are unbiased and have bounded moments. Here, $\tilde{O}_T$ hides polylog factors and constants that depend on $T$. The key challenge we address in establishing this result is handling three distinct sources of bias in the stochastic gradients. The first source arises from the compositional nature of the upper level, the second stems from the bi-level structure, and the third emerges from the use of Neumann series approximations to avoid matrix inversion. To demonstrate the effectiveness of our approach, we apply it to the problem of robust feature learning for deep neural networks under covariate shift, showcasing the benefits and advantages of our methodology in that context.
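
    As a concrete illustration of the Neumann-series device mentioned in the abstract, the following minimal NumPy sketch approximates an inverse-Hessian-vector product $H^{-1}v$ without forming or inverting $H$. The function names and the quadratic test problem are our illustrative assumptions, not details from the paper.

```python
import numpy as np

def neumann_ihvp(hvp, v, eta, k=200):
    """Approximate H^{-1} v by a truncated Neumann series.

    If the eigenvalues of eta*H lie in (0, 2), then
        H^{-1} v = eta * sum_{i=0}^{inf} (I - eta*H)^i v,
    so truncating after k terms avoids any explicit matrix inverse.
    `hvp(u)` must return the Hessian-vector product H @ u.
    """
    term = v.copy()   # (I - eta*H)^0 @ v
    acc = v.copy()    # running partial sum of the series
    for _ in range(k):
        term = term - eta * hvp(term)  # advance to (I - eta*H)^{i+1} @ v
        acc = acc + term
    return eta * acc

# Sanity check on a strongly convex quadratic (H symmetric positive definite,
# as in the lower-level problem described above).
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
H = M @ M.T + 5 * np.eye(5)
v = rng.standard_normal(5)
approx = neumann_ihvp(lambda u: H @ u, v, eta=1.0 / np.linalg.norm(H, 2))
print(np.linalg.norm(approx - np.linalg.solve(H, v)))  # small residual
```

    In the paper's stochastic setting the Hessian-vector products are themselves noisy oracles, and the truncation of this series is exactly the third source of bias the abstract mentions.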

    Stochastic Multi-Level Compositional Optimization Algorithms over Networks with Level-Independent Convergence Rate

    Stochastic multi-level compositional optimization problems cover many new machine learning paradigms, e.g., multi-step model-agnostic meta-learning, which require efficient optimization algorithms for large-scale applications. This paper studies decentralized stochastic multi-level optimization algorithms, which are challenging to design because the multi-level structure and the decentralized communication scheme may make the number of levels affect the order of the convergence rate. To this end, we develop two novel decentralized optimization algorithms to handle the multi-level function and its gradient. Our theoretical results show that both algorithms achieve a level-independent convergence rate for nonconvex problems under much milder conditions than existing single-machine algorithms. To the best of our knowledge, this is the first work to achieve a level-independent convergence rate in the decentralized setting. Moreover, extensive experiments confirm the efficacy of our proposed algorithms.
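
    The gossip-style communication such decentralized algorithms rely on can be sketched in a few lines: each agent mixes its iterate with its neighbors' through a doubly stochastic matrix and then takes a local stochastic-gradient step. The ring topology and step size below are illustrative assumptions, not details from the paper.

```python
import numpy as np

def gossip_step(X, W, grads, lr=0.01):
    """One decentralized update: neighbor averaging, then a local gradient step.

    X:     (n_agents, dim) local iterates, one row per agent
    W:     (n_agents, n_agents) doubly stochastic mixing matrix whose
           sparsity pattern matches the communication graph
    grads: (n_agents, dim) local stochastic gradients
    """
    return W @ X - lr * grads

# Ring of 4 agents: each mixes with itself and its two immediate neighbors.
W = np.array([[0.50, 0.25, 0.00, 0.25],
              [0.25, 0.50, 0.25, 0.00],
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])
X = np.random.default_rng(1).standard_normal((4, 3))
X = gossip_step(X, W, grads=np.zeros((4, 3)))  # pure averaging shrinks disagreement
print(X.std(axis=0))  # per-coordinate disagreement across agents
```

    Roughly speaking, multi-level algorithms must apply this kind of mixing to the agents' estimates of each inner level as well as to the iterates, which is what makes a level-independent rate non-trivial.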

    Efficient Smooth Non-Convex Stochastic Compositional Optimization via Stochastic Recursive Gradient Descent

    Stochastic compositional optimization arises in many important machine learning applications. The objective function is the composition of two expectations of stochastic functions, and it is more challenging to optimize than vanilla stochastic optimization problems. In this paper, we investigate stochastic compositional optimization in the general smooth non-convex setting. We employ the recently developed idea of Stochastic Recursive Gradient Descent to design a novel algorithm named SARAH-Compositional, and prove a sharp Incremental First-order Oracle (IFO) complexity upper bound for stochastic compositional optimization: $O((n + m)^{1/2}\epsilon^{-2})$ in the finite-sum case and $O(\epsilon^{-3})$ in the online case. This complexity is the best known among IFO complexity results for non-convex stochastic compositional optimization. Numerical experiments on risk-averse portfolio management validate the superiority of SARAH-Compositional over a few rival algorithms.
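
    A hedged sketch of the recursive (SARAH-style) estimator for a two-level composition $F(x) = f(g(x))$ follows. The restart-free schedule, the initialization batch, and the toy problem are our simplifications; the paper's actual epoch and batching structure differs.

```python
import numpy as np

def sarah_compositional_sketch(x0, g, Jg, grad_f, sample,
                               steps=100, lr=0.05, init_batch=256):
    """SARAH-style recursive estimators for F(x) = f(g(x)).

    Both the inner value y ~ g(x) and the gradient v ~ Jg(x)^T grad_f(g(x))
    are updated with the recursion
        est <- est + h(x_new; xi) - h(x_old; xi)
    from a single fresh sample per step, after one large batch to
    initialize the estimators.
    """
    x = x0.copy()
    xis = sample(init_batch)
    y = np.mean([g(x, xi) for xi in xis], axis=0)
    v = np.mean([Jg(x, xi).T @ grad_f(y) for xi in xis], axis=0)
    for _ in range(steps):
        x_new = x - lr * v
        xi = sample(1)[0]
        y_new = y + g(x_new, xi) - g(x, xi)  # recursive inner-value update
        v = v + Jg(x_new, xi).T @ grad_f(y_new) - Jg(x, xi).T @ grad_f(y)
        x, y = x_new, y_new
    return x

# Toy problem: g(x; xi) = xi * x with E[xi] = 1 and f(y) = 0.5 * ||y||^2,
# so F(x) = 0.5 * ||x||^2 with minimizer x = 0.
rng = np.random.default_rng(0)
sample = lambda n: rng.uniform(0.5, 1.5, size=n)
g = lambda x, xi: xi * x
Jg = lambda x, xi: xi * np.eye(x.size)
grad_f = lambda y: y
print(sarah_compositional_sketch(np.ones(3), g, Jg, grad_f, sample))  # drifts toward 0
```

    The step-to-step differences in the recursion shrink with the step size, which is the variance-reduction effect behind the improved IFO bound.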

    Algorithmic Foundations of Empirical X-risk Minimization

    This manuscript introduces a new optimization framework for machine learning and AI, named {\bf empirical X-risk minimization (EXM)}. X-risk is a term introduced to represent a family of compositional measures or objectives, in which each data point is compared with a large number of items, explicitly or implicitly, to define a risk function. It includes surrogate objectives of many widely used measures and non-decomposable losses, e.g., AUROC, AUPRC, partial AUROC, NDCG, MAP, precision/recall at the top $K$ positions, precision at a certain recall level, listwise losses, p-norm push, top push, global contrastive losses, etc. While these non-decomposable objectives and their optimization algorithms have been studied in the machine learning, computer vision, and information retrieval literature, optimizing them poses unique challenges for deep learning. In this paper, we present recent rigorous efforts on EXM, with a focus on its algorithmic foundations and applications. We introduce a class of algorithmic techniques for solving EXM with smooth non-convex objectives. We formulate EXM as three special families of non-convex optimization problems: non-convex compositional optimization, non-convex min-max optimization, and non-convex bilevel optimization. For each family of problems, we present strong baseline algorithms and their complexities, which should motivate further research toward improving the existing results. Discussions of the presented results and future studies are given at the end. Efficient algorithms for optimizing a variety of X-risks are implemented in the LibAUC library at \url{www.libauc.org}.
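
    To make the "each data point is compared with many items" pattern concrete, here is a minimal NumPy sketch of one of the X-risks listed above: a pairwise surrogate for AUROC. The squared-hinge surrogate and the margin value are our illustrative choices, not prescriptions from the paper or the LibAUC library.

```python
import numpy as np

def auc_squared_hinge(scores, labels, margin=1.0):
    """Pairwise squared-hinge surrogate for AUROC.

    AUROC counts, over all (positive, negative) pairs, how often the
    positive example is scored above the negative one; the surrogate
    replaces that 0/1 comparison with max(0, margin - (s_pos - s_neg))^2.
    Each example is compared against many others -- the X-risk pattern.
    """
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diffs = pos[:, None] - neg[None, :]  # every positive-negative score gap
    return np.mean(np.maximum(0.0, margin - diffs) ** 2)

scores = np.array([2.1, 0.3, 1.5, -0.2, 0.8])
labels = np.array([1, 0, 1, 0, 0])
print(auc_squared_hinge(scores, labels))
```

    Because the loss couples every positive with every negative, naive mini-batch gradients are biased, which is part of what pushes these objectives into the compositional, min-max, and bilevel formulations described above.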

    Federated Multi-Level Optimization over Decentralized Networks

    Multi-level optimization has gained increasing attention in recent years, as it provides a powerful framework for solving complex optimization problems that arise in many fields, such as meta-learning, multi-player games, reinforcement learning, and nested composition optimization. In this paper, we study distributed multi-level optimization over a network, where agents can only communicate with their immediate neighbors. This setting is motivated by the need for distributed optimization in large-scale systems, where centralized optimization may not be practical or feasible. To address this problem, we propose a novel gossip-based distributed multi-level optimization algorithm that enables networked agents to solve optimization problems at different levels in a single timescale and share information through network propagation. Our algorithm achieves optimal sample complexity, scaling linearly with the network size, and demonstrates state-of-the-art performance on various applications, including hyper-parameter tuning, decentralized reinforcement learning, and risk-averse optimization.
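
    One common way to realize a single-timescale multi-level update, sketched below under our own assumptions, is to keep a moving-average tracker for each inner level and chain sampled Jacobians through the trackers; the correction terms, communication step, and step-size schedule of the actual algorithm are omitted here.

```python
import numpy as np

def multilevel_tracking_sketch(x0, fs, Js, sample, steps=300, lr=0.05, beta=0.5):
    """Single-timescale sketch for F(x) = f_K(... f_2(f_1(x)) ...).

    Each level k keeps a moving-average tracker u_k of its output,
        u_k <- (1 - beta) * u_k + beta * f_k(u_{k-1}; xi),
    with u_0 = x, and the gradient chains the sampled Jacobians through
    the trackers instead of re-estimating every level with large batches.
    """
    x = x0.copy()
    u = [None] * len(fs)
    for _ in range(steps):
        xi = sample()
        inp = x
        for k, f in enumerate(fs):  # refresh trackers bottom-up
            val = f(inp, xi)
            u[k] = val if u[k] is None else (1 - beta) * u[k] + beta * val
            inp = u[k]
        grad = np.eye(x.size)       # chain Jacobians through the trackers
        inp = x
        for k, J in enumerate(Js):
            grad = J(inp, xi) @ grad
            inp = u[k]
        x = x - lr * grad.ravel()
    return x

# Toy two-level instance: f_1(x; xi) = xi * x with E[xi] = 1, f_2(y) = 0.5 * ||y||^2.
rng = np.random.default_rng(0)
sample = lambda: rng.uniform(0.5, 1.5)
fs = [lambda v, xi: xi * v, lambda v, xi: 0.5 * (v @ v)]
Js = [lambda v, xi: xi * np.eye(v.size), lambda v, xi: v[None, :]]
print(multilevel_tracking_sketch(np.ones(3), fs, Js, sample))  # drifts toward 0
```

    A decentralized variant would interleave a gossip mixing step, as in the earlier sketch, between these local updates.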