10 research outputs found

    Reusing Combinatorial Structure: Faster Iterative Projections over Submodular Base Polytopes

    Full text link
    Optimization algorithms such as projected Newton's method, FISTA, mirror descent and its variants enjoy near-optimal regret bounds and convergence rates, but suffer from a computational bottleneck of computing "projections'' in potentially each iteration (e.g., O(T1/2)O(T^{1/2}) regret of online mirror descent). On the other hand, conditional gradient variants solve a linear optimization in each iteration, but result in suboptimal rates (e.g., O(T3/4)O(T^{3/4}) regret of online Frank-Wolfe). Motivated by this trade-off in runtime v/s convergence rates, we consider iterative projections of close-by points over widely-prevalent submodular base polytopes B(f)B(f). We develop a toolkit to speed up the computation of projections using both discrete and continuous perspectives. We subsequently adapt the away-step Frank-Wolfe algorithm to use this information and enable early termination. For the special case of cardinality based submodular polytopes, we improve the runtime of computing certain Bregman projections by a factor of Ω(n/log(n))\Omega(n/\log(n)). Our theoretical results show orders of magnitude reduction in runtime in preliminary computational experiments

    Universal Algorithms: Beyond the Simplex

    Full text link
    The bulk of universal algorithms in the online convex optimisation literature are variants of the Hedge (exponential weights) algorithm on the simplex. While these algorithms extend to polytope domains by assigning weights to the vertices, this process is computationally unfeasible for many important classes of polytopes where the number VV of vertices depends exponentially on the dimension dd. In this paper we show the Subgradient algorithm is universal, meaning it has O(N)O(\sqrt N) regret in the antagonistic setting and O(1)O(1) pseudo-regret in the i.i.d setting, with two main advantages over Hedge: (1) The update step is more efficient as the action vectors have length only dd rather than VV; and (2) Subgradient gives better performance if the cost vectors satisfy Euclidean rather than sup-norm bounds. This paper extends the authors' recent results for Subgradient on the simplex. We also prove the same O(N)O(\sqrt N) and O(1)O(1) bounds when the domain is the unit ball. To the authors' knowledge this is the first instance of these bounds on a domain other than a polytope.Comment: 1 figure, 40 page

    Fast, Differentiable and Sparse Top-k: a Convex Analysis Perspective

    Full text link
    The top-k operator returns a sparse vector, where the non-zero values correspond to the k largest values of the input. Unfortunately, because it is a discontinuous function, it is difficult to incorporate in neural networks trained end-to-end with backpropagation. Recent works have considered differentiable relaxations, based either on regularization or perturbation techniques. However, to date, no approach is fully differentiable and sparse. In this paper, we propose new differentiable and sparse top-k operators. We view the top-k operator as a linear program over the permutahedron, the convex hull of permutations. We then introduce a p-norm regularization term to smooth out the operator, and show that its computation can be reduced to isotonic optimization. Our framework is significantly more general than the existing one and allows for example to express top-k operators that select values in magnitude. On the algorithmic side, in addition to pool adjacent violator (PAV) algorithms, we propose a new GPU/TPU-friendly Dykstra algorithm to solve isotonic optimization problems. We successfully use our operators to prune weights in neural networks, to fine-tune vision transformers, and as a router in sparse mixture of experts.Comment: ICML 2023 18 page

    Efficient and Modular Implicit Differentiation

    Full text link
    Automatic differentiation (autodiff) has revolutionized machine learning. It allows expressing complex computations by composing elementary ones in creative ways and removes the burden of computing their derivatives by hand. More recently, differentiation of optimization problem solutions has attracted widespread attention with applications such as optimization as a layer, and in bi-level problems such as hyper-parameter optimization and meta-learning. However, the formulas for these derivatives often involve case-by-case tedious mathematical derivations. In this paper, we propose a unified, efficient and modular approach for implicit differentiation of optimization problems. In our approach, the user defines (in Python in the case of our implementation) a function FF capturing the optimality conditions of the problem to be differentiated. Once this is done, we leverage autodiff of FF and implicit differentiation to automatically differentiate the optimization problem. Our approach thus combines the benefits of implicit differentiation and autodiff. It is efficient as it can be added on top of any state-of-the-art solver and modular as the optimality condition specification is decoupled from the implicit differentiation mechanism. We show that seemingly simple principles allow to recover many recently proposed implicit differentiation methods and create new ones easily. We demonstrate the ease of formulating and solving bi-level optimization problems using our framework. We also showcase an application to the sensitivity analysis of molecular dynamics.Comment: V2: some corrections and link to softwar

    Convex optimization methods for graphs and statistical modeling

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011.Cataloged from PDF version of thesis.Includes bibliographical references (p. 209-220).An outstanding challenge in many problems throughout science and engineering is to succinctly characterize the relationships among a large number of interacting entities. Models based on graphs form one major thrust in this thesis, as graphs often provide a concise representation of the interactions among a large set of variables. A second major emphasis of this thesis are classes of structured models that satisfy certain algebraic constraints. The common theme underlying these approaches is the development of computational methods based on convex optimization, which are in turn useful in a broad array of problems in signal processing and machine learning. The specific contributions are as follows: -- We propose a convex optimization method for decomposing the sum of a sparse matrix and a low-rank matrix into the individual components. Based on new rank-sparsity uncertainty principles, we give conditions under which the convex program exactly recovers the underlying components. -- Building on the previous point, we describe a convex optimization approach to latent variable Gaussian graphical model selection. We provide theoretical guarantees of the statistical consistency of this convex program in the high-dimensional scaling regime in which the number of latent/observed variables grows with the number of samples of the observed variables. The algebraic varieties of sparse and low-rank matrices play a prominent role in this analysis. -- We present a general convex optimization formulation for linear inverse problems, in which we have limited measurements in the form of linear functionals of a signal or model of interest. When these underlying models have algebraic structure, the resulting convex programs can be solved exactly or approximately via semidefinite programming. We provide sharp estimates (based on computing certain Gaussian statistics related to the underlying model geometry) of the number of generic linear measurements required for exact and robust recovery in a variety of settings. -- We present convex graph invariants, which are invariants of a graph that are convex functions of the underlying adjacency matrix. Graph invariants characterize structural properties of a graph that do not depend on the labeling of the nodes; convex graph invariants constitute an important subclass, and they provide a systematic and unified computational framework based on convex optimization for solving a number of interesting graph problems. We emphasize a unified view of the underlying convex geometry common to these different frameworks. We describe applications of both these methods to problems in financial modeling and network analysis, and conclude with a discussion of directions for future research.by Venkat Chandrasekaran.Ph.D

    Stochastic Optimization for Spectral Risk Measures

    Full text link
    Spectral risk objectives - also called LL-risks - allow for learning systems to interpolate between optimizing average-case performance (as in empirical risk minimization) and worst-case performance on a task. We develop stochastic algorithms to optimize these quantities by characterizing their subdifferential and addressing challenges such as biasedness of subgradient estimates and non-smoothness of the objective. We show theoretically and experimentally that out-of-the-box approaches such as stochastic subgradient and dual averaging are hindered by bias and that our approach outperforms them
    corecore