
    Optimal Diagonal Preconditioning: Theory and Practice

    Preconditioning has been a staple technique in optimization and machine learning. It often reduces the condition number of the matrix it is applied to, thereby speeding up convergence of optimization algorithms. Although there are many popular preconditioning techniques in practice, most lack theoretical guarantees for reductions in condition number. In this paper, we study the problem of optimal diagonal preconditioning to achieve maximal reduction in the condition number of any full-rank matrix by scaling its rows or columns separately or simultaneously. We first reformulate the problem as a quasi-convex problem and provide a baseline bisection algorithm that is easy to implement in practice, where each iteration consists of an SDP feasibility problem. Then we propose a polynomial-time potential reduction algorithm with $O(\log(\frac{1}{\epsilon}))$ iteration complexity, where each iteration consists of a Newton update based on the Nesterov-Todd direction. Our algorithm is based on a formulation of the problem that is a generalized version of the von Neumann optimal growth problem. Next, we specialize to one-sided optimal diagonal preconditioning problems and demonstrate that they can be formulated as standard dual SDP problems, to which we apply efficient customized solvers, and we study the empirical performance of our optimal diagonal preconditioners. Our extensive experiments on large matrices demonstrate the practical appeal of optimal diagonal preconditioners at reducing condition numbers compared to heuristics-based preconditioners.
    Comment: this work originally appeared as arXiv:2003.07545v2, which was submitted as a replacement by accident.
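    As a concrete illustration of the bisection baseline, the quasi-convex structure means that for a fixed target $\kappa$ one can pose an SDP feasibility problem: find a diagonal $D \succ 0$ with $D \preceq M \preceq \kappa D$, which certifies $\kappa(D^{-1/2} M D^{-1/2}) \le \kappa$. The following is a minimal sketch, assuming $M = A^{\top}A$ is symmetric positive definite; it uses cvxpy with a generic SDP solver in place of the paper's customized solvers, and the function names and tolerances are illustrative, not the authors' implementation.

```python
import cvxpy as cp
import numpy as np

def kappa_feasible(M, kappa):
    """SDP feasibility check: is there a diagonal D with D <= M <= kappa*D?

    If feasible, the constraints sandwich D^{-1/2} M D^{-1/2} between
    I and kappa*I, so its condition number is at most kappa.
    M is assumed symmetric positive definite.
    """
    n = M.shape[0]
    d = cp.Variable(n, nonneg=True)
    D = cp.diag(d)
    prob = cp.Problem(cp.Minimize(0), [M - D >> 0, kappa * D - M >> 0])
    prob.solve(solver=cp.SCS)
    return prob.status in (cp.OPTIMAL, cp.OPTIMAL_INACCURATE), d.value

def bisect_optimal_kappa(M, tol=1e-2):
    """Bisect on kappa between 1 and cond(M) to approximate the best
    achievable condition number under diagonal preconditioning."""
    lo, hi = 1.0, np.linalg.cond(M)
    d_best = np.ones(M.shape[0])
    while hi - lo > tol * lo:
        mid = 0.5 * (lo + hi)
        ok, d = kappa_feasible(M, mid)
        if ok:
            hi, d_best = mid, d   # mid is achievable; shrink from above
        else:
            lo = mid              # mid is unachievable; shrink from below
    return d_best, hi
```

    Given the returned scaling `d`, the preconditioned matrix would be `np.diag(d**-0.5) @ M @ np.diag(d**-0.5)`.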

    Diagonal Preconditioning: Theory and Algorithms

    Diagonal preconditioning has been a staple technique in optimization and machine learning. It often reduces the condition number of the design or Hessian matrix it is applied to, thereby speeding up convergence. However, rigorous analyses of how well various diagonal preconditioning procedures improve the condition number of the preconditioned matrix, and how that translates into improvements in optimization, are rare. In this paper, we first provide an analysis of a popular diagonal preconditioning technique based on column standard deviation and its effect on the condition number, using random matrix theory. We then identify a class of design matrices whose condition numbers can be reduced significantly by this procedure. Next, we study the problem of optimal diagonal preconditioning to improve the condition number of any full-rank matrix, and provide a bisection algorithm and a potential reduction algorithm with $O(\log(\frac{1}{\epsilon}))$ iteration complexity, where each iteration consists of an SDP feasibility problem and a Newton update using the Nesterov-Todd direction, respectively. Finally, we extend the optimal diagonal preconditioning algorithm to an adaptive setting and compare its empirical performance at reducing the condition number and speeding up convergence for regression and classification problems with that of another adaptive preconditioning technique, namely batch normalization, which is essential in training machine learning models.
    Comment: Under review; previous version was the wrong draft.
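    The column-standard-deviation procedure analyzed in the first part of the abstract amounts to right-scaling the design matrix by inverse column standard deviations. A minimal sketch follows, with a purely synthetic, illustrative design matrix to show the effect on the condition number:

```python
import numpy as np

def std_precondition(X):
    """Scale each column of X to unit standard deviation, i.e. right-multiply
    X by a diagonal matrix of inverse column standard deviations."""
    s = X.std(axis=0)
    s = np.where(s > 0, s, 1.0)   # leave constant columns untouched
    return X / s

# Illustrative example: a design matrix with badly mismatched column scales.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20)) * rng.uniform(0.1, 10.0, size=20)
print(np.linalg.cond(X), np.linalg.cond(std_precondition(X)))
```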

    Studies integrating geometry, probability, and optimization under convexity

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2006. Includes bibliographical references (p. 197-202).
    Convexity has played a major role in a variety of fields over the past decades. Nevertheless, the convexity assumption continues to reveal new theoretical paradigms and applications. This dissertation explores convexity at the intersection of three fields: geometry, probability, and optimization. We study in depth a variety of geometric quantities, which are used to describe the behavior of different algorithms. In addition, we investigate how to algorithmically manipulate these geometric quantities, leading to algorithms capable of transforming ill-behaved instances into well-behaved ones. In particular, we provide probabilistic methods that carry out such a task efficiently by exploiting the geometry of the problem. The specific contributions of this dissertation are as follows. (i) We conduct a broad exploration of the symmetry function of convex sets and propose efficient methods for its computation in the polyhedral case. (ii) We relate the symmetry function to the computational complexity of an interior-point method for solving a homogeneous conic system. (iii) We develop a family of preconditioners based on the symmetry function and projective transformations for such an interior-point method; the implementation of the preconditioners relies on geometric random walks. (iv) We develop the analysis of the rescaled perceptron algorithm for a linear conic system, in which a sequence of linear transformations is used to increase a condition measure associated with the problem. (v) Finally, we establish properties relating a probability density induced by an arbitrary norm to the geometry of its support, which is used to construct an efficient simulated annealing algorithm to test whether a convex set is bounded, where the set is represented only by a membership oracle.
    by Alexandre Belloni Nogueira. Ph. D.
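    Contribution (i), computing the symmetry function in the polyhedral case, admits a compact LP-based sketch. By definition, $\mathrm{sym}(x, P) = \max\{t \ge 0 : x + t(x - y) \in P \ \forall y \in P\}$; for $P = \{y : Ay \le b\}$ this reduces to one LP per facet. The code below is an illustration under the assumptions that $P$ is nonempty and bounded and that $x \in P$; it is not necessarily the thesis's exact algorithm.

```python
import numpy as np
from scipy.optimize import linprog

def symmetry(x, A, b):
    """Symmetry of x in the polyhedron P = {y : A y <= b}.

    For each facet a_i^T y <= b_i, one LP computes
    delta_i = a_i^T x - min_{y in P} a_i^T y, and then
    sym(x, P) = min_i (b_i - a_i^T x) / delta_i.
    Assumes P is nonempty and bounded, so every LP is solvable.
    """
    n = A.shape[1]
    t = np.inf
    for a_i, b_i in zip(A, b):
        res = linprog(c=a_i, A_ub=A, b_ub=b, bounds=[(None, None)] * n)
        delta = a_i @ x - res.fun          # worst-case movement along a_i
        if delta > 1e-12:                  # delta ~ 0: facet never binds
            t = min(t, (b_i - a_i @ x) / delta)
    return t

# Sanity check: the center of the box [-1, 1]^2 has symmetry 1.
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.ones(4)
print(symmetry(np.zeros(2), A, b))  # -> 1.0
```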