Optimal Diagonal Preconditioning: Theory and Practice
Preconditioning has been a staple technique in optimization and machine
learning. It often reduces the condition number of the matrix it is applied to,
thereby speeding up convergence of optimization algorithms. Although there are
many popular preconditioning techniques in practice, most lack theoretical
guarantees for reductions in condition number. In this paper, we study the
problem of optimal diagonal preconditioning to achieve maximal reduction in the
condition number of any full-rank matrix by scaling its rows or columns
separately or simultaneously. We first reformulate the problem as a
quasi-convex problem and provide a baseline bisection algorithm that is easy to
implement in practice, where each iteration consists of an SDP feasibility
problem. Then we propose a polynomial-time potential reduction algorithm, where
each iteration consists of a Newton update based on the Nesterov-Todd
direction. Our algorithm
is based on a formulation of the problem which is a generalized version of the
Von Neumann optimal growth problem. Next, we specialize to one-sided optimal
diagonal preconditioning problems, and demonstrate that they can be formulated
as standard dual SDP problems, to which we apply efficient customized solvers
and study the empirical performance of our optimal diagonal preconditioners.
Our extensive experiments on large matrices demonstrate the practical appeal of
optimal diagonal preconditioners at reducing condition numbers compared to
heuristics-based preconditioners.
Comment: this work originally appeared as arXiv:2003.07545v2, which was
submitted as a replacement by accident.
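The condition-number reduction from simultaneous row and column scaling can be illustrated with a minimal sketch. The snippet below uses Ruiz equilibration, a standard heuristic (not the optimal SDP-based method of the paper), purely to show the effect of two-sided diagonal scaling on a badly scaled matrix; the test matrix and iteration count are arbitrary choices for illustration.

```python
import numpy as np

def ruiz_equilibrate(A, iters=20):
    """Ruiz iterative scaling: alternately push row and column infinity-norms
    toward 1. Returns B = D1 @ A @ D2 together with the diagonal scalings."""
    m, n = A.shape
    d1, d2 = np.ones(m), np.ones(n)
    B = A.astype(float).copy()
    for _ in range(iters):
        r = np.sqrt(np.max(np.abs(B), axis=1))  # row scales
        c = np.sqrt(np.max(np.abs(B), axis=0))  # column scales
        r[r == 0] = 1.0
        c[c == 0] = 1.0
        B = B / r[:, None] / c[None, :]
        d1 /= r
        d2 /= c
    return B, d1, d2

rng = np.random.default_rng(0)
# Badly scaled matrix: Gaussian entries with row scales spanning six orders
# of magnitude, so the raw condition number is enormous.
A = rng.standard_normal((50, 50)) * np.logspace(0, 6, 50)[:, None]
B, d1, d2 = ruiz_equilibrate(A)
print(np.linalg.cond(A), np.linalg.cond(B))  # scaling shrinks the condition number sharply
```

The heuristic needs no optimization solver, which is what makes it a natural baseline for the SDP-based optimal preconditioners studied in the paper.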
Diagonal Preconditioning: Theory and Algorithms
Diagonal preconditioning has been a staple technique in optimization and
machine learning. It often reduces the condition number of the design or
Hessian matrix it is applied to, thereby speeding up convergence. However,
rigorous analyses of how well various diagonal preconditioning procedures
improve the condition number of the preconditioned matrix and how that
translates into improvements in optimization are rare. In this paper, we first
provide an analysis of a popular diagonal preconditioning technique based on
column standard deviation and its effect on the condition number using random
matrix theory. Then we identify a class of design matrices whose condition
numbers can be reduced significantly by this procedure. We then study the
problem of optimal diagonal preconditioning to improve the condition number of
any full-rank matrix, and provide a bisection algorithm and a potential
reduction algorithm, where each iteration consists of an SDP feasibility
problem and a Newton update using the Nesterov-Todd direction,
respectively. Finally, we extend the optimal
diagonal preconditioning algorithm to an adaptive setting and compare its
empirical performance at reducing the condition number and speeding up
convergence for regression and classification problems with that of another
adaptive preconditioning technique, namely batch normalization, that is
essential in training machine learning models.
Comment: under review; the previous version was the wrong draft.
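The column-standard-deviation procedure analyzed in the abstract above can be sketched in a few lines. The design matrix here is a synthetic illustration (columns on wildly different scales), not data from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 10
# Synthetic design matrix whose columns live on very different scales,
# as happens with features measured in mixed units.
X = rng.standard_normal((n, p)) * np.logspace(0, 4, p)[None, :]

# Diagonal preconditioner D = diag(1 / column std): rescale each column of X
# to unit standard deviation before forming the Gram/Hessian matrix X^T X.
D = np.diag(1.0 / X.std(axis=0))
Xs = X @ D

cond_before = np.linalg.cond(X.T @ X)
cond_after = np.linalg.cond(Xs.T @ Xs)
print(cond_before, cond_after)  # the standardized Gram matrix is far better conditioned
```

Since gradient-descent convergence rates on least squares depend on the condition number of X^T X, the drop shown here translates directly into faster iterations, which is the link the paper makes rigorous.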
Studies integrating geometry, probability, and optimization under convexity
Thesis (Ph.D.)--Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2006.
Includes bibliographical references (p. 197-202).
Convexity has played a major role in a variety of fields over the past decades. Nevertheless, the convexity assumption continues to reveal new theoretical paradigms and applications. This dissertation explores convexity at the intersection of three fields: geometry, probability, and optimization. We study in depth a variety of geometric quantities, which are used to describe the behavior of different algorithms. In addition, we investigate how to algorithmically manipulate these geometric quantities, which leads to algorithms capable of transforming ill-behaved instances into well-behaved ones. In particular, we provide probabilistic methods that carry out such a task efficiently by exploiting the geometry of the problem. The specific contributions of this dissertation are as follows. (i) We conduct a broad exploration of the symmetry function of convex sets and propose efficient methods for its computation in the polyhedral case. (ii) We relate the symmetry function to the computational complexity of an interior-point method for solving a homogeneous conic system. (iii) We develop a family of preconditioners, based on the symmetry function and projective transformations, for such an interior-point method; the implementation of the preconditioners relies on geometric random walks. (iv) We develop the analysis of the re-scaled perceptron algorithm for a linear conic system, in which a sequence of linear transformations is used to increase a condition measure associated with the problem. (v) Finally, we establish properties relating a probability density induced by an arbitrary norm to the geometry of its support, which are used to construct an efficient simulated annealing algorithm to test whether a convex set is bounded, where the set is represented only by a membership oracle.
by Alexandre Belloni Nogueira.
Ph.D.