    Structured Semidefinite Programming for Recovering Structured Preconditioners

    We develop a general framework for finding approximately-optimal preconditioners for solving linear systems. Leveraging this framework we obtain improved runtimes for fundamental preconditioning and linear system solving problems including the following. We give an algorithm which, given positive definite $\mathbf{K} \in \mathbb{R}^{d \times d}$ with $\mathrm{nnz}(\mathbf{K})$ nonzero entries, computes an $\epsilon$-optimal diagonal preconditioner in time $\widetilde{O}(\mathrm{nnz}(\mathbf{K}) \cdot \mathrm{poly}(\kappa^\star,\epsilon^{-1}))$, where $\kappa^\star$ is the optimal condition number of the rescaled matrix. We give an algorithm which, given $\mathbf{M} \in \mathbb{R}^{d \times d}$ that is either the pseudoinverse of a graph Laplacian matrix or a constant spectral approximation of one, solves linear systems in $\mathbf{M}$ in $\widetilde{O}(d^2)$ time. Our diagonal preconditioning results improve state-of-the-art runtimes of $\Omega(d^{3.5})$ attained by general-purpose semidefinite programming, and our solvers improve state-of-the-art runtimes of $\Omega(d^{\omega})$ where $\omega > 2.3$ is the current matrix multiplication constant. We attain our results via new algorithms for a class of semidefinite programs (SDPs) we call matrix-dictionary approximation SDPs, which we leverage to solve an associated problem we call matrix-dictionary recovery. Comment: Merge of arXiv:1812.06295 and arXiv:2008.0172
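
To make the notion of a diagonal preconditioner concrete, the following minimal sketch rescales a small positive definite matrix $\mathbf{K}$ symmetrically by its own diagonal (classical Jacobi scaling) and compares condition numbers before and after. This is an illustration only, not the algorithm of the abstract: Jacobi scaling is a swapped-in heuristic, the 2-by-2 example matrix is made up, and the helper names (`sym2x2_eigs`, `cond`, `jacobi_rescale`) are hypothetical.

```python
import math

def sym2x2_eigs(a, b, c):
    """Eigenvalues (smallest, largest) of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean - radius, mean + radius

def cond(a, b, c):
    """Condition number of a positive definite [[a, b], [b, c]]."""
    lo, hi = sym2x2_eigs(a, b, c)
    return hi / lo

def jacobi_rescale(a, b, c):
    """Entries of D^{-1/2} K D^{-1/2} with D = diag(K) (Jacobi scaling)."""
    s = math.sqrt(a * c)
    return 1.0, b / s, 1.0

# Hypothetical badly scaled positive definite matrix K = [[1000, 10], [10, 1]].
K = (1000.0, 10.0, 1.0)
before = cond(*K)                  # condition number of K itself
after = cond(*jacobi_rescale(*K))  # condition number after diagonal rescaling

print(f"condition number before: {before:.1f}, after: {after:.3f}")
```

The abstract's $\kappa^\star$ is the best condition number achievable over all diagonal rescalings; the sketch only evaluates one particular choice of rescaling.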

    Dual Space Preconditioning for Gradient Descent

    The conditions of relative smoothness and relative strong convexity were recently introduced for the analysis of Bregman gradient methods for convex optimization. We introduce a generalized left-preconditioning method for gradient descent, and show that its convergence on an essentially smooth convex objective function can be guaranteed via an application of relative smoothness in the dual space. Our relative smoothness assumption is between the designed preconditioner and the convex conjugate of the objective, and it generalizes the typical Lipschitz gradient assumption. Under dual relative strong convexity, we obtain linear convergence with a generalized condition number that is invariant under horizontal translations, distinguishing it from Bregman gradient methods. Thus, in principle our method is capable of improving the conditioning of gradient descent on problems with non-Lipschitz gradient or non-strongly convex structure. We demonstrate our method on $p$-norm regression and exponential penalty function minimization. Comment: SIAM J. Optim, accepte
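
One way to picture a left-preconditioned gradient step is sketched below, under the assumption (not stated in the abstract) that the update takes the form $x_{t+1} = x_t - \lambda \nabla k(\nabla f(x_t))$ for a designed function $k$ on the dual space. The objective $f(x) = \sum_i |x_i|^p / p$ with $p = 4$, the choice $k(y) = \sum_i |y_i|^q / q$ with $1/p + 1/q = 1$, the step sizes, and all function names are illustrative choices. With this pairing $\nabla k(\nabla f(x)) = x$, so each preconditioned step contracts the iterate by a constant factor, whereas plain gradient descent slows down near the minimizer because $f$ is not strongly convex there.

```python
import math

P = 4.0            # exponent of the toy objective (assumed for illustration)
Q = P / (P - 1.0)  # conjugate exponent, so that 1/P + 1/Q = 1

def grad_f(x):
    """Gradient of f(x) = sum_i |x_i|^P / P."""
    return [math.copysign(abs(xi) ** (P - 1.0), xi) for xi in x]

def grad_k(y):
    """Gradient of the dual-space function k(y) = sum_i |y_i|^Q / Q."""
    return [math.copysign(abs(yi) ** (Q - 1.0), yi) for yi in y]

def dual_preconditioned_step(x, step):
    """Assumed update: x - step * grad_k(grad_f(x))."""
    direction = grad_k(grad_f(x))
    return [xi - step * di for xi, di in zip(x, direction)]

def plain_gradient_step(x, step):
    """Ordinary gradient descent step, for comparison."""
    return [xi - step * gi for xi, gi in zip(x, grad_f(x))]

x_dual = [2.0, -1.0]
x_plain = [2.0, -1.0]
for _ in range(20):
    x_dual = dual_preconditioned_step(x_dual, 0.5)
    x_plain = plain_gradient_step(x_plain, 0.1)

print("preconditioned iterate:", x_dual)
print("plain gradient iterate:", x_plain)
```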

    High-Dimensional Geometric Streaming in Polynomial Space

    Many existing algorithms for streaming geometric data analysis have been plagued by exponential dependencies in the space complexity, which are undesirable for processing high-dimensional data sets. In particular, once $d \geq \log n$, there are no known non-trivial streaming algorithms for problems such as maintaining convex hulls and Löwner-John ellipsoids of $n$ points, despite a long line of work in streaming computational geometry since [AHV04]. We simultaneously improve these results to $\mathrm{poly}(d,\log n)$ bits of space by trading off with a $\mathrm{poly}(d,\log n)$ factor distortion. We achieve these results in a unified manner, by designing the first streaming algorithm for maintaining a coreset for $\ell_\infty$ subspace embeddings with $\mathrm{poly}(d,\log n)$ space and $\mathrm{poly}(d,\log n)$ distortion. Our algorithm also gives similar guarantees in the online coreset model. Along the way, we sharpen results for online numerical linear algebra by replacing a log condition number dependence with a $\log n$ dependence, answering a question of [BDM+20]. Our techniques provide a novel connection between leverage scores, a fundamental object in numerical linear algebra, and computational geometry. For $\ell_p$ subspace embeddings, we give nearly optimal trade-offs between space and distortion for one-pass streaming algorithms. For instance, we give a deterministic coreset using $O(d^2\log n)$ space and $O((d\log n)^{1/2-1/p})$ distortion for $p>2$, whereas previous deterministic algorithms incurred a $\mathrm{poly}(n)$ factor in the space or the distortion [CDW18]. Our techniques have implications in the offline setting, where we give optimal trade-offs between the space complexity and distortion of subspace sketch data structures. To do this, we give an elementary proof of a "change of density" theorem of [LT80] and make it algorithmic. Comment: Abstract shortened to meet arXiv limits; v2 fix statements concerning online condition numbe
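
For readers unfamiliar with leverage scores, the sketch below computes them from the standard definition $\tau_i = a_i^\top (A^\top A)^{-1} a_i$ for the rows $a_i$ of a full-column-rank matrix $A$. It is restricted to two columns so that the Gram matrix can be inverted in closed form. It is offline and exact, and is not the streaming or online procedure the abstract refers to; the function name and the example matrix are hypothetical.

```python
def leverage_scores_2col(rows):
    """Leverage scores a_i^T (A^T A)^{-1} a_i for an n x 2 matrix A given as (x, y) rows."""
    # Entries of the 2 x 2 Gram matrix G = A^T A.
    g11 = sum(x * x for x, _ in rows)
    g12 = sum(x * y for x, y in rows)
    g22 = sum(y * y for _, y in rows)
    det = g11 * g22 - g12 * g12
    if det <= 0.0:
        raise ValueError("A must have full column rank")
    # G^{-1} = (1 / det) * [[g22, -g12], [-g12, g11]], applied as a quadratic form.
    return [(g22 * x * x - 2.0 * g12 * x * y + g11 * y * y) / det for x, y in rows]

# Hypothetical 4 x 2 example; the first two rows are duplicates and share their importance.
A = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
scores = leverage_scores_2col(A)
print(scores)
```

Each score lies between 0 and 1, and the scores sum to the rank of $A$ (here 2), which is why they are commonly used as row-importance weights.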