
    Concentration of the Langevin Algorithm's Stationary Distribution

    A canonical algorithm for log-concave sampling is the Langevin Algorithm, aka the Langevin Diffusion run with some discretization stepsize $\eta > 0$. This discretization leads the Langevin Algorithm to have a stationary distribution $\pi_\eta$ which differs from the stationary distribution $\pi$ of the Langevin Diffusion, and it is an important challenge to understand whether the well-known properties of $\pi$ extend to $\pi_\eta$. In particular, while concentration properties such as isoperimetry and rapidly decaying tails are classically known for $\pi$, the analogous properties for $\pi_\eta$ are open questions with direct algorithmic implications. This note provides a first step in this direction by establishing concentration results for $\pi_\eta$ that mirror classical results for $\pi$. Specifically, we show that for any nontrivial stepsize $\eta > 0$, $\pi_\eta$ is sub-exponential (respectively, sub-Gaussian) when the potential is convex (respectively, strongly convex). Moreover, the concentration bounds we show are essentially tight. Key to our analysis is the use of a rotation-invariant moment generating function (aka Bessel function) to study the stationary dynamics of the Langevin Algorithm. This technique may be of independent interest because it enables directly analyzing the discrete-time stationary distribution $\pi_\eta$ without going through the continuous-time stationary distribution $\pi$ as an intermediary.
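As a concrete illustration of the object studied above, the Langevin Algorithm is the Euler-Maruyama discretization of the Langevin Diffusion for a potential $f$: each step takes a gradient step on $f$ plus Gaussian noise scaled by the stepsize. The sketch below is a generic implementation of that standard update, not code from the paper; the example potential is an illustrative choice.

```python
import numpy as np

def langevin_algorithm(grad_f, x0, eta, n_steps, rng=None):
    """Run the Langevin Algorithm: the discretization of the Langevin
    Diffusion with stepsize eta.  Each step is a gradient step on the
    potential f plus isotropic Gaussian noise of variance 2*eta."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - eta * grad_f(x) + np.sqrt(2.0 * eta) * noise
    return x

# Illustrative example: strongly convex potential f(x) = ||x||^2 / 2,
# whose gradient is x; here pi is standard Gaussian and the paper's
# result says pi_eta is sub-Gaussian for any nontrivial stepsize.
sample = langevin_algorithm(lambda x: x, np.zeros(2), eta=0.1,
                            n_steps=1000, rng=0)
```

Running the loop for many steps produces (approximate) samples from the discrete-time stationary distribution $\pi_\eta$, the object whose tails the note characterizes.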

    Near-linear convergence of the Random Osborne algorithm for Matrix Balancing

    We revisit Matrix Balancing, a pre-conditioning task used ubiquitously for computing eigenvalues and matrix exponentials. Since 1960, Osborne's algorithm has been the practitioners' algorithm of choice and is now implemented in most numerical software packages. However, its theoretical properties are not well understood. Here, we show that a simple random variant of Osborne's algorithm converges in near-linear time in the input sparsity. Specifically, it balances $K \in \mathbb{R}_{\geq 0}^{n \times n}$ after $O(m \epsilon^{-2} \log \kappa)$ arithmetic operations, where $m$ is the number of nonzeros in $K$, $\epsilon$ is the $\ell_1$ accuracy, and $\kappa = \sum_{ij} K_{ij} / (\min_{ij: K_{ij} \neq 0} K_{ij})$ measures the conditioning of $K$. Previous work had established near-linear runtimes either only for $\ell_2$ accuracy (a weaker criterion which is less relevant for applications), or through an entirely different algorithm based on (currently) impractical Laplacian solvers. We further show that if the graph with adjacency matrix $K$ is moderately connected--e.g., if $K$ has at least one positive row/column pair--then Osborne's algorithm initially converges exponentially fast, yielding an improved runtime $O(m \epsilon^{-1} \log \kappa)$. We also address numerical precision by showing that these runtime bounds still hold when using $O(\log(n\kappa/\epsilon))$-bit numbers. Our results are established through an intuitive potential argument that leverages a convex optimization perspective of Osborne's algorithm, and relates the per-iteration progress to the current imbalance as measured in Hellinger distance. Unlike previous analyses, we critically exploit log-convexity of the potential. Our analysis extends to other variants of Osborne's algorithm: along the way, we establish significantly improved runtime bounds for cyclic, greedy, and parallelized variants.

    Comment: v2: Fixed minor typos. Modified title for clarity. Corrected statement of Thm 6.1; this does not affect our main result.
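The core of Osborne's algorithm is a single-coordinate update: pick an index $i$ and rescale the $i$-th diagonal entry of the scaling so that row $i$ and column $i$ of the balanced matrix have equal (off-diagonal) sums. A minimal sketch of the random variant follows; uniform coordinate sampling and the dense recomputation of the balanced matrix are simplifying assumptions of this sketch, not the paper's near-linear-time implementation.

```python
import numpy as np

def random_osborne(K, n_iters, rng=None):
    """Random variant of Osborne's algorithm for Matrix Balancing.
    Returns a positive scaling vector d such that diag(d) K diag(1/d)
    has (approximately) equal row and column sums."""
    rng = np.random.default_rng(rng)
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    d = np.ones(n)
    for _ in range(n_iters):
        i = rng.integers(n)               # sample a coordinate uniformly
        B = np.outer(d, 1.0 / d) * K      # current balanced matrix
        r = B[i, :].sum() - B[i, i]       # off-diagonal row sum
        c = B[:, i].sum() - B[i, i]       # off-diagonal column sum
        if r > 0 and c > 0:
            d[i] *= np.sqrt(c / r)        # equalize row/column i
    return d
```

Each update exactly balances the chosen coordinate; the paper's analysis shows how quickly these local fixes drive down the global $\ell_1$ imbalance.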

    Acceleration by Stepsize Hedging I: Multi-Step Descent and the Silver Stepsize Schedule

    Can we accelerate convergence of gradient descent without changing the algorithm -- just by carefully choosing stepsizes? Surprisingly, we show that the answer is yes. Our proposed Silver Stepsize Schedule optimizes strongly convex functions in $k^{\log_\rho 2} \approx k^{0.7864}$ iterations, where $\rho = 1 + \sqrt{2}$ is the silver ratio and $k$ is the condition number. This is intermediate between the textbook unaccelerated rate $k$ and the accelerated rate $\sqrt{k}$ due to Nesterov in 1983. The non-strongly convex setting is conceptually identical, and standard black-box reductions imply an analogous accelerated rate $\varepsilon^{-\log_\rho 2} \approx \varepsilon^{-0.7864}$. We conjecture and provide partial evidence that these rates are optimal among all possible stepsize schedules. The Silver Stepsize Schedule is constructed recursively in a fully explicit way. It is non-monotonic, fractal-like, and approximately periodic with period $k^{\log_\rho 2}$. This leads to a phase transition in the convergence rate: initially super-exponential (acceleration regime), then exponential (saturation regime).

    Comment: 7 figures
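The setting studied above is plain gradient descent where the only design freedom is the sequence of stepsizes. A minimal sketch of that template (the quadratic example and function names are illustrative, not from the paper):

```python
import numpy as np

def gd_with_schedule(grad_f, x0, stepsizes):
    """Plain gradient descent x_{t+1} = x_t - h_t * grad_f(x_t).
    The algorithm is unchanged; only the schedule (h_t) varies,
    which is the sole degree of freedom considered above."""
    x = np.asarray(x0, dtype=float)
    for h in stepsizes:
        x = x - h * grad_f(x)
    return x

# Illustrative run on f(x) = x^2 / 2 (so grad_f(x) = x, smoothness L = 1):
# any schedule, constant or non-monotonic, plugs into the same loop.
x_final = gd_with_schedule(lambda x: x, np.array([1.0]), [1.0] * 8)
```

The paper's contribution is the choice of the (non-monotonic, fractal-like) schedule fed into this loop, not any change to the loop itself.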

    Acceleration by Stepsize Hedging II: Silver Stepsize Schedule for Smooth Convex Optimization

    We provide a concise, self-contained proof that the Silver Stepsize Schedule proposed in Part I directly applies to smooth (non-strongly) convex optimization. Specifically, we show that with these stepsizes, gradient descent computes an $\epsilon$-minimizer in $O(\epsilon^{-\log_\rho 2}) = O(\epsilon^{-0.7864})$ iterations, where $\rho = 1 + \sqrt{2}$ is the silver ratio. This is intermediate between the textbook unaccelerated rate $O(\epsilon^{-1})$ and the accelerated rate $O(\epsilon^{-1/2})$ due to Nesterov in 1983. The Silver Stepsize Schedule is a simple explicit fractal: the $i$-th stepsize is $1 + \rho^{v(i)-1}$, where $v(i)$ is the 2-adic valuation of $i$. The design and analysis are conceptually identical to the strongly convex setting in Part I, but simplify remarkably in this specific setting.

    Comment: 10 pages, 3 figures
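The explicit formula quoted above is short enough to compute directly: the $i$-th stepsize is $1 + \rho^{v(i)-1}$ with $v(i)$ the 2-adic valuation of $i$ (stepsizes in units of $1/L$ for an $L$-smooth objective). A direct transcription:

```python
import numpy as np

def two_adic_valuation(i):
    """v(i): exponent of the largest power of 2 dividing i >= 1."""
    v = 0
    while i % 2 == 0:
        i //= 2
        v += 1
    return v

def silver_stepsizes(n):
    """First n stepsizes of the Silver Stepsize Schedule for smooth
    convex optimization: the i-th stepsize is 1 + rho**(v(i) - 1),
    where rho = 1 + sqrt(2) is the silver ratio."""
    rho = 1.0 + np.sqrt(2.0)
    return [1.0 + rho ** (two_adic_valuation(i) - 1)
            for i in range(1, n + 1)]
```

For example, the first stepsize is $1 + \rho^{-1} = \sqrt{2}$, and the schedule repeats its prefix fractally, with larger-than-usual steps at indices divisible by high powers of 2.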

    Averaging on the Bures-Wasserstein manifold: dimension-free convergence of gradient descent

    We study first-order optimization algorithms for computing the barycenter of Gaussian distributions with respect to the optimal transport metric. Although the objective is geodesically non-convex, Riemannian GD empirically converges rapidly, in fact faster than off-the-shelf methods such as Euclidean GD and SDP solvers. This stands in stark contrast to the best-known theoretical results for Riemannian GD, which depend exponentially on the dimension. In this work, we prove new geodesic convexity results which provide stronger control of the iterates, yielding a dimension-free convergence rate. Our techniques also enable the analysis of two related notions of averaging, the entropically-regularized barycenter and the geometric median, providing the first convergence guarantees for Riemannian GD for these problems.

    Comment: 48 pages, 8 figures
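A sketch of the Riemannian GD iteration for the barycenter of centered Gaussians $N(0, \Sigma_i)$: average the optimal-transport maps from the current iterate to each $\Sigma_i$, then push the iterate forward. This is one standard reading of the update (with stepsize 1 it reduces to the classical fixed-point iteration); the function names and the Euclidean-mean initialization are illustrative choices, not the paper's pseudocode.

```python
import numpy as np

def sqrtm_psd(A):
    """Square root of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def bw_barycenter_gd(Sigmas, eta=1.0, n_iters=50):
    """Riemannian GD on the Bures-Wasserstein manifold for the
    barycenter of centered Gaussians N(0, Sigma_i)."""
    Sigma = np.mean(Sigmas, axis=0)   # initialize at the Euclidean mean
    n = len(Sigmas)
    dim = Sigma.shape[0]
    for _ in range(n_iters):
        root = sqrtm_psd(Sigma)
        root_inv = np.linalg.inv(root)
        # Optimal-transport map from N(0, Sigma) to N(0, S), averaged:
        T_mean = sum(root_inv @ sqrtm_psd(root @ S @ root) @ root_inv
                     for S in Sigmas) / n
        M = (1.0 - eta) * np.eye(dim) + eta * T_mean
        Sigma = M @ Sigma @ M          # pushforward of the iterate
    return Sigma
```

For isotropic inputs $\Sigma_i = \sigma_i^2 I$ the barycenter is $((\frac{1}{n}\sum_i \sigma_i)^2) I$, which gives a quick sanity check of the iteration.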

    Development and implementation of a prescription opioid registry across diverse health systems

    Objective: Develop and implement a prescription opioid registry in 10 diverse health systems across the US and describe trends in prescribed opioids between 2012 and 2018. Materials and Methods: Using electronic health record and claims data, we identified patients who had an outpatient fill for any prescription opioid, and/or an opioid use disorder diagnosis, between January 1, 2012 and December 31, 2018. The registry contains distributed files of prescription opioids, benzodiazepines and other select medications, opioid antagonists, clinical diagnoses, procedures, health services utilization, and health plan membership. Rates of outpatient opioid fills over the study period, standardized to health system demographic distributions, are described by age, gender, and race/ethnicity among members without cancer. Results: The registry includes 6 249 710 patients and over 40 million outpatient opioid fills. For the combined registry population, opioid fills declined from a high of 0.718 per member-year in 2013 to 0.478 in 2018, and morphine milligram equivalents (MMEs) per fill declined from 985 in 2012 to 758 in 2018. MMEs per member declined from 692 in 2012 to 362 in 2018. Conclusion: This study established a population-based opioid registry across 10 diverse health systems that can be used to address questions related to opioid use. Initial analyses showed large reductions in overall opioid use per member among the combined health systems. The registry will be used in future studies to answer a broad range of other critical public health questions relating to prescription opioid use.