
    The Extended Regularized Dual Averaging Method for Composite Optimization

    We present a new algorithm, extended regularized dual averaging (XRDA), for solving composite optimization problems, which generalizes the regularized dual averaging (RDA) method. The main novelty of the method is that it allows more flexible control of the backward step size. For instance, the backward step size for RDA grows without bound, while for XRDA the backward step size can be kept bounded.
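    For orientation, the sketch below is a minimal implementation of standard RDA for an $\ell_1$-regularized problem, using the common choice $\beta_t = \gamma\sqrt{t}$ for the averaging parameter; the function name, the test objective, and the parameter choices are illustrative assumptions, and the XRDA modification of the backward step size is described in the paper, not reproduced here.

    ```python
    import numpy as np

    def rda_l1(grad, x0, lam, gamma, steps):
        """Minimal sketch of regularized dual averaging (RDA) for an
        l1-regularized problem (illustrative; XRDA's flexible backward
        step size is not shown)."""
        x = x0.copy()
        gbar = np.zeros_like(x0)
        for t in range(1, steps + 1):
            gbar += (grad(x) - gbar) / t       # running average of gradients
            beta = gamma * np.sqrt(t)          # growing averaging parameter
            # closed-form soft-threshold solution of the RDA subproblem
            # argmin_x <gbar, x> + lam*||x||_1 + (beta/t) * 0.5*||x||^2
            x = -(t / beta) * np.sign(gbar) * np.maximum(np.abs(gbar) - lam, 0.0)
        return x
    ```

    On a smooth quadratic with $\ell_1$ penalty, the iterates approach the soft-thresholded minimizer, with small coordinates driven exactly to zero.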

    Optimal Approximation of Zonoids and Uniform Approximation by Shallow Neural Networks

    We study the following two related problems. The first is to determine to what error an arbitrary zonoid in $\mathbb{R}^{d+1}$ can be approximated in the Hausdorff distance by a sum of $n$ line segments. The second is to determine optimal approximation rates in the uniform norm for shallow ReLU$^k$ neural networks on their variation spaces. The first of these problems has been solved for $d\neq 2,3$, but when $d=2,3$ a logarithmic gap between the best upper and lower bounds remains. We close this gap, which completes the solution in all dimensions. For the second problem, our techniques significantly improve upon existing approximation rates when $k\geq 1$, and enable uniform approximation of both the target function and its derivatives.
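    For reference, a shallow ReLU$^k$ network of width $n$ is a function of the form $f(x) = \sum_{i=1}^n a_i\,(w_i\cdot x + b_i)_+^k$. The snippet below (array-shape conventions are my own) simply evaluates such a network; it illustrates the function class, not the approximation results.

    ```python
    import numpy as np

    def shallow_relu_k(x, W, b, a, k=1):
        """Evaluate f(x) = sum_i a_i * max(w_i . x + b_i, 0)**k.

        x: (m, d) batch of inputs, W: (n, d) inner weights,
        b: (n,) biases, a: (n,) outer coefficients, k: ReLU power.
        """
        return np.maximum(x @ W.T + b, 0.0) ** k @ a
    ```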

    A qualitative difference between gradient flows of convex functions in finite- and infinite-dimensional Hilbert spaces

    We consider gradient flow/gradient descent and heavy ball/accelerated gradient descent optimization for convex objective functions. In the gradient flow case, we prove the following: 1. If $f$ does not have a minimizer, the convergence $f(x_t)\to \inf f$ can be arbitrarily slow. 2. If $f$ does have a minimizer, the excess energy $f(x_t) - \inf f$ is integrable/summable in time. In particular, $f(x_t) - \inf f = o(1/t)$ as $t\to\infty$. 3. In Hilbert spaces, this is optimal: $f(x_t) - \inf f$ can decay to $0$ as slowly as any given function which is monotone decreasing and integrable at $\infty$, even for a fixed quadratic objective. 4. In finite dimension (or more generally, for all gradient flow curves of finite length), this is not optimal: we prove that there are convex monotone decreasing integrable functions $g(t)$ which decrease to zero slower than $f(x_t) - \inf f$ for the gradient flow of any convex function on $\mathbb{R}^d$. For instance, we show that any gradient flow $x_t$ of a convex function $f$ in finite dimension satisfies $\liminf_{t\to\infty} \big(t\cdot \log^2(t)\cdot \big\{f(x_t) - \inf f\big\}\big) = 0$. This improves on the commonly reported $O(1/t)$ rate and provides a sharp characterization of the energy decay law. We also note that it is impossible to establish a rate $O(1/(t\phi(t)))$ for any function $\phi$ which satisfies $\lim_{t\to\infty}\phi(t) = \infty$, even asymptotically. Similar results are obtained in related settings for (1) discrete time gradient descent, (2) stochastic gradient descent with multiplicative noise and (3) the heavy ball ODE.
In the case of stochastic gradient descent, the summability of $\mathbb{E}[f(x_n) - \inf f]$ is used to prove that $f(x_n)\to \inf f$ almost surely, an improvement on the convergence almost surely up to a subsequence which follows from the $O(1/n)$ decay estimate.
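    The summability statement in point 2 is easy to observe numerically. The toy run below (my own example, not from the paper) applies gradient descent to the quadratic $f(x) = \tfrac12\|x\|^2$, which has a minimizer; here the excess energy decays geometrically, far faster than the worst-case rate, so both the partial sums and $n\cdot(f(x_n) - \inf f)$ behave as the theorem predicts.

    ```python
    import numpy as np

    # Gradient descent on f(x) = 0.5*||x||^2, which has a minimizer (the origin).
    eta = 0.1
    x = np.array([1.0, -2.0])
    excess = []
    for n in range(200):
        excess.append(0.5 * x @ x)   # f(x_n) - inf f, since inf f = 0
        x = x - eta * x              # gradient step: grad f(x) = x
    total = sum(excess)              # bounded partial sums: summability
    tail = 200 * excess[-1]          # n * (f(x_n) - inf f) -> 0
    ```

    Each step multiplies the excess energy by $(1-\eta)^2 = 0.81$, so the series is geometric with sum close to $2.5/0.19$.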

    Sharp Convergence Rates for Matching Pursuit

    We study the fundamental limits of matching pursuit, or the pure greedy algorithm, for approximating a target function by a sparse linear combination of elements from a dictionary. When the target function is contained in the variation space corresponding to the dictionary, many impressive works over the past few decades have obtained upper and lower bounds on the error of matching pursuit, but these bounds do not match. The main contribution of this paper is to close this gap and obtain a sharp characterization of the decay rate of matching pursuit. Specifically, we construct a worst-case dictionary which shows that the existing best upper bound cannot be significantly improved. It turns out that, unlike other greedy algorithm variants, the convergence rate is suboptimal and is determined by the solution to a certain non-linear equation. This enables us to conclude that any amount of shrinkage improves matching pursuit in the worst case.
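    For readers unfamiliar with the algorithm, a minimal sketch of matching pursuit (the pure greedy algorithm) in the finite-dimensional case is below; the dictionary is assumed to have unit-norm columns, and the names are my own. At each step it selects the atom most correlated with the residual and subtracts the corresponding projection; the shrinkage variants mentioned in the abstract would scale the coefficient update by a factor in $(0,1)$.

    ```python
    import numpy as np

    def matching_pursuit(D, y, steps):
        """Pure greedy algorithm sketch. D: (m, N) dictionary with
        unit-norm columns, y: (m,) target. Returns coefficients and
        the final residual."""
        r = y.copy()
        coef = np.zeros(D.shape[1])
        for _ in range(steps):
            c = D.T @ r                    # correlations with the residual
            j = np.argmax(np.abs(c))       # best-matching atom
            coef[j] += c[j]                # greedy coefficient update
            r -= c[j] * D[:, j]            # subtract the projection
        return coef, r
    ```

    On an orthonormal dictionary the residual vanishes after one pass through the active atoms; the slow worst-case decay analyzed in the paper arises for carefully constructed non-orthogonal dictionaries.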

    A Priori Analysis of Stable Neural Network Solutions to Numerical PDEs

    Methods for solving PDEs using neural networks have recently become a very important topic. We provide an a priori error analysis for such methods which is based on the $\mathcal{K}_1(\mathbb{D})$-norm of the solution. We show that the resulting constrained optimization problem can be efficiently solved using a greedy algorithm, which replaces stochastic gradient descent. Following this, we show that the error arising from discretizing the energy integrals is bounded both in the deterministic case, i.e. when using numerical quadrature, and also in the stochastic case, i.e. when sampling points to approximate the integrals. In the latter case, we use a Rademacher complexity analysis, and in the former we use standard numerical quadrature bounds. This extends existing results to methods which use a general dictionary of functions to learn solutions to PDEs and importantly gives a consistent analysis which incorporates the optimization, approximation, and generalization aspects of the problem. In addition, the Rademacher complexity analysis is simplified and generalized, which enables application to a wide range of problems.
    Comment: This paper has been merged with arXiv:2107.0446
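    As a toy illustration of the two discretizations the abstract contrasts (my own example, not the paper's method), one can approximate a one-dimensional Dirichlet energy $E(u) = \int_0^1 \big(\tfrac12 u'(x)^2 - f(x)u(x)\big)\,dx$ either by deterministic quadrature or by Monte Carlo sampling of the integrand:

    ```python
    import numpy as np

    def energy_quadrature(u, du, f, n=1000):
        """Deterministic case: midpoint-rule discretization of
        E(u) = int_0^1 (0.5*u'(x)**2 - f(x)*u(x)) dx."""
        xs = (np.arange(n) + 0.5) / n
        return np.mean(0.5 * du(xs) ** 2 - f(xs) * u(xs))

    def energy_monte_carlo(u, du, f, n=100_000, seed=0):
        """Stochastic case: uniform random sample points instead of a grid;
        the sampling error is what a Rademacher complexity analysis controls."""
        xs = np.random.default_rng(seed).uniform(0.0, 1.0, n)
        return np.mean(0.5 * du(xs) ** 2 - f(xs) * u(xs))
    ```

    For $u(x) = \sin(\pi x)$ and $f(x) = \pi^2 \sin(\pi x)$ the exact energy is $-\pi^2/4$; the quadrature error is tiny while the Monte Carlo estimate fluctuates at the $O(1/\sqrt{n})$ scale.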