The Statistical Complexity of Early-Stopped Mirror Descent
Recently there has been a surge of interest in understanding implicit
regularization properties of iterative gradient-based optimization algorithms.
In this paper, we study the statistical guarantees on the excess risk achieved
by early-stopped unconstrained mirror descent algorithms applied to the
unregularized empirical risk with the squared loss for linear models and kernel
methods. By completing an inequality that characterizes convexity for the
squared loss, we identify an intrinsic link between offset Rademacher
complexities and potential-based convergence analysis of mirror descent
methods. Our observation immediately yields excess risk guarantees for the path
traced by the iterates of mirror descent in terms of offset complexities of
certain function classes depending only on the choice of the mirror map,
initialization point, step-size, and the number of iterations. We apply our
theory to recover, in a clean and elegant manner via rather short proofs, some
of the recent results in the implicit regularization literature, while also
showing how to improve upon them in some settings.
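To make the setting of this abstract concrete, here is a minimal sketch of early-stopped unconstrained mirror descent on the unregularized squared loss for a linear model. It is an illustration under stated assumptions, not the paper's procedure: the hyperbolic-entropy mirror map (gradient arcsinh(w / beta)), the parameters `beta`, `step`, and `n_iters`, the synthetic data, and the hold-out stopping rule are all choices made here for the example.

```python
import math
import random


def predict(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))


def empirical_risk(w, X, y):
    """Mean squared error of the linear model w on (X, y)."""
    return sum((predict(w, x) - yi) ** 2 for x, yi in zip(X, y)) / len(y)


def risk_gradient(w, X, y):
    """Gradient of the mean squared error with respect to w."""
    grad = [0.0] * len(w)
    for x, yi in zip(X, y):
        r = predict(w, x) - yi
        for j, xj in enumerate(x):
            grad[j] += 2.0 * r * xj / len(y)
    return grad


def mirror_descent_path(X, y, beta=1.0, step=0.05, n_iters=200):
    """Return the primal iterates of unconstrained mirror descent.

    Assumed mirror map: hyperbolic entropy, whose gradient is
    arcsinh(w / beta); its inverse maps a dual point theta to
    beta * sinh(theta). The update is a plain gradient step in the dual.
    """
    theta = [0.0] * len(X[0])  # dual iterate; corresponds to w = 0
    path = []
    for _ in range(n_iters):
        w = [beta * math.sinh(t) for t in theta]
        path.append(w)
        g = risk_gradient(w, X, y)
        theta = [t - step * gj for t, gj in zip(theta, g)]
    return path


def early_stop(path, X_val, y_val):
    """Pick the iterate with the smallest hold-out risk (an assumed rule)."""
    losses = [empirical_risk(w, X_val, y_val) for w in path]
    t = min(range(len(path)), key=losses.__getitem__)
    return t, path[t]


# Hypothetical synthetic data: sparse ground truth plus Gaussian noise.
random.seed(0)
w_true = [1.0, 0.0, 0.0, 0.0, 0.0]


def make_data(n):
    X = [[random.gauss(0.0, 1.0) for _ in w_true] for _ in range(n)]
    y = [predict(w_true, x) + random.gauss(0.0, 0.5) for x in X]
    return X, y


X_train, y_train = make_data(20)
X_val, y_val = make_data(20)
path = mirror_descent_path(X_train, y_train)
t_stop, w_stop = early_stop(path, X_val, y_val)
print("stopping iteration:", t_stop)
```

The stopping time here is chosen from held-out data purely for convenience; the abstract concerns guarantees along the whole path of iterates, which this sketch only exposes through `path`.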
Sobolev Acceleration and Statistical Optimality for Learning Elliptic Equations via Gradient Descent
In this paper, we study the statistical limits in terms of Sobolev norms of
gradient descent for solving inverse problems from randomly sampled noisy
observations using a general class of objective functions. Our class of
objective functions includes Sobolev training for kernel regression, Deep Ritz
Methods (DRM), and Physics Informed Neural Networks (PINN) for solving elliptic
partial differential equations (PDEs) as special cases. We consider a
potentially infinite-dimensional parameterization of our model using a suitable
Reproducing Kernel Hilbert Space and a continuous parameterization of problem
hardness through the definition of kernel integral operators. We prove that
gradient descent over this objective function can also achieve statistical
optimality and the optimal number of passes over the data increases with sample
size. Based on our theory, we explain an implicit acceleration from using a
Sobolev norm as the training objective, and infer that the optimal
number of epochs for DRM becomes larger than that for PINN when both the
data size and the hardness of the task increase, although both DRM and PINN can
achieve statistical optimality.
Subsampling Algorithms for Semidefinite Programming
We derive a stochastic gradient algorithm for semidefinite optimization using
randomization techniques. The algorithm uses subsampling to reduce the
computational cost of each iteration and the subsampling ratio explicitly
controls granularity, i.e. the tradeoff between cost per iteration and total
number of iterations. Furthermore, the total computational cost is directly
proportional to the complexity (i.e. rank) of the solution. We study numerical
performance on some large-scale problems arising in statistical learning.
Comment: Final version, to appear in Stochastic System
CoCoA: A General Framework for Communication-Efficient Distributed Optimization
The scale of modern datasets necessitates the development of efficient
distributed optimization methods for machine learning. We present a
general-purpose framework for distributed computing environments, CoCoA, that
has an efficient communication scheme and is applicable to a wide variety of
problems in machine learning and signal processing. We extend the framework to
cover general non-strongly-convex regularizers, including L1-regularized
problems like lasso, sparse logistic regression, and elastic net
regularization, and show how earlier work can be derived as a special case. We
provide convergence guarantees for the class of convex regularized loss
minimization objectives, leveraging a novel approach in handling
non-strongly-convex regularizers and non-smooth loss functions. The resulting
framework has markedly improved performance over state-of-the-art methods, as
we illustrate with an extensive set of experiments on real distributed
datasets.
Convex Optimization: Algorithms and Complexity
This monograph presents the main complexity theorems in convex optimization
and their corresponding algorithms. Starting from the fundamental theory of
black-box optimization, the material progresses towards recent advances in
structural optimization and stochastic optimization. Our presentation of
black-box optimization, strongly influenced by Nesterov's seminal book and
Nemirovski's lecture notes, includes the analysis of cutting plane methods, as
well as (accelerated) gradient descent schemes. We also pay special attention
to non-Euclidean settings (relevant algorithms include Frank-Wolfe, mirror
descent, and dual averaging) and discuss their relevance in machine learning.
We provide a gentle introduction to structural optimization with FISTA (to
optimize a sum of a smooth and a simple non-smooth term), saddle-point mirror
prox (Nemirovski's alternative to Nesterov's smoothing), and a concise
description of interior point methods. In stochastic optimization we discuss
stochastic gradient descent, mini-batches, random coordinate descent, and
sublinear algorithms. We also briefly touch upon convex relaxation of
combinatorial problems and the use of randomness to round solutions, as well as
random walks based methods.
Comment: A previous version of the manuscript was titled "Theory of Convex
Optimization for Machine Learning".
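As an illustration of the FISTA setting mentioned in this abstract (a smooth term plus a simple non-smooth term), the following is a minimal sketch applied to the lasso objective 0.5 * ||Ax - b||^2 + lam * ||x||_1. It is not taken from the monograph: the function names, the choice of the lasso as the example problem, and the use of the squared Frobenius norm of A as a conservative Lipschitz bound are assumptions made here.

```python
import math


def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1, applied coordinate-wise."""
    return [math.copysign(max(abs(vi) - tau, 0.0), vi) for vi in v]


def fista_lasso(A, b, lam, n_iters=2000):
    """Sketch of FISTA for 0.5 * ||Ax - b||^2 + lam * ||x||_1.

    A is a list of rows. The step size is 1 / L, where L is the squared
    Frobenius norm of A, an upper bound on the largest eigenvalue of A^T A
    (assumes A is not identically zero).
    """
    n, d = len(A), len(A[0])
    L = sum(a * a for row in A for a in row)
    x = [0.0] * d
    z = x[:]  # extrapolated point
    t = 1.0
    for _ in range(n_iters):
        residual = [sum(aij * zj for aij, zj in zip(row, z)) - bi
                    for row, bi in zip(A, b)]
        grad = [sum(A[i][j] * residual[i] for i in range(n)) for j in range(d)]
        # Gradient step on the smooth part, then prox of the l1 part.
        x_new = soft_threshold([zj - gj / L for zj, gj in zip(z, grad)], lam / L)
        # Momentum update with the standard FISTA sequence t_k.
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = [xn + ((t - 1.0) / t_new) * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x


# With an identity design the lasso minimizer is soft_threshold(b, lam),
# which gives a simple sanity check for the iterates.
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b = [3.0, 0.5, -2.0]
x_hat = fista_lasso(A, b, lam=1.0)
print(x_hat)
```

The Frobenius bound keeps the example free of eigenvalue computations at the cost of a smaller step than necessary; a power iteration on A^T A would give a tighter constant.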