Probabilistic Interpretation of Linear Solvers
This manuscript proposes a probabilistic framework for algorithms that
iteratively solve unconstrained linear problems $Bx = b$ with positive definite
$B$ for $x$. The goal is to replace the point estimates returned by existing
methods with a Gaussian posterior belief over the elements of the inverse of
$B$, which can be used to estimate errors. Recent probabilistic interpretations
of the secant family of quasi-Newton optimization algorithms are extended.
Combined with properties of the conjugate gradient algorithm, this leads to
uncertainty-calibrated methods with very limited cost overhead over conjugate
gradients, a self-contained novel interpretation of the quasi-Newton and
conjugate gradient algorithms, and a foundation for new nonlinear optimization
methods. Comment: final version, in press at SIAM J Optimization
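For orientation, the classical conjugate gradient iteration that the abstract builds on can be sketched as follows. This is only the standard point-estimate solver, not the probabilistic method of the paper: it returns a single estimate of the solution and no Gaussian posterior over the inverse. The function name `conjugate_gradient` and the 2x2 test system are illustrative assumptions, not taken from the manuscript.

```python
def matvec(A, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, tol=1e-12, max_iter=None):
    """Solve A x = b for symmetric positive definite A, starting from x = 0.

    Returns the point estimate x and the number of iterations performed.
    """
    n = len(b)
    max_iter = n if max_iter is None else max_iter
    x = [0.0] * n
    r = list(b)          # residual b - A x, with x = 0
    p = list(r)          # first search direction is the residual
    rs = dot(r, r)
    iterations = 0
    for _ in range(max_iter):
        if rs ** 0.5 <= tol:
            break
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)                      # exact step along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * q for ri, q in zip(r, Ap)]
        rs_new = dot(r, r)
        # New direction is A-conjugate to the previous ones.
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
        iterations += 1
    return x, iterations


# Hypothetical 2x2 symmetric positive definite system.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x, iters = conjugate_gradient(A, b)
```

In exact arithmetic the iteration terminates in at most as many steps as the system has unknowns, so the example above needs no more than two iterations to reach the solution (1/11, 7/11).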
Regression on fixed-rank positive semidefinite matrices: a Riemannian approach
The paper addresses the problem of learning a regression model parameterized
by a fixed-rank positive semidefinite matrix. The focus is on the nonlinear
nature of the search space and on scalability to high-dimensional problems. The
mathematical developments rely on the theory of gradient descent algorithms
adapted to the Riemannian geometry that underlies the set of fixed-rank
positive semidefinite matrices. In contrast with previous contributions in the
literature, no restrictions are imposed on the range space of the learned
matrix. The resulting algorithms maintain a linear complexity in the problem
size and enjoy important invariance properties. We apply the proposed
algorithms to the problem of learning a distance function parameterized by a
positive semidefinite matrix. Good performance is observed on classical
benchmarks.
Fast global convergence of gradient methods for high-dimensional statistical recovery
Many statistical $M$-estimators are based on convex optimization problems
formed by the combination of a data-dependent loss function with a norm-based
regularizer. We analyze the convergence rates of projected gradient and
composite gradient methods for solving such problems, working within a
high-dimensional framework that allows the data dimension $d$ to grow with
(and possibly exceed) the sample size $n$. This high-dimensional
structure precludes the usual global assumptions---namely, strong convexity and
smoothness conditions---that underlie much of classical optimization analysis.
We define appropriately restricted versions of these conditions, and show that
they are satisfied with high probability for various statistical models. Under
these conditions, our theory guarantees that projected gradient descent has a
globally geometric rate of convergence up to the statistical precision
of the model, meaning the typical distance between the true unknown parameter
$\theta^*$ and an optimal solution $\widehat{\theta}$. This result is substantially
sharper than previous convergence results, which yielded sublinear convergence,
or linear convergence only up to the noise level. Our analysis applies to a
wide range of $M$-estimators and statistical models, including sparse linear
regression using Lasso ($\ell_1$-regularized regression); group Lasso for block
sparsity; log-linear models with regularization; low-rank matrix recovery using
nuclear norm regularization; and matrix decomposition. Overall, our analysis
reveals interesting connections between statistical precision and computational
efficiency in high-dimensional estimation.
On limited-memory quasi-Newton methods for minimizing a quadratic function
The main focus in this paper is exact linesearch methods for minimizing a
quadratic function whose Hessian is positive definite. We give two classes of
limited-memory quasi-Newton Hessian approximations that generate search
directions parallel to those of the method of preconditioned conjugate
gradients, and hence give finite termination on quadratic optimization
problems. The Hessian approximations are described by a novel compact
representation which provides a dynamical framework. We also discuss possible
extensions of these classes and show their behavior on randomly generated
quadratic optimization problems. The methods behave numerically similarly to
L-BFGS. Inclusion of information from the first iteration in the limited-memory
Hessian approximation and L-BFGS significantly reduces the effects of round-off
errors on the considered problems. In addition, we give our compact
representation of the Hessian approximations in the full Broyden class for the
general unconstrained optimization problem. This representation consists of
explicit matrices and gradients only as vector components.
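The relationship between quasi-Newton and conjugate gradient directions can be checked numerically on a small case. The sketch below uses the classical full-memory BFGS update with an identity initial matrix and exact linesearch, which is known to reproduce the conjugate gradient iterates on a strictly convex quadratic. It is not the limited-memory classes or the compact representation proposed in the paper, and the 3x3 problem data and function names are assumptions made for this example.

```python
def matvec(M, v):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]


def dot(u, v):
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def cg_iterates(A, b):
    """Conjugate gradient iterates for min 0.5 x^T A x - b^T x from x = 0."""
    n = len(b)
    x, r = [0.0] * n, list(b)
    p = list(r)
    out = []
    for _ in range(n):
        Ap = matvec(A, p)
        rs = dot(r, r)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * q for ri, q in zip(r, Ap)]
        beta = dot(r, r) / rs
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        out.append(x)
    return out


def bfgs_iterates(A, b):
    """BFGS iterates with exact linesearch and identity initial inverse Hessian."""
    n = len(b)
    x = [0.0] * n
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    g = [-bi for bi in b]                      # gradient A x - b at x = 0
    out = []
    for _ in range(n):
        p = [-v for v in matvec(H, g)]         # quasi-Newton direction
        Ap = matvec(A, p)
        alpha = -dot(g, p) / dot(p, Ap)        # exact linesearch on a quadratic
        s = [alpha * pi for pi in p]
        yv = [alpha * q for q in Ap]           # gradient change, equal to A s
        x = [xi + si for xi, si in zip(x, s)]
        g = [gi + yi for gi, yi in zip(g, yv)]
        rho = 1.0 / dot(yv, s)
        Hy = matvec(H, yv)
        c = rho * rho * dot(yv, Hy) + rho
        # Inverse BFGS update, expanded for symmetric H.
        H = [[H[i][j] - rho * (s[i] * Hy[j] + Hy[i] * s[j]) + c * s[i] * s[j]
              for j in range(n)] for i in range(n)]
        out.append(x)
    return out


# Hypothetical symmetric positive definite Hessian and linear term.
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
cg_path = cg_iterates(A, b)
bfgs_path = bfgs_iterates(A, b)
max_gap = max(abs(u - v) for xc, xb in zip(cg_path, bfgs_path)
              for u, v in zip(xc, xb))
```

Here `max_gap` is the largest componentwise difference between the two sequences of iterates; it stays at round-off level, and both methods reach the minimizer after three steps, matching the finite-termination property mentioned in the abstract.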