Probabilistic Interpretation of Linear Solvers
This manuscript proposes a probabilistic framework for algorithms that
iteratively solve unconstrained linear problems Bx = b with positive definite B
for x. The goal is to replace the point estimates returned by existing methods
with a Gaussian posterior belief over the elements of the inverse of B, which
can be used to estimate errors. Recent probabilistic interpretations
of the secant family of quasi-Newton optimization algorithms are extended.
Combined with properties of the conjugate gradient algorithm, this leads to
uncertainty-calibrated methods with very limited cost overhead over conjugate
gradients, a self-contained novel interpretation of the quasi-Newton and
conjugate gradient algorithms, and a foundation for new nonlinear optimization
methods. Comment: final version, in press at SIAM J Optimization
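For concreteness, a minimal NumPy sketch of the underlying idea: run conjugate gradients and, from the same matrix-vector products B @ d it already computes, accumulate a rank-one estimate of the inverse of B that plays the role of a posterior mean. The function name probabilistic_cg, the stopping rule, and the toy check are illustrative assumptions; this is not the calibrated solver described in the abstract, only the pattern of treating a linear solve as inference on inv(B).

```python
import numpy as np

def probabilistic_cg(B, b, x0=None, maxiter=None, tol=1e-10):
    """Conjugate gradients that also accumulates a low-rank estimate of
    inv(B) from the matrix-vector products B @ d it observes anyway.
    Illustrative sketch only; not the calibrated solver of the paper."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    maxiter = n if maxiter is None else maxiter
    r = b - B @ x
    d = r.copy()
    H_mean = np.zeros((n, n))             # estimate of inv(B) on the explored subspace
    for _ in range(maxiter):
        Bd = B @ d                        # the only information gathered about B
        dBd = d @ Bd
        alpha = (r @ r) / dBd
        x += alpha * d
        H_mean += np.outer(d, d) / dBd    # rank-one update from this observation
        r_new = r - alpha * Bd
        if np.linalg.norm(r_new) < tol:
            break
        d = r_new + ((r_new @ r_new) / (r @ r)) * d
        r = r_new
    return x, H_mean

# toy check: with x0 = 0, H_mean @ b reproduces the CG iterate x
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
B = A @ A.T + 50 * np.eye(50)
b = rng.standard_normal(50)
x, H = probabilistic_cg(B, b)
print(np.allclose(H @ b, x, atol=1e-6), np.allclose(x, np.linalg.solve(B, b), atol=1e-6))
```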
The geometry of nonlinear least squares with applications to sloppy models and optimization
Parameter estimation by nonlinear least squares minimization is a common
problem with an elegant geometric interpretation: the possible parameter values
of a model induce a manifold in the space of data predictions. The minimization
problem is then to find the point on the manifold closest to the data. We show
that the model manifolds of a large class of models, known as sloppy models,
have many universal features; they are characterized by a geometric series of
widths, extrinsic curvatures, and parameter-effects curvatures. A number of
common difficulties in optimizing least squares problems are due to this common
structure. First, algorithms tend to run into the boundaries of the model
manifold, causing parameters to diverge or become unphysical. We introduce the
model graph as an extension of the model manifold to remedy this problem. We
argue that appropriate priors can remove the boundaries and improve convergence
rates. We show that typical fits will have many evaporated parameters. Second,
bare model parameters are usually ill-suited to describing model behavior; cost
contours in parameter space tend to form hierarchies of plateaus and canyons.
Geometrically, we understand this inconvenient parametrization as an extremely
skewed coordinate basis and show that it induces a large parameter-effects
curvature on the manifold. Using coordinates based on geodesic motion, these
narrow canyons are transformed in many cases into a single quadratic, isotropic
basin. We interpret the modified Gauss-Newton and Levenberg-Marquardt fitting
algorithms as an Euler approximation to geodesic motion in these natural
coordinates on the model manifold and the model graph respectively. By adding a
geodesic acceleration adjustment to these algorithms, we alleviate the
difficulties from parameter-effects curvature, improving both efficiency and
success rates at finding good fits. Comment: 40 pages, 29 figures
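The geodesic-acceleration idea can be sketched compactly: the usual Levenberg-Marquardt step is treated as a velocity along a geodesic, and a second-order correction, estimated from one extra residual evaluation, absorbs part of the parameter-effects curvature. The sketch below (hypothetical function lm_geodesic, with illustrative damping and acceptance constants) follows that recipe in plain NumPy under these assumptions; it is not the authors' reference implementation.

```python
import numpy as np

def lm_geodesic(residuals, jacobian, p0, lam=1e-3, max_iter=100, h=0.1,
                accept_ratio=0.75, tol=1e-10):
    """Levenberg-Marquardt with a geodesic-acceleration correction: the
    usual LM step is the 'velocity', and a second-order term estimated
    from one extra residual evaluation corrects for curvature."""
    p = np.asarray(p0, dtype=float)
    r = residuals(p)
    cost = 0.5 * r @ r
    for _ in range(max_iter):
        J = jacobian(p)
        g = J.T @ r
        JTJ = J.T @ J
        M = JTJ + lam * np.diag(np.diag(JTJ))        # Marquardt-style damping
        v = np.linalg.solve(M, -g)                   # first-order (velocity) step
        # directional second derivative of the residuals along v, by finite difference
        k = (2.0 / h) * ((residuals(p + h * v) - r) / h - J @ v)
        a = np.linalg.solve(M, -J.T @ k)             # geodesic acceleration
        step = v + 0.5 * a if np.linalg.norm(a) <= accept_ratio * np.linalg.norm(v) else v
        r_new = residuals(p + step)
        cost_new = 0.5 * r_new @ r_new
        if cost_new < cost:                          # accept and loosen damping
            p, r, cost, lam = p + step, r_new, cost_new, lam / 3.0
            if np.linalg.norm(step) < tol:
                break
        else:                                        # reject and tighten damping
            lam *= 3.0
    return p

# toy usage: a classically 'sloppy' fit, the sum of two exponentials
t = np.linspace(0.0, 5.0, 20)
data = np.exp(-0.5 * t) + np.exp(-2.0 * t)
res = lambda p: np.exp(-p[0] * t) + np.exp(-p[1] * t) - data
jac = lambda p: np.stack([-t * np.exp(-p[0] * t), -t * np.exp(-p[1] * t)], axis=1)
print(lm_geodesic(res, jac, p0=[1.0, 3.0]))          # should land near (0.5, 2.0), up to permutation
```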
CoCoA: A General Framework for Communication-Efficient Distributed Optimization
The scale of modern datasets necessitates the development of efficient
distributed optimization methods for machine learning. We present a
general-purpose framework for distributed computing environments, CoCoA, that
has an efficient communication scheme and is applicable to a wide variety of
problems in machine learning and signal processing. We extend the framework to
cover general non-strongly-convex regularizers, including L1-regularized
problems like lasso, sparse logistic regression, and elastic net
regularization, and show how earlier work can be derived as a special case. We
provide convergence guarantees for the class of convex regularized loss
minimization objectives, leveraging a novel approach in handling
non-strongly-convex regularizers and non-smooth loss functions. The resulting
framework has markedly improved performance over state-of-the-art methods, as
we illustrate with an extensive set of experiments on real distributed
datasets.
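A single-process simulation can illustrate the communication pattern the framework relies on: each worker solves a local subproblem on its own data shard, and only one aggregated update vector per worker is exchanged each round. The sketch below (hypothetical cocoa_ridge, shown for ridge regression via local dual coordinate ascent with averaged updates) is a toy under those assumptions, not the CoCoA implementation, and it omits the non-strongly-convex extensions (L1, elastic net) discussed in the abstract.

```python
import numpy as np

def cocoa_ridge(X, y, lam=0.1, K=4, rounds=60, local_steps=None, seed=0):
    """Single-process simulation of a CoCoA-style round for ridge regression:
    each 'worker' owns a shard of examples plus their dual variables, runs
    local dual coordinate ascent against a local copy of w, and only the
    resulting change in w is communicated and averaged once per round."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    shards = np.array_split(rng.permutation(n), K)
    alpha = np.zeros(n)                  # dual variables, one per training example
    w = np.zeros(d)                      # shared primal vector, w = X.T @ alpha / (lam * n)
    local_steps = local_steps or max(1, n // K)
    for _ in range(rounds):
        dw_list, da_list = [], []
        for shard in shards:             # in a real deployment these run in parallel
            w_loc, d_alpha = w.copy(), np.zeros(n)
            for i in rng.choice(shard, size=local_steps):
                # closed-form dual coordinate step for the squared loss
                step = (y[i] - X[i] @ w_loc - alpha[i] - d_alpha[i]) / (1.0 + X[i] @ X[i] / (lam * n))
                d_alpha[i] += step
                w_loc += step * X[i] / (lam * n)
            da_list.append(d_alpha)
            dw_list.append(w_loc - w)
        w += sum(dw_list) / K            # one vector communicated per worker per round
        alpha += sum(da_list) / K
    return w

# toy usage: the distance to the closed-form ridge solution shrinks with more rounds
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))
y = X @ rng.standard_normal(10) + 0.1 * rng.standard_normal(200)
lam = 0.01
w_exact = np.linalg.solve(X.T @ X / 200 + lam * np.eye(10), X.T @ y / 200)
print(np.linalg.norm(cocoa_ridge(X, y, lam=lam) - w_exact))
```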