On the sharpness of the weighted Bernstein-Walsh inequality, with applications to the superlinear convergence of conjugate gradients
In this paper we show that the weighted Bernstein-Walsh inequality in
logarithmic potential theory is sharp up to some new universal constant,
provided that the external field is given by a logarithmic potential. Our main
tool for such results is a new technique of discretization of logarithmic
potentials, where we take the same starting point as in earlier work of Totik
and of Levin \& Lubinsky, but add an important new ingredient, namely a new
mean value property for the cumulative distribution function of the underlying
measure. As an application, we revisit the work of Beckermann \& Kuijlaars on
the superlinear convergence of conjugate gradients. These authors have
determined the asymptotic convergence factor for sequences of systems of linear
equations with an asymptotic eigenvalue distribution. Numerical evidence
suggested the conjecture that the integral means of Green functions occurring
in their work should also yield inequalities for the rate of convergence,
provided one makes a suitable link between measures and the eigenvalues of a
single matrix of coefficients. We prove this conjecture, at least for a class
of measures which is of particular interest for applications.
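For context, the classical condition-number bound for conjugate gradients
applied to $Ax=b$ with symmetric positive definite $A$ reads
\[
\frac{\|x_k - x_\ast\|_A}{\|x_0 - x_\ast\|_A} \;\le\;
2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{k},
\qquad \kappa = \frac{\lambda_{\max}(A)}{\lambda_{\min}(A)};
\]
the superlinear results discussed above refine this linear rate by replacing
the condition-number factor with integral means of Green functions taken with
respect to the eigenvalue distribution (the bound above is standard background,
not a result of the paper).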
Subsampled Inexact Newton methods for minimizing large sums of convex functions
This paper deals with the minimization of a large sum of convex functions by
Inexact Newton (IN) methods employing subsampled functions, gradients and
Hessian approximations. The Conjugate Gradient method is used to compute the
inexact Newton step and global convergence is enforced by a nonmonotone line
search procedure. The aim is to obtain methods with affordable costs and fast
convergence. Assuming strongly convex functions, R-linear convergence and
worst-case iteration complexity of the procedure are investigated when
functions and gradients are approximated with increasing accuracy. A set of
rules for the forcing parameters and Hessian subsample sizes is derived that
ensures local q-linear/q-superlinear convergence of the proposed method.
The random choice of the Hessian subsample is also considered and convergence
in the mean square, both for finite and infinite sums of functions, is proved.
Finally, global convergence with an asymptotic R-linear rate of IN methods is
extended to the case of a sum of convex functions with a strongly convex
objective function. Numerical results on well-known binary classification
problems are also given. Adaptive strategies for selecting forcing terms and
the Hessian subsample size, stemming from the theoretical analysis, are
employed, and the numerical results show that they yield effective IN methods.
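To make the ingredients concrete, here is a minimal single-iteration sketch in
the spirit of the abstract, illustrated on logistic loss. This is our
illustrative code, not the authors'; the names (`eta` for the forcing term,
`sample_frac` for the Hessian subsample) are ours, and a monotone Armijo
search stands in for the paper's nonmonotone line search.

```python
import numpy as np

def subsampled_inexact_newton_step(A, y, x, eta=0.1, sample_frac=0.1, rng=None):
    """One inexact Newton step for f(x) = mean_i log(1 + exp(-y_i * a_i^T x))."""
    rng = rng or np.random.default_rng(0)
    N, d = A.shape
    m = y * (A @ x)
    s = 0.5 * (1.0 - np.tanh(0.5 * m))           # sigmoid(-m), numerically stable
    g = -(A * (y * s)[:, None]).mean(axis=0)     # full gradient
    # Hessian-vector products from a uniform subsample S of the data.
    S = rng.choice(N, size=max(1, int(sample_frac * N)), replace=False)
    w = s[S] * (1.0 - s[S])                      # per-sample Hessian weights
    Hv = lambda v: (A[S] * (w * (A[S] @ v))[:, None]).mean(axis=0)
    # Inexact CG solve of  H_S p = -g,  stopped at residual <= eta * ||g||.
    p, r = np.zeros(d), -g.copy()
    q, tol = r.copy(), eta * np.linalg.norm(g)
    for _ in range(2 * d):
        Hq = Hv(q)
        alpha = (r @ r) / (q @ Hq)
        p, r_new = p + alpha * q, r - alpha * Hq
        if np.linalg.norm(r_new) <= tol:
            break
        q, r = r_new + ((r_new @ r_new) / (r @ r)) * q, r_new
    # Backtracking Armijo search (the paper uses a nonmonotone variant).
    f = lambda z: np.logaddexp(0.0, -y * (A @ z)).mean()
    t, fx = 1.0, f(x)
    for _ in range(30):
        if f(x + t * p) <= fx + 1e-4 * t * (g @ p):
            break
        t *= 0.5
    return x + t * p
```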
Composite Convex Optimization with Global and Local Inexact Oracles
We introduce new global and local inexact oracle concepts for a wide class of
convex functions in composite convex minimization. Such inexact oracles
naturally arise from primal-dual frameworks, barrier smoothing, inexact
computations of gradients and Hessians, and many other situations. We also
provide examples showing that the class of convex functions equipped with the
new inexact second-order oracles is larger than the standard self-concordant
and Lipschitz-gradient function classes. Further, we investigate several
properties of convex and/or self-concordant functions under the inexact
second-order oracles which are useful for algorithm development. Next, we apply
our theory to develop inexact proximal Newton-type schemes for solving
general composite convex minimization problems equipped with such inexact
oracles. Our theoretical results consist of new optimization algorithms,
accompanied by global convergence guarantees to solve a wide class of
composite convex optimization problems. When the first objective term is
additionally self-concordant, we establish different local convergence results
for our method. In particular, we prove that depending on the choice of
accuracy levels of the inexact second-order oracles, we obtain different local
convergence rates ranging from $R$-linear and $R$-superlinear to $R$-quadratic.
In special cases, where convergence bounds are known, our theory recovers the
best known rates. We also apply our settings to derive a new primal-dual method
for composite convex minimization problems. Finally, we present some
representative numerical examples to illustrate the benefit of our new
algorithms.
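As a concrete instance of the idea, the following sketch (our code with
illustrative names, not the paper's algorithm) performs one inexact proximal
Newton step for $F(x) = f(x) + \lambda\|x\|_1$: the quadratic model of $f$ is
minimized only approximately, by a fixed number of proximal-gradient passes,
which plays the role of an inexact second-order oracle.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def inexact_prox_newton_step(x, grad_f, hess_f, lam, inner_iters=20):
    g, H = grad_f(x), hess_f(x)
    L = max(np.linalg.norm(H, 2), 1e-12)   # step size for the inner solver
    d = np.zeros_like(x)
    # Approximately minimize  g^T d + 0.5 d^T H d + lam*||x + d||_1  by
    # proximal gradient on the model; more inner iterations means a more
    # accurate oracle, trading inner cost for a faster outer rate.
    for _ in range(inner_iters):
        d = soft_threshold(x + d - (g + H @ d) / L, lam / L) - x
    return x + d
```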
Maximum Weighted Sum Rate of Multi-Antenna Broadcast Channels
Recently, researchers showed that dirty paper coding (DPC) is the optimal
transmission strategy for multiple-input multiple-output broadcast channels
(MIMO-BC). In this paper, we study how to determine the maximum weighted sum of
DPC rates through solving the maximum weighted sum rate problem of the dual
MIMO multiple access channel (MIMO-MAC) with a sum power constraint. We first
simplify the maximum weighted sum rate problem such that enumerating all
possible decoding orders in the dual MIMO-MAC is unnecessary. We then design an
efficient algorithm based on conjugate gradient projection (CGP) to solve the
maximum weighted sum rate problem. Our proposed CGP method utilizes the
powerful concept of Hessian conjugacy. We also develop a rigorous algorithm to
solve the projection problem. We show that CGP enjoys provable convergence,
nice scalability, and great efficiency for large MIMO-BC systems.
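The projection subproblem that covariance-based methods of this kind must
solve has a clean closed form up to an eigen-decomposition. Below is a hedged,
generic sketch (our code, not the paper's algorithm) of the Euclidean
projection of a Hermitian matrix onto $\{Q \succeq 0,\ \mathrm{tr}(Q) \le P\}$,
which reduces to projecting the eigenvalues.

```python
import numpy as np

def project_covariance(Q, P):
    """Project a Hermitian matrix onto {Q : Q >= 0, trace(Q) <= P} in Frobenius norm."""
    lam, U = np.linalg.eigh((Q + Q.conj().T) / 2)   # symmetrize defensively
    lam = np.maximum(lam, 0.0)
    if lam.sum() > P:
        # shift the eigenvalues down by a common water level mu so they sum to P
        u = np.sort(lam)[::-1]
        csum = np.cumsum(u)
        k = np.arange(1, lam.size + 1)
        rho = k[u - (csum - P) / k > 0][-1]
        mu = (csum[rho - 1] - P) / rho
        lam = np.maximum(lam - mu, 0.0)
    return (U * lam) @ U.conj().T
```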
An Active Set Algorithm for Nonlinear Optimization with Polyhedral Constraints
A polyhedral active set algorithm PASA is developed for solving a nonlinear
optimization problem whose feasible set is a polyhedron. Phase one of the
algorithm is the gradient projection method, while phase two is any algorithm
for solving a linearly constrained optimization problem. Rules are provided for
branching between the two phases. Global convergence to a stationary point is
established, while asymptotically PASA performs only phase two when either a
nondegeneracy assumption holds, or the active constraints are linearly
independent and a strong second-order sufficient optimality condition holds.
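Schematically, for the special case of box constraints $l \le x \le u$, the
two-phase structure might look as follows. This is our simplified sketch with
a crude branching test and a reduced Newton step as phase two; the actual PASA
rules are considerably more refined.

```python
import numpy as np

def two_phase_sketch(f, grad, hess, l, u, x, tol=1e-8, theta=0.5, max_it=500):
    proj = lambda z: np.clip(z, l, u)
    for _ in range(max_it):
        g = grad(x)
        e = np.linalg.norm(proj(x - g) - x)      # projected-gradient error
        if e < tol:
            return x
        free = (x > l + 1e-12) & (x < u - 1e-12)
        if free.any() and np.linalg.norm(g[free]) >= theta * e:
            # phase two: treat the active set as settled and take a Newton
            # step in the free variables (assumes the reduced Hessian is SPD)
            d = np.zeros_like(x)
            d[free] = -np.linalg.solve(hess(x)[np.ix_(free, free)], g[free])
            x = proj(x + d)
        else:
            # phase one: gradient projection with Armijo backtracking
            t, fx = 1.0, f(x)
            for _ in range(30):
                x_new = proj(x - t * g)
                if f(x_new) <= fx + 1e-4 * g @ (x_new - x):
                    break
                t *= 0.5
            x = x_new
    return x
```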
A Second-Order Method for Compressed Sensing Problems with Coherent and Redundant Dictionaries
In this paper we are interested in the solution of Compressed Sensing (CS)
problems where the signals to be recovered are sparse in coherent and redundant
dictionaries. CS problems of this type are convex, with a non-smooth and
non-separable regularization term; therefore, a specialized solver is required.
We propose a primal-dual Newton Conjugate Gradients (pdNCG) method. We prove
global convergence and fast local rate of convergence for pdNCG. Moreover,
well-known properties of CS problems are exploited for the development of
provably effective preconditioning techniques that speed up the approximate
solution of the linear systems that arise. Numerical results are presented on CS
problems which demonstrate the performance of pdNCG compared to a
state-of-the-art existing solver.
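Since everything in pdNCG hinges on solving the Newton systems by
preconditioned CG, we include the generic PCG loop for reference (the standard
algorithm, not code from the paper); `Av` applies the system matrix and
`M_inv` applies the inverse of the preconditioner.

```python
import numpy as np

def pcg(Av, b, M_inv, x0=None, tol=1e-8, max_it=200):
    """Preconditioned conjugate gradients for Av(x) = b, Av symmetric positive definite."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    r = b - Av(x)
    z = M_inv(r)
    p, rz = z.copy(), r @ z
    for _ in range(max_it):
        Ap = Av(p)
        alpha = rz / (p @ Ap)
        x, r = x + alpha * p, r - alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        z = M_inv(r)
        rz_new = r @ z
        p, rz = z + (rz_new / rz) * p, rz_new
    return x
```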
A Globally and Superlinearly Convergent Modified BFGS Algorithm for Unconstrained Optimization
In this paper, a modified BFGS algorithm is proposed. The modified BFGS
matrix estimates a modified Hessian matrix which is a convex combination of an
identity matrix for the steepest descent algorithm and a Hessian matrix for the
Newton algorithm. The coefficient of the convex combination in the modified
BFGS algorithm is dynamically chosen in every iteration. It is proved that, for
any twice differentiable nonlinear function (convex or non-convex), the
algorithm is globally convergent to a stationary point. If the stationary point
is a local optimizer where the Hessian is strongly positive definite in a
neighborhood of the optimizer, the iterates will eventually enter and stay in
the neighborhood, and the modified BFGS algorithm reduces to the BFGS algorithm
in this neighborhood. Therefore, the modified BFGS algorithm is super-linearly
convergent. Moreover, the computational cost of the modified BFGS in each
iteration is almost the same as the cost of the BFGS. Numerical tests on the
CUTE test set are reported. The performance of the modified BFGS algorithm
implemented in our MATLAB function is compared to the BFGS algorithm
implemented in the MATLAB Optimization Toolbox function, a limited-memory BFGS
implemented as L-BFGS, a descent conjugate gradient algorithm implemented as
CG-Descent 5.3, and a limited-memory descent conjugate gradient algorithm
implemented as L-CG-Descent. These results show that the modified BFGS
algorithm may be very effective.
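One plausible reading of the modification, in sketch form (our illustrative
code and choice of rule, not the paper's exact formula): the secant target
becomes a convex combination $\hat y = \gamma y + (1-\gamma)s$ of Hessian
information ($y$) and the identity's action ($s$), with $\gamma$ chosen each
iteration so that the curvature condition $\hat y^\top s > 0$ holds even on
non-convex functions, which is what enables global convergence claims.

```python
import numpy as np

def modified_bfgs_update(B, s, y, eps=1e-8):
    """BFGS update of B using y_hat = gamma*y + (1-gamma)*s, gamma in (0, 1]."""
    sy, ss = s @ y, s @ s
    if sy >= eps * ss:
        gamma = 1.0                           # plain BFGS: curvature already fine
    else:
        gamma = (1.0 - eps) * ss / (ss - sy)  # makes y_hat^T s equal eps*||s||^2
    y_hat = gamma * y + (1.0 - gamma) * s
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y_hat, y_hat) / (s @ y_hat)
```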
Communication-Efficient Distributed Optimization of Self-Concordant Empirical Loss
We consider distributed convex optimization problems originating from sample
average approximation of stochastic optimization, or empirical risk
minimization in machine learning. We assume that each machine in the
distributed computing system has access to a local empirical loss function,
constructed with i.i.d. data sampled from a common distribution. We propose a
communication-efficient distributed algorithm to minimize the overall empirical
loss, which is the average of the local empirical losses. The algorithm is
based on an inexact damped Newton method, where the inexact Newton steps are
computed by a distributed preconditioned conjugate gradient method. We analyze
its iteration complexity and communication efficiency for minimizing
self-concordant empirical loss functions, and discuss the results for
distributed ridge regression, logistic regression and binary classification
with a smoothed hinge loss. In a standard setting for supervised learning, the
required number of communication rounds of the algorithm does not increase with
the sample size, and grows only slowly with the number of machines.
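A single-process simulation of one such step might look as follows (our hedged
sketch, reusing the `pcg` helper sketched earlier; the shard, function, and
parameter names are ours): every Hessian-vector product over all shards
corresponds to one communication round, while the preconditioner comes from a
single machine's local Hessian, and the step is damped by an estimate of the
Newton decrement as is usual for self-concordant objectives.

```python
import numpy as np

def distributed_newton_step(x, shards, grad_fn, hess_fn, mu=1e-4):
    g = np.mean([grad_fn(A, y, x) for (A, y) in shards], axis=0)  # one round
    P = hess_fn(*shards[0], x) + mu * np.eye(x.size)              # local preconditioner
    Hv = lambda v: np.mean([hess_fn(A, y, x) @ v for (A, y) in shards], axis=0)
    v = pcg(Hv, g, lambda r: np.linalg.solve(P, r))               # inexact Newton direction
    delta = np.sqrt(v @ Hv(v))                                    # Newton decrement estimate
    return x - v / (1.0 + delta)                                  # damped Newton step
```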
Adaptive norms for deep learning with regularized Newton methods
We investigate the use of regularized Newton methods with adaptive norms for
optimizing neural networks. This approach can be seen as a second-order
counterpart of adaptive gradient methods, which we here show to be
interpretable as first-order trust region methods with ellipsoidal constraints.
In particular, we prove that the preconditioning matrix used in RMSProp and
Adam satisfies the necessary conditions for provable convergence of
second-order trust region methods with standard worst-case complexities on
general non-convex objectives. Furthermore, we run experiments across different
neural architectures and datasets and find that the ellipsoidal constraints
consistently outperform their spherical counterpart, both in terms of the
number of backpropagations and the asymptotic loss value. Finally, we find comparable
performance to state-of-the-art first-order methods in terms of
backpropagations, but further advances in hardware are needed to render Newton
methods competitive in terms of computational time.
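The trust-region reading of adaptive gradient methods mentioned above admits a
one-line derivation (standard background, with $D$ the diagonal preconditioner,
e.g. $D = \mathrm{diag}(\sqrt{\hat v} + \epsilon)$ in Adam's notation):
\[
\min_{s}\; g^\top s \quad \text{s.t.} \quad \|D^{1/2}s\|_2 \le r
\qquad\Longrightarrow\qquad
s^\ast = -\,\frac{r}{\|D^{-1/2}g\|_2}\, D^{-1}g,
\]
so minimizing the linear model over an ellipsoid reproduces exactly the
$D^{-1}g$ update direction of RMSProp/Adam.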
Sub-Sampled Newton Methods II: Local Convergence Rates
Many data-fitting applications require the solution of an optimization
problem involving a sum of a large number of functions of a high-dimensional
parameter. Here, we consider the problem of minimizing a sum of $n$ functions
over a convex constraint set $\mathcal{X} \subseteq \mathbb{R}^{p}$, where
both $n$ and $p$ are large. In such problems, sub-sampling as a way to reduce
$n$ can offer a great amount of computational efficiency.
Within the context of second order methods, we first give quantitative local
convergence results for variants of Newton's method where the Hessian is
uniformly sub-sampled. Using random matrix concentration inequalities, one can
sub-sample in a way that the curvature information is preserved. Using such a
sub-sampling strategy, we establish locally Q-linear and Q-superlinear
convergence rates. We also give additional convergence results for the case
where the sub-sampled Hessian is regularized by modifying its spectrum or by
Levenberg-type regularization.
Finally, in addition to Hessian sub-sampling, we consider sub-sampling the
gradient as a way to further reduce the computational complexity per iteration.
We use approximate matrix multiplication results from randomized numerical
linear algebra (RandNLA) to obtain the proper sampling strategy and we
establish locally R-linear convergence rates. In such a setting, we also show
that a very aggressive sample-size increase results in an R-superlinearly
convergent algorithm.
While the sample size depends on the condition number of the problem, our
convergence rates are problem-independent, i.e., they do not depend on the
quantities related to the problem. Hence, our analysis here can be used to
complement the results of our basic framework from the companion paper [38]
by exploring algorithmic trade-offs that are important in practice.
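In sketch form, the Hessian sub-sampling described above is simply an average
of randomly chosen per-sample Hessians, with the sample size grown as more
accuracy is demanded (our illustrative code and schedule, not the paper's):

```python
import numpy as np

def subsampled_hessian(hess_i, N, x, sample_size, rng):
    """Average the Hessians of a uniform subsample S of the N component functions."""
    S = rng.choice(N, size=min(sample_size, N), replace=False)
    return sum(hess_i(i, x) for i in S) / len(S)

# usage sketch: grow the sample geometrically over iterations k, so the
# Hessian accuracy increases and superlinear behavior can kick in, e.g.
#   sample_size_k = min(N, int(s0 * growth ** k))
```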