
    On the sharpness of the weighted Bernstein-Walsh inequality, with applications to the superlinear convergence of conjugate gradients

    In this paper we show that the weighted Bernstein-Walsh inequality in logarithmic potential theory is sharp up to a new universal constant, provided that the external field is given by a logarithmic potential. Our main tool is a new technique for discretizing logarithmic potentials: we take the same starting point as earlier work of Totik and of Levin \& Lubinsky, but add an important new ingredient, namely a mean value property for the cumulative distribution function of the underlying measure. As an application, we revisit the work of Beckermann \& Kuijlaars on the superlinear convergence of conjugate gradients. These authors determined the asymptotic convergence factor for sequences of linear systems with an asymptotic eigenvalue distribution. Numerical evidence suggested that the integral means of Green functions occurring in their work should also yield inequalities for the rate of convergence, provided one makes a suitable link between measures and the eigenvalues of a single coefficient matrix. We prove this conjecture, at least for a class of measures of particular interest in applications.
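
    As a quick numerical illustration of the superlinear effect discussed above (a toy sketch, not the authors' potential-theoretic analysis), the script below runs plain CG on a matrix whose eigenvalues follow a fixed, arbitrarily chosen distribution and compares the observed A-norm error with the classical single-rate bound based on the condition number; the spectrum, sizes, and iteration counts are illustrative choices only.

```python
import numpy as np

def cg_error_history(A, b, x_star, iters):
    """Plain CG, recording the A-norm of the error at each step."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    err = lambda x: np.sqrt((x - x_star) @ (A @ (x - x_star)))
    hist = [err(x)]
    for _ in range(iters):
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
        hist.append(err(x))
    return np.array(hist)

rng = np.random.default_rng(0)
n = 500
# Eigenvalues sampled from a fixed (arbitrary) density on [1, 10];
# any asymptotic eigenvalue distribution would do for the illustration.
lam = 1.0 + 9.0 * np.linspace(0.0, 1.0, n) ** 4
A = np.diag(lam)              # CG only sees the spectrum, so a diagonal matrix suffices
b = rng.standard_normal(n)
x_star = b / lam

hist = cg_error_history(A, b, x_star, 40)
kappa = lam[-1] / lam[0]
rho = (np.sqrt(kappa) - 1.0) / (np.sqrt(kappa) + 1.0)
for k in range(0, 41, 10):
    print(f"k={k:2d}  observed {hist[k]/hist[0]:.2e}   kappa bound {2*rho**k:.2e}")
```

    The observed error eventually decays faster than the single-rate condition-number bound, which is the superlinear behaviour that the Green-function integrals of Beckermann \& Kuijlaars quantify.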

    Subsampled Inexact Newton methods for minimizing large sums of convex functions

    This paper deals with the minimization of large sums of convex functions by Inexact Newton (IN) methods employing subsampled functions, gradients, and Hessian approximations. The Conjugate Gradient method is used to compute the inexact Newton step, and global convergence is enforced by a nonmonotone line search procedure. The aim is to obtain methods with affordable costs and fast convergence. Assuming strongly convex functions, R-linear convergence and worst-case iteration complexity of the procedure are investigated when functions and gradients are approximated with increasing accuracy. A set of rules for the forcing parameters and Hessian subsample sizes is derived that ensures local q-linear/q-superlinear convergence of the proposed method. The random choice of the Hessian subsample is also considered, and convergence in mean square, both for finite and infinite sums of functions, is proved. Finally, global convergence with asymptotic R-linear rate of IN methods is extended to the case of a sum of convex functions with a strongly convex objective. Numerical results on well-known binary classification problems are also given. Adaptive strategies for selecting the forcing terms and the Hessian subsample size, stemming from the theoretical analysis, are employed, and the numerical results show that they yield effective IN methods.
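
    For orientation, here is a minimal sketch of a subsampled inexact Newton iteration for a logistic-regression sum, assuming fixed subsample sizes, a fixed forcing term, and a plain (monotone) backtracking line search in place of the paper's adaptive rules and nonmonotone search; all data, sample sizes, and tolerances are arbitrary stand-ins.

```python
import numpy as np

rng = np.random.default_rng(1)
N, d = 5000, 20
X = rng.standard_normal((N, d))
y = np.where(X @ rng.standard_normal(d) + 0.5 * rng.standard_normal(N) > 0, 1.0, -1.0)

def subsampled_oracle(w, idx):
    """Loss, gradient and Hessian-vector product of the subsampled logistic loss."""
    Xs, ys = X[idx], y[idx]
    z = ys * (Xs @ w)
    p = 1.0 / (1.0 + np.exp(-z))
    loss = np.mean(np.log1p(np.exp(-z)))
    grad = -(Xs * (ys * (1.0 - p))[:, None]).mean(axis=0)
    D = p * (1.0 - p)
    hv = lambda v: (Xs.T @ (D * (Xs @ v))) / len(idx)
    return loss, grad, hv

def cg(hv, rhs, tol, maxit=100):
    """Matrix-free CG for hv(x) = rhs, stopped at relative residual tol (the forcing term)."""
    x = np.zeros_like(rhs); r = rhs.copy(); p = r.copy()
    r0 = np.linalg.norm(rhs)
    for _ in range(maxit):
        if np.linalg.norm(r) <= tol * r0:
            break
        Ap = hv(p); a = (r @ r) / (p @ Ap)
        x += a * p; r_new = r - a * Ap
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    return x

w = np.zeros(d)
for k in range(10):
    idx_g = rng.choice(N, size=2000, replace=False)   # function/gradient subsample
    idx_h = rng.choice(N, size=500, replace=False)    # smaller Hessian subsample
    loss, g, _ = subsampled_oracle(w, idx_g)
    _, _, hv = subsampled_oracle(w, idx_h)
    step = cg(hv, -g, tol=0.1)                        # inexact Newton direction
    t = 1.0                                           # plain backtracking (the paper is nonmonotone)
    while t > 1e-8 and subsampled_oracle(w + t * step, idx_g)[0] > loss + 1e-4 * t * (g @ step):
        t *= 0.5
    w = w + t * step
    print(f"iter {k}: subsampled loss {loss:.4f}")
```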

    Composite Convex Optimization with Global and Local Inexact Oracles

    We introduce new global and local inexact oracle concepts for a wide class of convex functions in composite convex minimization. Such inexact oracles arise naturally from primal-dual frameworks, barrier smoothing, inexact computations of gradients and Hessians, and many other situations. We also provide examples showing that the class of convex functions equipped with the new inexact second-order oracles is larger than the standard self-concordant and Lipschitz-gradient function classes. Further, we investigate several properties of convex and/or self-concordant functions under the inexact second-order oracles that are useful for algorithm development. Next, we apply our theory to develop inexact proximal Newton-type schemes for solving general composite convex minimization problems equipped with such inexact oracles. Our theoretical results consist of new optimization algorithms, accompanied by global convergence guarantees, for a wide class of composite convex optimization problems. When the first objective term is additionally self-concordant, we establish different local convergence results for our method. In particular, we prove that, depending on the choice of accuracy levels of the inexact second-order oracles, we obtain local convergence rates ranging from R-linear and R-superlinear to R-quadratic. In special cases, where convergence bounds are known, our theory recovers the best known rates. We also apply our setting to derive a new primal-dual method for composite convex minimization problems. Finally, we present some representative numerical examples to illustrate the benefit of our new algorithms. Comment: 28 pages, 6 figures, and 2 tables.
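
    A bare-bones illustration of an inexact proximal Newton step for the composite model F(x) = f(x) + lam*||x||_1, where the quadratic-plus-l1 subproblem is solved only approximately by a few proximal-gradient sweeps; the paper's inexact oracles, self-concordance-based step sizes, and primal-dual variants are not modeled, and the least-squares instance below is only a stand-in.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def inexact_prox_newton_step(x, grad, H, lam, inner_iters=20):
    """Approximately minimize  grad^T(z - x) + 0.5 (z - x)^T H (z - x) + lam*||z||_1
    over z by a few proximal-gradient sweeps, and return the step z - x."""
    L = np.linalg.eigvalsh(H).max()            # Lipschitz constant of the quadratic model
    z = x.copy()
    for _ in range(inner_iters):
        q_grad = grad + H @ (z - x)            # gradient of the quadratic model at z
        z = soft_threshold(z - q_grad / L, lam / L)
    return z - x

# Tiny smooth-plus-l1 instance: f(x) = 0.5*||A x - b||^2 + lam*||x||_1.
rng = np.random.default_rng(2)
A = rng.standard_normal((40, 10)); b = rng.standard_normal(40); lam = 0.5
x = np.zeros(10)
for k in range(15):
    grad = A.T @ (A @ x - b)
    H = A.T @ A                                # exact Hessian; the paper allows inexact oracles
    x = x + inexact_prox_newton_step(x, grad, H, lam)   # unit step; damped in general
    obj = 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))
    print(f"iter {k:2d}: objective {obj:.4f}")
```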

    Maximum Weighted Sum Rate of Multi-Antenna Broadcast Channels

    Recently, researchers showed that dirty paper coding (DPC) is the optimal transmission strategy for multiple-input multiple-output broadcast channels (MIMO-BC). In this paper, we study how to determine the maximum weighted sum of DPC rates through solving the maximum weighted sum rate problem of the dual MIMO multiple access channel (MIMO-MAC) with a sum power constraint. We first simplify the maximum weighted sum rate problem such that enumerating all possible decoding orders in the dual MIMO-MAC is unnecessary. We then design an efficient algorithm based on conjugate gradient projection (CGP) to solve the maximum weighted sum rate problem. Our proposed CGP method utilizes the powerful concept of Hessian conjugacy. We also develop a rigorous algorithm to solve the projection problem. We show that CGP enjoys provable convergence, nice scalability, and great efficiency for large MIMO-BC systems.
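
    A heavily simplified sketch of the projection-based ascent machinery for the dual MIMO-MAC problem, under assumptions that are not the paper's: equal rate weights (plain sum rate), real-valued channels, and plain gradient projection instead of the Hessian-conjugate CGP directions and the specific projection algorithm developed there. The projection onto the sum-power constraint acts on the eigenvalues of the covariance matrices; all dimensions and channel matrices are arbitrary stand-ins.

```python
import numpy as np

rng = np.random.default_rng(8)
K, nt, nr, P = 3, 2, 4, 10.0          # users, tx antennas per user, rx antennas, sum power
H = [rng.standard_normal((nr, nt)) for _ in range(K)]
Q = [np.zeros((nt, nt)) for _ in range(K)]

def sum_rate(Q):
    M = np.eye(nr) + sum(H[j] @ Q[j] @ H[j].T for j in range(K))
    return np.linalg.slogdet(M)[1]

def project_capped_simplex(d, P):
    """Euclidean projection of d onto {x : x >= 0, sum(x) <= P}."""
    x = np.maximum(d, 0.0)
    if x.sum() <= P:
        return x
    u = np.sort(d)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u - (css - P) / (np.arange(len(d)) + 1.0) > 0)[0][-1]
    theta = (css[rho] - P) / (rho + 1.0)
    return np.maximum(d - theta, 0.0)

def project(S):
    """Project covariances onto {Q_j >= 0, sum_j tr(Q_j) <= P} via their eigenvalues."""
    eig = [np.linalg.eigh((Sj + Sj.T) / 2) for Sj in S]
    lam = project_capped_simplex(np.concatenate([w for w, _ in eig]), P)
    out, pos = [], 0
    for w, V in eig:
        out.append(V @ np.diag(lam[pos:pos + len(w)]) @ V.T)
        pos += len(w)
    return out

for k in range(200):
    M_inv = np.linalg.inv(np.eye(nr) + sum(H[j] @ Q[j] @ H[j].T for j in range(K)))
    G = [H[j].T @ M_inv @ H[j] for j in range(K)]          # gradient of log det w.r.t. Q_j
    t = 1.0
    Q_new = project([Q[j] + t * G[j] for j in range(K)])
    while t > 1e-8 and sum_rate(Q_new) < sum_rate(Q):      # crude backtracking
        t *= 0.5
        Q_new = project([Q[j] + t * G[j] for j in range(K)])
    if sum_rate(Q_new) >= sum_rate(Q):
        Q = Q_new
print(f"sum rate: {sum_rate(Q):.3f}   total power: {sum(np.trace(Qj) for Qj in Q):.3f}")
```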

    An Active Set Algorithm for Nonlinear Optimization with Polyhedral Constraints

    A polyhedral active set algorithm PASA is developed for solving a nonlinear optimization problem whose feasible set is a polyhedron. Phase one of the algorithm is the gradient projection method, while phase two is any algorithm for solving a linearly constrained optimization problem. Rules are provided for branching between the two phases. Global convergence to a stationary point is established, while asymptotically PASA performs only phase two when either a nondegeneracy assumption holds, or the active constraints are linearly independent and a strong second-order sufficient optimality condition holds.
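
    A sketch of the phase-one gradient projection iteration only, specialized to box constraints so that the projection onto the polyhedron reduces to clipping; PASA's branching rules and its phase-two linearly constrained solver are not reproduced, and a general polyhedron would require solving a projection QP instead of np.clip.

```python
import numpy as np

def gradient_projection_box(f, grad, lo, hi, x0, iters=100):
    """Phase-one style gradient projection for  min f(x)  s.t.  lo <= x <= hi,
    with Armijo backtracking along the projection arc."""
    x = np.clip(x0, lo, hi)
    for _ in range(iters):
        g = grad(x)
        step = 1.0
        while step > 1e-12:
            x_new = np.clip(x - step * g, lo, hi)        # projected trial point
            if f(x_new) <= f(x) + 1e-4 * g @ (x_new - x):
                break
            step *= 0.5
        x = x_new
    return x

# Example: a strongly convex quadratic over a box.
rng = np.random.default_rng(3)
Q = rng.standard_normal((8, 8)); Q = Q @ Q.T + np.eye(8)
c = rng.standard_normal(8)
f = lambda x: 0.5 * x @ Q @ x + c @ x
grad = lambda x: Q @ x + c
x = gradient_projection_box(f, grad, lo=-np.ones(8), hi=np.ones(8), x0=np.zeros(8))
print("solution:", np.round(x, 3))
print("active bounds:", int(np.sum((np.abs(x - 1) < 1e-8) | (np.abs(x + 1) < 1e-8))))
```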

    A Second-Order Method for Compressed Sensing Problems with Coherent and Redundant Dictionaries

    In this paper we are interested in the solution of Compressed Sensing (CS) problems where the signals to be recovered are sparse in coherent and redundant dictionaries. CS problems of this type are convex with a non-smooth and non-separable regularization term, and therefore a specialized solver is required. We propose a primal-dual Newton Conjugate Gradients (pdNCG) method. We prove global convergence and a fast local rate of convergence for pdNCG. Moreover, well-known properties of CS problems are exploited to develop provably effective preconditioning techniques that speed up the approximate solution of the linear systems that arise. Numerical results are presented on CS problems which demonstrate the performance of pdNCG compared to a state-of-the-art existing solver. Comment: 26 pages, 22 figures.
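
    A rough sketch of the kind of computation involved, not of pdNCG itself: the l1-analysis term is smoothed with a pseudo-Huber function and the resulting problem is minimized by Newton-CG with a backtracking line search. The primal-dual treatment and the preconditioning that pdNCG relies on are omitted, and the sensing matrix and "dictionary" below are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(4)
d, m_meas, m_dict = 64, 32, 96
A = rng.standard_normal((m_meas, d)) / np.sqrt(m_meas)   # sensing matrix
W = rng.standard_normal((d, m_dict)) / np.sqrt(d)        # stand-in "dictionary"
x_true = np.zeros(d); x_true[rng.choice(d, 5, replace=False)] = 1.0
b = A @ x_true
tau, mu = 0.05, 1e-3                                     # reg. weight, smoothing parameter

def objective(x):
    t = W.T @ x
    return tau * np.sum(np.sqrt(t * t + mu * mu) - mu) + 0.5 * np.sum((A @ x - b) ** 2)

def grad_hess(x):
    t = W.T @ x
    s = np.sqrt(t * t + mu * mu)
    grad = tau * (W @ (t / s)) + A.T @ (A @ x - b)
    weights = mu * mu / s ** 3                           # pseudo-Huber curvature
    hv = lambda v: tau * (W @ (weights * (W.T @ v))) + A.T @ (A @ v)
    return grad, hv

def cg(hv, rhs, tol=1e-2, maxit=200):
    x = np.zeros_like(rhs); r = rhs.copy(); p = r.copy(); r0 = np.linalg.norm(rhs)
    for _ in range(maxit):
        if np.linalg.norm(r) <= tol * r0:
            break
        Ap = hv(p); a = (r @ r) / (p @ Ap)
        x += a * p; r_new = r - a * Ap
        p = r_new + ((r_new @ r_new) / (r @ r)) * p; r = r_new
    return x

x = np.zeros(d)
for k in range(20):
    g, hv = grad_hess(x)
    dx = cg(hv, -g)                                      # Newton-CG direction
    t = 1.0
    while t > 1e-10 and objective(x + t * dx) > objective(x) + 1e-4 * t * (g @ dx):
        t *= 0.5
    x = x + t * dx
    print(f"iter {k:2d}: objective {objective(x):.6f}")
```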

    A Globally and Superlinearly Convergent Modified BFGS Algorithm for Unconstrained Optimization

    In this paper, a modified BFGS algorithm is proposed. The modified BFGS matrix estimates a modified Hessian matrix that is a convex combination of an identity matrix (as in the steepest descent algorithm) and a Hessian matrix (as in the Newton algorithm). The coefficient of the convex combination is dynamically chosen in every iteration. It is proved that, for any twice differentiable nonlinear function (convex or non-convex), the algorithm is globally convergent to a stationary point. If the stationary point is a local optimizer at which the Hessian is strongly positive definite in a neighborhood of the optimizer, the iterates eventually enter and stay in that neighborhood, and the modified BFGS algorithm reduces to the BFGS algorithm there. Therefore, the modified BFGS algorithm is superlinearly convergent. Moreover, the computational cost of the modified BFGS in each iteration is almost the same as that of BFGS. Numerical tests on the CUTE test set are reported. The performance of the modified BFGS algorithm implemented in our MATLAB function is compared to the BFGS algorithm implemented in the MATLAB Optimization Toolbox, a limited-memory BFGS implemented as L-BFGS, a descent conjugate gradient algorithm implemented as CG-Descent 5.3, and a limited-memory descent conjugate gradient algorithm implemented as L-CG-Descent. The results show that the modified BFGS algorithm can be very effective. Comment: arXiv admin note: text overlap with arXiv:1212.545
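
    One plausible reading of the modification, sketched on a toy quadratic: the BFGS formula is applied with a modified gradient difference theta*s + (1 - theta)*y, so that the approximation targets theta*I + (1 - theta)*Hessian; the paper's rule for choosing theta dynamically and its global-convergence safeguards are not reproduced (theta is fixed here), and the line search is plain backtracking.

```python
import numpy as np

def bfgs_update(B, s, y):
    """Standard BFGS update of the Hessian approximation B with step s and
    gradient difference y (applied only when the curvature condition holds)."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (s @ y)

# Toy run on a strongly convex quadratic f(x) = 0.5 x^T Q x.
rng = np.random.default_rng(5)
Q = rng.standard_normal((6, 6)); Q = Q @ Q.T + np.eye(6)
f = lambda x: 0.5 * x @ Q @ x
grad = lambda x: Q @ x
x = rng.standard_normal(6)
B = np.eye(6)
theta = 0.1        # convex-combination weight (a fixed stand-in; the paper adapts it)
for k in range(25):
    g = grad(x)
    d = -np.linalg.solve(B, g)                # quasi-Newton direction
    t = 1.0                                   # plain backtracking line search
    while t > 1e-12 and f(x + t * d) > f(x) + 1e-4 * t * (g @ d):
        t *= 0.5
    x_new = x + t * d
    s, y = x_new - x, grad(x_new) - g
    y_mod = theta * s + (1.0 - theta) * y     # secant target of theta*I + (1-theta)*Hessian
    if s @ y_mod > 1e-12:
        B = bfgs_update(B, s, y_mod)
    x = x_new
    print(f"iter {k:2d}: ||grad|| = {np.linalg.norm(grad(x)):.2e}")
```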

    Communication-Efficient Distributed Optimization of Self-Concordant Empirical Loss

    We consider distributed convex optimization problems originating from sample average approximation of stochastic optimization, or empirical risk minimization in machine learning. We assume that each machine in the distributed computing system has access to a local empirical loss function, constructed with i.i.d. data sampled from a common distribution. We propose a communication-efficient distributed algorithm to minimize the overall empirical loss, which is the average of the local empirical losses. The algorithm is based on an inexact damped Newton method, where the inexact Newton steps are computed by a distributed preconditioned conjugate gradient method. We analyze its iteration complexity and communication efficiency for minimizing self-concordant empirical loss functions, and discuss the results for distributed ridge regression, logistic regression, and binary classification with a smoothed hinge loss. In a standard setting for supervised learning, the required number of communication rounds of the algorithm does not increase with the sample size, and only grows slowly with the number of machines.
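
    A single-process simulation of the communication pattern described above, for l2-regularized logistic regression: "machines" hold data shards, every aggregation of gradients or Hessian-vector products counts as one communication round, and the Newton system is solved by CG preconditioned with the first machine's local Hessian. The damping rule and the complexity analysis of the paper are not reproduced (a plain backtracking step is used), and all sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)
N, d, m = 8000, 15, 4                        # samples, features, simulated "machines"
X = rng.standard_normal((N, d))
y = np.where(X @ rng.standard_normal(d) + 0.3 * rng.standard_normal(N) > 0, 1.0, -1.0)
shards = np.array_split(np.arange(N), m)     # each machine holds one shard
gamma = 1e-3                                 # l2 regularization

def local_grad(w, idx):
    z = y[idx] * (X[idx] @ w)
    p = 1.0 / (1.0 + np.exp(-z))
    return -(X[idx] * (y[idx] * (1 - p))[:, None]).mean(axis=0)

def local_hess_vec(w, idx, v):
    z = y[idx] * (X[idx] @ w)
    p = 1.0 / (1.0 + np.exp(-z))
    return (X[idx].T @ ((p * (1 - p)) * (X[idx] @ v))) / len(idx)

def global_grad(w):                          # one communication round
    return np.mean([local_grad(w, s) for s in shards], axis=0) + gamma * w

def global_hess_vec(w, v):                   # one communication round per CG iteration
    return np.mean([local_hess_vec(w, s, v) for s in shards], axis=0) + gamma * v

def global_loss(w):                          # used here only for the line search
    z = y * (X @ w)
    return np.mean(np.log1p(np.exp(-z))) + 0.5 * gamma * w @ w

def local_preconditioner(w):                 # machine 1 builds its local Hessian
    idx = shards[0]
    z = y[idx] * (X[idx] @ w)
    p = 1.0 / (1.0 + np.exp(-z))
    H1 = (X[idx].T * (p * (1 - p))) @ X[idx] / len(idx) + (gamma + 1e-2) * np.eye(d)
    return lambda r: np.linalg.solve(H1, r)

def pcg(hv, rhs, psolve, tol=1e-4, maxit=50):
    x = np.zeros_like(rhs); r = rhs.copy(); z = psolve(r); p = z.copy()
    for _ in range(maxit):
        Hp = hv(p); a = (r @ z) / (p @ Hp)
        x += a * p; r_new = r - a * Hp
        if np.linalg.norm(r_new) <= tol * np.linalg.norm(rhs):
            break
        z_new = psolve(r_new)
        p = z_new + ((r_new @ z_new) / (r @ z)) * p
        r, z = r_new, z_new
    return x

w = np.zeros(d)
for k in range(8):
    g = global_grad(w)
    step = pcg(lambda v: global_hess_vec(w, v), -g, local_preconditioner(w))
    t = 1.0                                  # backtracking in place of the paper's damped step
    while t > 1e-8 and global_loss(w + t * step) > global_loss(w) + 1e-4 * t * (g @ step):
        t *= 0.5
    w = w + t * step
    print(f"iter {k}: ||grad|| = {np.linalg.norm(g):.2e}")
```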

    Adaptive norms for deep learning with regularized Newton methods

    We investigate the use of regularized Newton methods with adaptive norms for optimizing neural networks. This approach can be seen as a second-order counterpart of adaptive gradient methods, which we here show to be interpretable as first-order trust region methods with ellipsoidal constraints. In particular, we prove that the preconditioning matrix used in RMSProp and Adam satisfies the necessary conditions for provable convergence of second-order trust region methods with standard worst-case complexities on general non-convex objectives. Furthermore, we run experiments across different neural architectures and datasets and find that the ellipsoidal constraints consistently outperform their spherical counterpart, both in the number of backpropagations and in asymptotic loss value. Finally, we find performance comparable to state-of-the-art first-order methods in terms of backpropagations, but further advances in hardware are needed to render Newton methods competitive in terms of computational time.
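
    A small sketch of one way to realize an adaptive (ellipsoidal) norm in a regularized Newton step, assuming an RMSProp-style preconditioner D built from a running average of squared gradients and a Levenberg-Marquardt-style adjustment of the regularization weight as a proxy for the ellipsoidal trust-region subproblem; the Rosenbrock function stands in for a neural-network loss, and this is not the authors' exact algorithm.

```python
import numpy as np

def rosenbrock(z):
    x, y = z
    return (1 - x) ** 2 + 100 * (y - x * x) ** 2

def grad(z):
    x, y = z
    return np.array([-2 * (1 - x) - 400 * x * (y - x * x), 200 * (y - x * x)])

def hess(z):
    x, y = z
    return np.array([[2 - 400 * y + 1200 * x * x, -400 * x],
                     [-400 * x, 200.0]])

z = np.array([-1.2, 1.0])
v = np.zeros(2)                                # running average of squared gradients
beta, eps, sigma = 0.9, 1e-8, 1.0
for k in range(60):
    g, H = grad(z), hess(z)
    v = beta * v + (1 - beta) * g * g
    D = np.diag(np.sqrt(v) + eps)              # RMSProp-style adaptive (ellipsoidal) metric
    s = np.linalg.solve(H + sigma * D, -g)     # regularized Newton step in that metric
    if rosenbrock(z + s) < rosenbrock(z):      # accept and relax the regularization
        z, sigma = z + s, max(0.5 * sigma, 1e-6)
    else:                                      # reject and tighten it
        sigma *= 4.0
    if k % 10 == 0:
        print(f"iter {k:2d}: f = {rosenbrock(z):.3e}")
print("final point:", z)
```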

    Sub-Sampled Newton Methods II: Local Convergence Rates

    Many data-fitting applications require the solution of an optimization problem involving a sum of a large number of functions of a high-dimensional parameter. Here, we consider the problem of minimizing a sum of $n$ functions over a convex constraint set $\mathcal{X} \subseteq \mathbb{R}^{p}$, where both $n$ and $p$ are large. In such problems, sub-sampling as a way to reduce $n$ can offer substantial computational savings. Within the context of second-order methods, we first give quantitative local convergence results for variants of Newton's method where the Hessian is uniformly sub-sampled. Using random matrix concentration inequalities, one can sub-sample in a way that preserves the curvature information. Using such a sub-sampling strategy, we establish locally Q-linear and Q-superlinear convergence rates. We also give additional convergence results for the case where the sub-sampled Hessian is regularized by modifying its spectrum or by Levenberg-type regularization. Finally, in addition to Hessian sub-sampling, we consider sub-sampling the gradient as a way to further reduce the computational complexity per iteration. We use approximate matrix multiplication results from randomized numerical linear algebra (RandNLA) to obtain the proper sampling strategy, and we establish locally R-linear convergence rates. In such a setting, we also show that a very aggressive sample size increase results in an R-superlinearly convergent algorithm. While the sample size depends on the condition number of the problem, our convergence rates are problem-independent, i.e., they do not depend on quantities related to the problem. Hence, our analysis here can be used to complement the results of our basic framework from the companion paper, [38], by exploring algorithmic trade-offs that are important in practice.
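
    A minimal sketch of the basic sub-sampled Newton iteration for a logistic-regression sum: full gradient, uniformly sub-sampled Hessian, Levenberg-type damping, and a geometrically growing Hessian sample size; the paper's sampling conditions, spectral-regularization variants, and gradient sub-sampling with RandNLA-based sample sizes are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(7)
n, p = 20000, 10
X = rng.standard_normal((n, p))
y = np.where(X @ rng.standard_normal(p) + rng.standard_normal(n) > 0, 1.0, -1.0)
reg = 1e-3                                    # small l2 term keeps the problem strongly convex

def full_grad(w):
    z = y * (X @ w)
    s = 1.0 / (1.0 + np.exp(-z))
    return -(X * (y * (1 - s))[:, None]).mean(axis=0) + reg * w

def subsampled_hessian(w, m):
    idx = rng.choice(n, size=m, replace=False)
    z = y[idx] * (X[idx] @ w)
    s = 1.0 / (1.0 + np.exp(-z))
    return (X[idx].T * (s * (1 - s))) @ X[idx] / m + reg * np.eye(p)

w = np.zeros(p)
m = 200                                       # initial Hessian sample size
for k in range(12):
    g = full_grad(w)
    H = subsampled_hessian(w, m)
    w = w - np.linalg.solve(H + 1e-3 * np.eye(p), g)   # Levenberg-type damping
    print(f"iter {k:2d}: ||grad|| = {np.linalg.norm(g):.2e}  (Hessian sample size {m})")
    m = min(2 * m, n)                         # aggressive, geometric sample-size growth
```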