53 research outputs found

    Optimization Algorithms for Machine Learning Designed for Parallel and Distributed Environments

    This thesis proposes several optimization methods that use parallel algorithms for large-scale machine learning problems. The overall theme is network-based machine learning algorithms; in particular, we consider two machine learning models: graphical models and neural networks. Graphical models are unsupervised machine learning methods that aim to recover conditional dependencies among random variables from observed samples of a multivariate distribution. Neural networks, on the other hand, learn an implicit approximation to an underlying nonlinear function from sample data and use that information to generalize to validation data. Training either kind of model requires solving an optimization problem. For graphical models, we improve on current methods for solving this optimization problem through parallelization and through a new update and a new step-size selection rule in coordinate descent algorithms designed for large-scale problems. For training deep neural networks, we consider second-order optimization algorithms within trust-region-like optimization frameworks. Deep networks are represented by large-scale vectors of weights and are trained on very large datasets, so obtaining second-order information for these networks is very expensive. In this thesis, we undertake an extensive exploration of algorithms that use a small number of curvature evaluations and are hence faster than other existing methods.

    Homogeneous Second-Order Descent Framework: A Fast Alternative to Newton-Type Methods

    This paper proposes a homogeneous second-order descent framework (HSODF) for nonconvex and convex optimization based on the generalized homogeneous model (GHM). In comparison to Newton steps, the GHM can be solved by extremal symmetric eigenvalue procedures and thus grants an advantage in ill-conditioned problems. Moreover, GHM extends the ordinary homogeneous model (OHM) to allow adaptiveness in the construction of the aggregated matrix. Consequently, HSODF is able to recover some well-known second-order methods, such as trust-region methods and gradient regularized methods, while maintaining comparable iteration complexity bounds. We also study two specific realizations of HSODF. One is adaptive HSODM, which has a parameter-free $O(\epsilon^{-3/2})$ global complexity bound for nonconvex second-order Lipschitz continuous objective functions. The other is homotopy HSODM, which is proven to have a global linear rate of convergence without strong convexity. The efficiency of our approach on ill-conditioned and high-dimensional problems is justified by some preliminary numerical results.

    Regularized methods via cubic subspace minimization for nonconvex optimization

    The main computational cost per iteration of adaptive cubic regularization methods for solving large-scale nonconvex problems is the computation of the step $s_k$, which requires an approximate minimizer of the cubic model. We propose a new approach in which this minimizer is sought in a low-dimensional subspace that, in contrast to classical approaches, is reused for a number of iterations. A regularized Newton step to correct $s_k$ is also incorporated whenever needed. We show that our method increases efficiency while preserving the worst-case complexity of classical cubic regularized methods. We also explore the use of rational Krylov subspaces for the subspace minimization, to overcome some of the issues encountered when using polynomial Krylov subspaces. We provide several experimental results illustrating the gains of the new approach when compared to classic implementations.
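To make the idea of minimizing the cubic model in a subspace concrete, here is a minimal sketch for the simplest possible case, the one-dimensional subspace spanned by the negative gradient. It is not the method of the paper, which works with larger, reused (rational) Krylov subspaces. It assumes the standard cubic model $m(s) = f + g^T s + \tfrac{1}{2} s^T H s + \tfrac{\sigma}{3}\|s\|^3$, and the function names `cubic_model` and `cubic_step_1d` are hypothetical.

```python
import math

def cubic_model(g, H, sigma, s):
    """Return m(s) - f = g^T s + 0.5 s^T H s + (sigma / 3) ||s||^3."""
    n = len(s)
    Hs = [sum(H[i][j] * s[j] for j in range(n)) for i in range(n)]
    gs = sum(gi * si for gi, si in zip(g, s))
    sHs = sum(si * hi for si, hi in zip(s, Hs))
    norm_s = math.sqrt(sum(si * si for si in s))
    return gs + 0.5 * sHs + sigma / 3.0 * norm_s ** 3

def cubic_step_1d(g, H, sigma):
    """Minimize the cubic model over the ray s = -t * g, t >= 0 (sigma > 0)."""
    n = len(g)
    gnorm = math.sqrt(sum(gi * gi for gi in g))
    if gnorm == 0.0:
        return [0.0] * n  # already first-order stationary
    Hg = [sum(H[i][j] * g[j] for j in range(n)) for i in range(n)]
    gHg = sum(gi * hi for gi, hi in zip(g, Hg))
    # Along the ray: phi(t) = -t*gnorm^2 + 0.5*t^2*gHg + (sigma/3)*t^3*gnorm^3.
    # phi'(t) = a*t^2 + gHg*t - gnorm^2 = 0 with a = sigma*gnorm^3 > 0;
    # its positive root is the unique minimizer on t >= 0.
    a = sigma * gnorm ** 3
    t = (-gHg + math.sqrt(gHg ** 2 + 4.0 * a * gnorm ** 2)) / (2.0 * a)
    return [-t * gi for gi in g]

# Usage with an indefinite Hessian (illustrative numbers only).
g = [1.0, -2.0]
H = [[2.0, 0.0], [0.0, -1.0]]
s = cubic_step_1d(g, H, sigma=1.0)
print(s, cubic_model(g, H, 1.0, s))
```

Because the cubic term dominates for large steps, the one-dimensional problem always has a finite minimizer, even when the Hessian is indefinite as in the usage example.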

    Algorithms for Trust-Region Subproblems with Linear Inequality Constraints

    In the trust-region framework for optimizing a general nonlinear function subject to nonlinear inequality constraints, sequential quadratic programming (SQP) techniques generate subproblems in which a quadratic function must be minimized over a spherical region subject to linear inequality constraints. An interior-point algorithm proposed by Kearsley approximately solves these subproblems when the objective functions are large-scale and convex. Kearsley's algorithm handles the inequality constraints with a classical log-barrier function, minimizing quadratic models of the log-barrier function for fixed values of the barrier parameter subject to the trust-region constraint. Kearsley recommends the LSTRS algorithm of Rojas et al. for minimizing these models. For the convex case, we prove convergence of Kearsley's algorithm and suggest alternatives to the LSTRS algorithm. These alternatives include the new annulus algorithm of Griffin et al., which blends the conjugate gradient and sequential subspace minimization methods to yield promising numerical results. For the nonconvex case, we propose and test a new interior-point algorithm that incorporates the annulus algorithm into an SQP framework with trust regions.

    Solving regularized nonlinear least-squares problem in dual space with application to variational data assimilation

    This thesis investigates the conjugate-gradient method and the Lanczos method for the solution of under-determined nonlinear least-squares problems regularized by a quadratic penalty term. Such problems often result from a maximum likelihood approach, and involve a set of m physical observations and n unknowns that are estimated by nonlinear regression. We suppose here that n is large compared to m. These problems are encountered, for instance, when three-dimensional fields are estimated from physical observations, as is the case in data assimilation in Earth system models. A widely used algorithm in this context is the Gauss-Newton (GN) method, known in the data assimilation community as incremental four-dimensional variational data assimilation. The GN method relies on the approximate solution of a sequence of linear least-squares problems in which the nonlinear least-squares cost function is approximated by a quadratic function in the neighbourhood of the current nonlinear iterate. However, it is well known that this simple variant of the Gauss-Newton algorithm does not ensure a monotonic decrease of the cost function, so its convergence is not guaranteed. This difficulty is typically removed by using a line-search (Dennis and Schnabel, 1983) or trust-region (Conn, Gould and Toint, 2000) strategy, which ensures global convergence to first-order critical points under mild assumptions. We consider the second of these approaches in this thesis. Moreover, taking into consideration the large-scale nature of the problem, we propose here to use a particular trust-region algorithm relying on the Steihaug-Toint truncated conjugate-gradient method for the approximate solution of the subproblem (Conn, Gould and Toint, 2000, pp. 133-139). Solving this subproblem in the n-dimensional space (by CG or Lanczos) is referred to as the primal approach. Alternatively, a significant reduction in the computational cost is possible by rewriting the quadratic approximation in the m-dimensional space associated with the observations. This is important for large-scale applications such as those solved daily in weather prediction systems. This approach, which performs the minimization in the m-dimensional space using CG or variants thereof, is referred to as the dual approach. The first proposed dual approach (Da Silva et al., 1995; Cohn et al., 1998; Courtier, 1997), known as the Physical-space Statistical Analysis System (PSAS) in the data assimilation community, starts by minimizing the corresponding dual cost function in m-dimensional space by a standard preconditioned CG (PCG), and then recovers the step in n-dimensional space through multiplication by an n by m matrix. Technically, the algorithm consists of recurrence formulas involving m-vectors instead of n-vectors. However, the use of PSAS can be unduly costly, as it was noticed that the linear least-squares cost function does not monotonically decrease along the nonlinear iterations when applying standard termination. Another dual approach has been proposed by Gratton and Tshimanga (2009) and is known as the Restricted Preconditioned Conjugate Gradient (RPCG) method. It generates the same iterates in exact arithmetic as those generated by the primal approach, again using recurrence formulas involving m-vectors. The main interest of RPCG is that it results in a significant reduction of both memory and computational costs while maintaining the desired convergence property, in contrast with the PSAS algorithm. The relation between these two dual approaches and the question of deriving efficient preconditioners (Gratton, Sartenaer and Tshimanga, 2011), essential when large-scale problems are considered, were not addressed in Gratton and Tshimanga (2009). The main motivation for this thesis is to address these open issues. In particular, we are interested in designing preconditioning techniques and a trust-region globalization that maintain the one-to-one correspondence between primal and dual iterates, thereby offering a cost-effective computation in a globally convergent algorithm.
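The abstract above relies on the Steihaug-Toint truncated conjugate-gradient method to solve the trust-region subproblem approximately. The following is a minimal sketch of that textbook method in the primal (n-dimensional) setting, without preconditioning and for a small dense matrix. It does not implement the dual PSAS or RPCG variants, and the function names and tolerances are assumptions chosen for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def to_boundary(s, p, radius):
    """Return tau >= 0 such that ||s + tau * p|| = radius (needs ||s|| <= radius)."""
    a, b, c = dot(p, p), 2.0 * dot(s, p), dot(s, s) - radius ** 2
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

def steihaug_toint_cg(H, g, radius, tol=1e-10, max_iter=100):
    """Approximately minimize g^T s + 0.5 s^T H s subject to ||s|| <= radius."""
    s = [0.0] * len(g)
    r = list(g)             # model gradient at s = 0
    p = [-ri for ri in r]   # first direction: steepest descent
    if math.sqrt(dot(r, r)) <= tol:
        return s
    for _ in range(max_iter):
        Hp = matvec(H, p)
        curvature = dot(p, Hp)
        if curvature <= 0.0:
            # Non-positive curvature: follow p to the trust-region boundary.
            tau = to_boundary(s, p, radius)
            return [si + tau * pi for si, pi in zip(s, p)]
        alpha = dot(r, r) / curvature
        s_next = [si + alpha * pi for si, pi in zip(s, p)]
        if math.sqrt(dot(s_next, s_next)) >= radius:
            # The CG step would leave the region: truncate at the boundary.
            tau = to_boundary(s, p, radius)
            return [si + tau * pi for si, pi in zip(s, p)]
        r_next = [ri + alpha * hi for ri, hi in zip(r, Hp)]
        if math.sqrt(dot(r_next, r_next)) <= tol:
            return s_next
        beta = dot(r_next, r_next) / dot(r, r)
        p = [-rn + beta * pi for rn, pi in zip(r_next, p)]
        s, r = s_next, r_next
    return s

# Usage: the same model with a large and a small trust-region radius.
H = [[4.0, 1.0], [1.0, 3.0]]
g = [-1.0, -2.0]
print(steihaug_toint_cg(H, g, radius=10.0))  # interior solution of H s = -g
print(steihaug_toint_cg(H, g, radius=0.3))   # step truncated at the boundary
```

The two early exits are what distinguish the method from plain CG: the iteration stops on the trust-region boundary as soon as it meets non-positive curvature or a step that would leave the region.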

    Numerical Methods for Mixed-Integer Optimal Control with Combinatorial Constraints

    This thesis is concerned with numerical methods for Mixed-Integer Optimal Control Problems with Combinatorial Constraints. We establish an approximation theorem relating a Mixed-Integer Optimal Control Problem with Combinatorial Constraints to a continuous relaxed convexified Optimal Control Problem with Vanishing Constraints, which provides the basis for numerical computations. We develop a Vanishing-Constraint-respecting rounding algorithm to exploit this correspondence computationally. Direct discretization of the Optimal Control Problem with Vanishing Constraints yields a subclass of Mathematical Programs with Equilibrium Constraints. Mathematical Programs with Equilibrium Constraints constitute a class of challenging problems due to their inherent non-convexity and non-smoothness. We develop an active-set algorithm for Mathematical Programs with Equilibrium Constraints and prove global convergence of this algorithm to Bouligand stationary points under suitable technical conditions. For efficient computation of Newton-type steps of Optimal Control Problems, we establish the Generalized Lanczos Method for trust-region problems in a Hilbert space context. To ensure real-time feasibility in Online Optimal Control Applications with tracking-type Lagrangian objective, we develop a Gauß-Newton preconditioner for the iterative solution method of the trust-region problem. We implement the proposed methods and demonstrate their applicability and efficacy on several benchmark problems.

    Updating the regularization parameter in the adaptive cubic regularization algorithm

    The adaptive cubic regularization method (Cartis et al. in Math. Program. Ser. A 127(2):245–295, 2011; Math. Program. Ser. A. 130(2):295–319, 2011) has recently been proposed for solving unconstrained minimization problems. At each iteration of this method, the objective function is replaced by a cubic approximation that includes an adaptive regularization parameter whose role is related to the local Lipschitz constant of the objective’s Hessian. We present new updating strategies for this parameter based on interpolation techniques, which improve the overall numerical performance of the algorithm. Numerical experiments on large nonlinear least-squares problems are provided.
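For context, the sketch below shows the kind of simple threshold rule that such updating strategies refine: the regularization parameter is adjusted according to the ratio of the actual decrease to the decrease predicted by the cubic model. It is a baseline illustration only, not the interpolation-based strategies of the paper. The thresholds `eta1`, `eta2`, the factors `gamma1`, `gamma2`, the floor `sigma_min`, and both function names are assumptions.

```python
def decrease_ratio(f_old, f_new, model_new):
    """rho = (actual decrease) / (decrease predicted by the cubic model)."""
    predicted = f_old - model_new
    if predicted <= 0.0:
        raise ValueError("the model must predict a strict decrease")
    return (f_old - f_new) / predicted

def update_sigma(sigma, rho, eta1=0.1, eta2=0.9,
                 gamma1=0.5, gamma2=2.0, sigma_min=1e-8):
    """Threshold-based update of the cubic regularization parameter."""
    if rho >= eta2:
        # Very successful step: the model was trustworthy, regularize less.
        return max(sigma_min, gamma1 * sigma)
    if rho >= eta1:
        # Successful step: keep the current regularization.
        return sigma
    # Unsuccessful step: the model overestimated the decrease, regularize more.
    return gamma2 * sigma

# Usage with illustrative function and model values.
rho = decrease_ratio(f_old=10.0, f_new=9.0, model_new=8.0)
print(rho, update_sigma(1.0, rho))
```

A larger parameter shortens the next step, much as a smaller radius does in a trust-region method, which is why the update mirrors the familiar trust-region radius rules.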