85 research outputs found

    Cubic Regularization is the Key! The First Accelerated Quasi-Newton Method with a Global Convergence Rate of $O(k^{-2})$ for Convex Functions

    In this paper, we propose the first Quasi-Newton method with a global convergence rate of $O(k^{-1})$ for general convex functions. Quasi-Newton methods, such as BFGS and SR-1, are well known for their impressive practical performance. However, they may be slower than gradient descent for general convex functions, with the best theoretical rate of $O(k^{-1/3})$. This gap between impressive practical performance and poor theoretical guarantees was an open question for a long period of time. In this paper, we make a significant step to close this gap. We improve upon the existing rate and propose the Cubic Regularized Quasi-Newton Method with a convergence rate of $O(k^{-1})$. The key to achieving this improvement is to use the Cubic Regularized Newton Method, rather than the Damped Newton Method, as the outer method, where the Quasi-Newton update is an inexact Hessian approximation. Using this approach, we propose the first Accelerated Quasi-Newton method with a global convergence rate of $O(k^{-2})$ for general convex functions. In special cases where we can improve the precision of the approximation, we achieve a global convergence rate of $O(k^{-3})$, which is faster than any first-order method. To make these methods practical, we introduce the Adaptive Inexact Cubic Regularized Newton Method and its accelerated version, which provide real-time control of the approximation error. We show that the proposed methods have impressive practical performance and outperform both first- and second-order methods.
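For orientation, the step of a cubic regularized Newton method with an inexact Hessian can be written as below. This is the standard textbook form, not necessarily the exact model used in the paper: $B_k$ stands for the Quasi-Newton approximation of $\nabla^2 f(x_k)$ and $M > 0$ for a regularization constant, and both symbols are assumptions introduced here for illustration. The paper's model may contain further terms that account for the approximation error.

```latex
x_{k+1} = \arg\min_{y} \Big\{ f(x_k) + \langle \nabla f(x_k),\, y - x_k \rangle
          + \tfrac{1}{2} \langle B_k (y - x_k),\, y - x_k \rangle
          + \tfrac{M}{6} \, \| y - x_k \|^{3} \Big\}
```

Compared with a damped Newton step, the cubic term bounds the model from below even when $B_k$ is a poor approximation, which is the usual intuition for why this outer method admits global rates.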

    Manual Optical Attitude Re-initialization of a Crew Vehicle in Space Using Bias Corrected Gyro Data

    NASA and other space agencies have shown interest in sending humans on missions beyond low Earth orbit. Proposed is an algorithm that estimates the attitude of a manned spacecraft using measured line-of-sight (LOS) vectors to stars and gyroscope measurements. The Manual Optical Attitude Reinitialization (MOAR) algorithm and corresponding device draw inspiration from existing technology from the Gemini, Apollo and Space Shuttle programs. The improvement over these devices is the capability of estimating gyro bias completely independently of re-initializing attitude. It may be applied to the lost-in-space problem, where the spacecraft's attitude is unknown. In this work, a model was constructed that simulated gyro data using the Farrenkopf gyro model, and LOS measurements from a spotting scope were then computed from it. Using these simulated measurements, gyro bias was estimated by comparing measured interior star angles to those derived from a star catalog and then minimizing the difference using an optimization technique. Several optimization techniques were analyzed, and it was determined that the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm performed the best when combined with a grid search technique. Once estimated, the gyro bias was removed and attitude was determined by solving the Wahba Problem via the Singular Value Decomposition (SVD) approach. Several Monte Carlo simulations were performed that looked at different operating conditions for the MOAR algorithm. These included the effects of bias instability, using different constellations for data collection, sampling star measurements in different orders, and varying the time between measurements. A common method of estimating gyro bias and attitude in a Multiplicative Extended Kalman Filter (MEKF) was also explored and disproven for use in the MOAR algorithm. A prototype was also constructed to validate the proposed concepts. It was built using a simple spotting scope, a MEMS-grade IMU, and a Raspberry Pi computer. Mounted on a tripod, it was used to target stars with the scope and to measure the rotation between them using the IMU. The raw measurements were then post-processed using the MOAR algorithm, and attitude estimates were determined. Two different constellations (the Big Dipper and Orion) were used for experimental data collection. The results suggest that the novel method of estimating gyro bias independently from attitude in this document is credible for use onboard a spacecraft.
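The attitude step mentioned above, solving the Wahba Problem via SVD, can be sketched as follows. This is a generic textbook version, not the thesis's implementation: the function name `solve_wahba_svd`, the uniform default weights, and the noise-free test data are all assumptions made for illustration.

```python
import numpy as np

def solve_wahba_svd(body_vecs, ref_vecs, weights=None):
    """Return the rotation R that minimizes sum_i w_i * ||b_i - R r_i||^2.

    body_vecs: (N, 3) unit vectors measured in the body frame (b_i).
    ref_vecs:  (N, 3) matching unit vectors in the reference frame (r_i),
               e.g. taken from a star catalog.
    """
    b = np.asarray(body_vecs, dtype=float)
    r = np.asarray(ref_vecs, dtype=float)
    w = np.ones(len(b)) if weights is None else np.asarray(weights, dtype=float)

    # Attitude profile matrix B = sum_i w_i * b_i * r_i^T (3x3).
    B = (w[:, None] * b).T @ r

    U, _, Vt = np.linalg.svd(B)

    # Flip the last singular direction if needed so that det(R) = +1,
    # i.e. R is a proper rotation and not a reflection.
    d = np.linalg.det(U) * np.linalg.det(Vt)
    return U @ np.diag([1.0, 1.0, d]) @ Vt

# Usage: recover a known rotation about the z-axis from noise-free vectors.
angle = 0.3
R_true = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle),  np.cos(angle), 0.0],
    [0.0,            0.0,           1.0],
])
refs = np.eye(3)          # three orthogonal reference directions
body = refs @ R_true.T    # row i is R_true @ refs[i]
R_est = solve_wahba_svd(body, refs)
```

With at least two non-collinear, noise-free vector pairs the recovered rotation matches the true one; with noisy measurements it is the least-squares optimum.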

    Gradient-based quantum optimal control on superconducting qubit systems

    Quantum technologies are expected to help solve many of today's global challenges, revolutionizing several fields such as computing, sensing and secure communications. In this regard, the need for precise manipulation of the dynamics of a quantum system and its optimization has given rise to the field of quantum control theory. In the search for optimal controls, accurate derivatives are a possible method to traverse and ultimately converge in quantum optimization landscapes. In this work we study an efficient algorithm for computing analytically exact derivatives by formulating the control problem in the basis that diagonalizes the control Hamiltonian and applying a specific Trotterized time propagation scheme. The method is numerically verified for a system of superconducting transmon qubits in the few- and many-body regime using matrix product states. A comparison with results obtained from exact dynamics via Krylov subspace methods shows how the approximate dynamics ultimately sets a trade-off between computational complexity and quality of the final solutions.

    Global optimization: techniques and applications

    Optimization problems arise in a wide variety of scientific disciplines. In many practical problems, a global optimum is desired, yet the objective function has multiple local optima. A number of techniques aimed at solving the global optimization problem have emerged in the last 30 years of research. This thesis first reviews techniques for local optimization and then discusses many of the stochastic and deterministic methods for global optimization that are in use today. Finally, this thesis shows how to apply global optimization techniques to two practical problems: the image segmentation problem (from imaging science) and the 3-D registration problem (from computer vision).

    ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning

    We introduce ADAHESSIAN, a second-order stochastic optimization algorithm which dynamically incorporates the curvature of the loss function via ADAptive estimates of the HESSIAN. Second-order algorithms are among the most powerful optimization algorithms, with superior convergence properties as compared to first-order methods such as SGD and Adam. The main disadvantage of traditional second-order methods is their heavier per-iteration computation and poor accuracy as compared to first-order methods. To address these, we incorporate several novel approaches in ADAHESSIAN, including: (i) a fast Hutchinson-based method to approximate the curvature matrix with low computational overhead; (ii) a root-mean-square exponential moving average to smooth out variations of the Hessian diagonal across different iterations; and (iii) a block diagonal averaging to reduce the variance of Hessian diagonal elements. We show that ADAHESSIAN achieves new state-of-the-art results by a large margin as compared to other adaptive optimization methods, including variants of Adam. In particular, we perform extensive tests on CV, NLP, and recommendation system tasks and find that ADAHESSIAN: (i) achieves 1.80%/1.45% higher accuracy on ResNets20/32 on Cifar10, and 5.55% higher accuracy on ImageNet as compared to Adam; (ii) outperforms AdamW for transformers by 0.13/0.33 BLEU score on IWSLT14/WMT14 and 2.7/1.0 PPL on PTB/Wikitext-103; (iii) outperforms AdamW for SqueezeBert by 0.41 points on GLUE; and (iv) achieves 0.032% better score than Adagrad for DLRM on the Criteo Ad Kaggle dataset. Importantly, we show that the cost per iteration of ADAHESSIAN is comparable to first-order methods, and that it exhibits robustness towards its hyperparameters.
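Item (i) above relies on Hutchinson's estimator, which recovers the Hessian diagonal from Hessian-vector products alone: for a random vector $z$ with independent $\pm 1$ entries, the expectation of $z \odot (Hz)$ equals $\mathrm{diag}(H)$. The sketch below shows only this plain estimator on an explicit matrix. The function name `hutchinson_diag` and the callable `hvp` are hypothetical, and the moving average and block averaging of items (ii) and (iii) are deliberately left out.

```python
import numpy as np

def hutchinson_diag(hvp, dim, num_samples=100, rng=None):
    """Estimate diag(H) using only Hessian-vector products.

    hvp: callable mapping a vector v of length `dim` to H @ v.
    """
    rng = np.random.default_rng() if rng is None else rng
    estimate = np.zeros(dim)
    for _ in range(num_samples):
        # Rademacher probe: each entry is +1 or -1 with equal probability.
        z = rng.choice([-1.0, 1.0], size=dim)
        # E[z * (H z)] = diag(H), because E[z_i z_j] = 0 for i != j.
        estimate += z * hvp(z)
    return estimate / num_samples

# Usage on a small symmetric matrix standing in for a Hessian.
H = np.array([
    [2.0,  1.0,  0.0],
    [1.0,  3.0, -1.0],
    [0.0, -1.0,  4.0],
])
diag_est = hutchinson_diag(lambda v: H @ v, dim=3, num_samples=5000,
                           rng=np.random.default_rng(0))
```

In a deep learning setting `hvp` would typically be computed by automatic differentiation instead of an explicit matrix product, so the full Hessian is never formed.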

    Adaptive sampling trust-region methods for derivative-based and derivative-free simulation optimization problems

    We consider unconstrained optimization problems where only “stochastic” estimates of the objective function are observable as replicates from a Monte Carlo simulation oracle. In the first study we assume that the function gradients are directly observable through the Monte Carlo simulation. We propose ASTRO, which is an adaptive sampling based trust-region optimization method where a stochastic local model is constructed, optimized, and updated iteratively. ASTRO is a derivative-based algorithm and provides almost sure convergence to a first-order critical point with good practical performance. In the second study the Monte Carlo simulation is assumed to provide no direct observations of the function gradient. We present ASTRO-DF, which is a class of derivative-free trust-region algorithms, where the stochastic local model is obtained through interpolation. Function estimation (as well as gradient estimation) and model construction within ASTRO and ASTRO-DF are adaptive in the sense that the extent of Monte Carlo sampling is determined by continuously monitoring and balancing metrics of sampling and structural errors within ASTRO and ASTRO-DF. Such error balancing is designed to ensure that the Monte Carlo effort within ASTRO and ASTRO-DF is sensitive to algorithm trajectory, sampling more whenever an iterate is inferred to be close to a critical point and less when far away. We demonstrate the almost-sure convergence of ASTRO-DF's iterates to a first-order critical point when using quadratic stochastic interpolation models. The question of using more complicated models, e.g., regression or stochastic kriging, in combination with adaptive sampling is worth further investigation and will benefit from the methods of proof we present. We investigate the implementation of ASTRO and ASTRO-DF along with the heuristics that enhance the implementation of ASTRO-DF, and report their finite-time performance on a series of low-to-moderate dimensional problems in the CUTEr framework. We speculate that the iterates of both ASTRO and ASTRO-DF achieve the canonical Monte Carlo convergence rate, although a proof remains elusive.

    Embedded and validated control algorithms for the spacecraft rendezvous

    Autonomy is one of the major concerns during the planning of a space mission, whether its objective is scientific (interplanetary exploration, observations, etc.) or commercial (in-orbit servicing). For space rendezvous, this autonomy depends on the on-board capacity to control the relative motion between two spacecraft. In the context of satellite servicing (troubleshooting, propellant refueling, orbit correction, end-of-life deorbit, etc.), the feasibility of such missions is also strongly linked to the ability of the guidance and control algorithms to account for all operational constraints (for example, thruster saturation or restrictions on the relative positioning between the vehicles) while maximizing the life of the vehicle (minimizing propellant consumption). The literature shows that this problem has been intensively studied since the early 2000s. However, the proposed algorithms are not entirely satisfactory. Some approaches, for example, relax the constraints so that the control algorithm can be based on an efficiently solvable optimization problem. Other methods that account for the whole set of constraints are too computationally heavy to be embedded on the computers actually available on board spacecraft. The main objective of this thesis is the development of new efficient and validated algorithms for the impulsive guidance and control of spacecraft in the context of the so-called "hovering" phases of orbital rendezvous, i.e. the stages in which a secondary vehicle must maintain its position within a bounded region of space relative to a main vehicle. The first contribution presented in this manuscript uses a new mathematical formulation of the space constraints on the relative motion between spacecraft to design control algorithms that are computationally more efficient than traditional approaches. The second and main contribution is a predictive control strategy that has been formally demonstrated to ensure the convergence of relative trajectories towards the "hovering" zone, even in the presence of disturbances or actuator saturation. Specific software developments have demonstrated the embeddability of these control algorithms on a board containing an FPGA-synthesized LEON3 microprocessor certified for space flight, reproducing the performance of the devices usually used in flight. Finally, tools for rigorous approximation of functions were used to obtain validated solutions of the equations describing the linearized relative motion, allowing a simple certified propagation of the relative trajectories via polynomials and verification that the constraints of the problem are satisfied.

    Nonlinear Preconditioning Methods for Optimization and Parallel-In-Time Methods for 1D Scalar Hyperbolic Partial Differential Equations

    This thesis consists of two main parts, part one addressing problems from nonlinear optimization and part two based on solving systems of time dependent differential equations, with both parts describing strategies for accelerating the convergence of iterative methods. In part one we present a nonlinear preconditioning framework for use with nonlinear solvers applied to nonlinear optimization problems, motivated by a generalization of linear left preconditioning and linear preconditioning via a change of variables for minimizing quadratic objective functions. In the optimization context nonlinear preconditioning is used to generate a preconditioner direction that either replaces or supplements the gradient vector throughout the optimization algorithm. This framework is used to discuss previously developed nonlinearly preconditioned nonlinear GMRES and nonlinear conjugate gradients (NCG) algorithms, as well as to develop two new nonlinearly preconditioned quasi-Newton methods based on the limited memory Broyden and limited memory BFGS (L-BFGS) updates. We show how all of the above methods can be implemented in a manifold optimization context, with a particular emphasis on Grassmann matrix manifolds. These methods are compared by solving the optimization problems defining the canonical polyadic (CP) decomposition and Tucker higher order singular value decomposition (HOSVD) for tensors, which are formulated as minimizing approximation error in the Frobenius norm. Both of these decompositions have alternating least squares (ALS) type fixed point iterations derived from their optimization problem definitions. While these ALS type iterations may be slow to converge in practice, they can serve as efficient nonlinear preconditioners for the other optimization methods. 
As the Tucker HOSVD problem involves orthonormality constraints and lacks unique minimizers, the optimization algorithms are extended from Euclidean space to the manifold setting, where optimization on Grassmann manifolds can resolve both of the issues present in the HOSVD problem. The nonlinearly preconditioned methods are compared to the ALS type preconditioners and non-preconditioned NCG, L-BFGS, and a trust region algorithm using both synthetic and real life tensor data with varying noise level, the real data arising from applications in computer vision and handwritten digit recognition. Numerical results show that the nonlinearly preconditioned methods offer substantial improvements in terms of time-to-solution and robustness over state-of-the-art methods for large tensors, in cases where there are significant amounts of noise in the data, and when high accuracy results are required. In part two we apply a multigrid reduction-in-time (MGRIT) algorithm to scalar one-dimensional hyperbolic partial differential equations. This study is motivated by the observation that sequential time-stepping is an obvious computational bottleneck when attempting to implement highly concurrent algorithms, thus parallel-in-time methods are particularly desirable. Existing parallel-in-time methods have produced significant speedups for parabolic or sufficiently diffusive problems, but can have stability and convergence issues for hyperbolic or advection dominated problems. Being a multigrid method, MGRIT primarily uses temporal coarsening, but spatial coarsening can also be incorporated to produce cheaper multigrid cycles and to ensure stability conditions are satisfied on all levels for explicit time-stepping methods. We compare convergence results for the linear advection and diffusion equations, which illustrate the increased difficulty associated with solving hyperbolic problems via parallel-in-time methods. 
A particular issue that we address is the fact that uniform factor-two spatial coarsening may negatively affect the convergence rate for MGRIT, resulting in extremely slow convergence when the wave speed is near zero, even if only locally. This is due to a sort of anisotropy in the nodal connections, with small wave speeds resulting in spatial connections being weaker than temporal connections. Through the use of semi-algebraic mode analysis applied to the combined advection-diffusion equation we illustrate how the norm of the iteration matrix, and hence an upper bound on the rate of convergence, varies for different choices of wave speed, diffusivity coefficient, space-time grid spacing, and the inclusion or exclusion of spatial coarsening. The use of waveform relaxation multigrid on intermediate, temporally semi-coarsened grids is identified as a potential remedy for the issues introduced by spatial coarsening, with the downside of creating a more intrusive algorithm that cannot be easily combined with existing time-stepping routines for different problems. As a second, less intrusive, alternative we present an adaptive spatial coarsening strategy that prevents the slowdown observed for small local wave speeds, which is applicable for solving the variable coefficient linear advection equation and the inviscid Burgers' equation using first-order explicit or implicit time-stepping methods. Serial numerical results show this method offers significant improvements over uniform coarsening and is convergent for the inviscid Burgers' equation with and without shocks. Parallel scaling tests indicate that improvements over serial time-stepping strategies are possible when spatial parallelism alone saturates, and that scalability is robust for oscillatory solutions that change on the scale of the grid spacing.

    Detecting a $Z_2$ topologically ordered phase from unbiased infinite projected entangled-pair state simulations

    We present an approach to identify topological order based on unbiased infinite projected entangled-pair states (iPEPS) simulations, i.e. where we do not impose a virtual symmetry on the tensors during the optimization of the tensor network ansatz. As an example we consider the ground state of the toric code model in a magnetic field exhibiting $Z_2$ topological order. The optimization is done by an efficient energy minimization approach based on a summation of tensor environments to compute the gradient. We show that the optimized tensors, when brought into the right gauge, are approximately $Z_2$ symmetric, and they can be fully symmetrized a posteriori to generate a stable topologically ordered state, yielding the correct topological entanglement entropy and modular S and U matrices. To compute the latter we develop a variant of the corner-transfer matrix method which is computationally more efficient than previous approaches based on the tensor renormalization group. Comment: 16 pages, 14 figures