85 research outputs found
Cubic Regularization is the Key! The First Accelerated Quasi-Newton Method with a Global Convergence Rate of for Convex Functions
In this paper, we propose the first Quasi-Newton method with a global
convergence rate of for general convex functions. Quasi-Newton
methods, such as BFGS, SR-1, are well-known for their impressive practical
performance. However, they may be slower than gradient descent for general
convex functions, with the best theoretical rate of . This gap
between impressive practical performance and poor theoretical guarantees was an
open question for a long period of time. In this paper, we make a significant
step to close this gap. We improve upon the existing rate and propose the Cubic
Regularized Quasi-Newton Method with a convergence rate of . The key
to achieving this improvement is to use the Cubic Regularized Newton Method
over the Damped Newton Method as an outer method, where the Quasi-Newton update
is an inexact Hessian approximation. Using this approach, we propose the first
Accelerated Quasi-Newton method with a global convergence rate of
for general convex functions. In special cases where we can improve the
precision of the approximation, we achieve a global convergence rate of
, which is faster than any first-order method. To make these methods
practical, we introduce the Adaptive Inexact Cubic Regularized Newton Method
and its accelerated version, which provide real-time control of the
approximation error. We show that the proposed methods have impressive
practical performance and outperform both first and second-order methods
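As an illustration of the cubic-regularized step this abstract refers to, here is a minimal sketch, not the authors' implementation. It assumes a symmetric positive semidefinite Hessian approximation `B` (for example one produced by a quasi-Newton update) and a hypothetical regularization constant `M`, and minimizes the standard cubic model g^T h + (1/2) h^T B h + (M/6)||h||^3 by bisection on r = ||h||. The function name and parameters are illustrative; the adaptive control of the approximation error described in the abstract is not reproduced.

```python
import numpy as np

def cubic_newton_step(g, B, M, iters=60):
    """Minimize g.h + 0.5*h.B.h + (M/6)*||h||^3 for symmetric PSD B and M > 0."""
    g = np.asarray(g, dtype=float)
    B = np.asarray(B, dtype=float)
    g_norm = np.linalg.norm(g)
    if g_norm == 0.0:
        return np.zeros_like(g)
    eye = np.eye(len(g))
    # The minimizer satisfies (B + (M/2) r I) h = -g with r = ||h||.
    # For PSD B, ||h(r)|| <= 2||g|| / (M r), so r <= sqrt(2||g|| / M).
    lo, hi = 0.0, np.sqrt(2.0 * g_norm / M)
    for _ in range(iters):
        r = 0.5 * (lo + hi)
        h = np.linalg.solve(B + 0.5 * M * r * eye, -g)
        if np.linalg.norm(h) > r:
            lo = r  # step is longer than r: the root lies above r
        else:
            hi = r  # step is shorter than r: the root lies below r
    return np.linalg.solve(B + 0.5 * M * hi * eye, -g)
```

An outer loop would update `x = x + cubic_newton_step(grad(x), B, M)` and refresh `B`; how `B` and `M` are chosen is what distinguishes the methods in the paper, and is left out here.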
Manual Optical Attitude Re-initialization of a Crew Vehicle in Space Using Bias Corrected Gyro Data
NASA and other space agencies have shown interest in sending humans on missions beyond low Earth orbit. Proposed is an algorithm that estimates the attitude of a manned spacecraft using measured line-of-sight (LOS) vectors to stars and gyroscope measurements. The Manual Optical Attitude Reinitialization (MOAR) algorithm and corresponding device draw inspiration from existing technology from the Gemini, Apollo and Space Shuttle programs. The improvement over these devices is the capability of estimating gyro bias completely independently of re-initializing attitude. It may be applied to the lost-in-space problem, where the spacecraft's attitude is unknown.
In this work, a model was constructed that simulated gyro data using the Farrenkopf gyro model, and LOS measurements from a spotting scope were then computed from it. Using these simulated measurements, gyro bias was estimated by comparing measured interior star angles to those derived from a star catalog and then minimizing the difference using an optimization technique. Several optimization techniques were analyzed, and it was determined that the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm performed the best when combined with a grid search technique. Once estimated, the gyro bias was removed and attitude was determined by solving the Wahba Problem via the Singular Value Decomposition (SVD) approach. Several Monte Carlo simulations were performed that looked at different operating conditions for the MOAR algorithm. These included the effects of bias instability, using different constellations for data collection, sampling star measurements in different orders, and varying the time between measurements. A common method of estimating gyro bias and attitude in a Multiplicative Extended Kalman Filter (MEKF) was also explored and disproven for use in the MOAR algorithm.
A prototype was also constructed to validate the proposed concepts. It was built using a simple spotting scope, MEMS grade IMU, and a Raspberry Pi computer. It was mounted on a tripod, used to target stars with the scope and measure the rotation between them using the IMU. The raw measurements were then post-processed using the MOAR algorithm, and attitude estimates were determined. Two different constellations, the Big Dipper and Orion, were used for experimental data collection. The results suggest that the novel method of estimating gyro bias independently from attitude in this document is credible for use onboard a spacecraft.
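The attitude step mentioned in this abstract, solving the Wahba problem via SVD, can be sketched as follows. This is a generic textbook formulation, not the thesis code. It assumes unit line-of-sight vectors measured in the body frame (`body_vecs`) and the matching catalog vectors in the reference frame (`ref_vecs`); the function name and the optional `weights` argument are illustrative.

```python
import numpy as np

def wahba_svd(body_vecs, ref_vecs, weights=None):
    """Return the rotation A minimizing sum_i w_i * ||b_i - A r_i||^2."""
    b = np.asarray(body_vecs, dtype=float)  # shape (N, 3), body-frame vectors
    r = np.asarray(ref_vecs, dtype=float)   # shape (N, 3), reference-frame vectors
    w = np.ones(len(b)) if weights is None else np.asarray(weights, dtype=float)
    # Attitude profile matrix B = sum_i w_i * b_i * r_i^T
    B = (w[:, None] * b).T @ r
    U, _, Vt = np.linalg.svd(B)
    # Force det(A) = +1 so the result is a proper rotation, not a reflection.
    d = np.linalg.det(U) * np.linalg.det(Vt)
    return U @ np.diag([1.0, 1.0, d]) @ Vt
```

At least two non-collinear star directions are needed for the solution to be unique; with noisy measurements the result is the least-squares optimal rotation for the chosen weights.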
Gradient-based quantum optimal control on superconducting qubit systems
Quantum technologies are expected to help solve many of today's global challenges, revolutionizing several fields such as computing, sensing and secure communications. In this regard, the need for precise manipulation of the dynamics of a quantum system and its optimization has given rise to the field of quantum control theory. In the search for optimal controls, accurate derivatives are a possible method to traverse and ultimately converge in quantum optimization landscapes.
In this work we study an efficient algorithm for computing analytically-exact derivatives by formulating the control problem in the basis that diagonalizes the control Hamiltonian and applying a specific Trotterized time propagation scheme. The method is numerically verified for a system of superconducting transmon qubits in the few- and many body regime using matrix product states. The comparison between the results obtained using an exact dynamics via Krylov subspace methods shows how the approximate dynamics ultimately sets a trade-off between computational complexity and quality of the final solutions
Global optimization: techniques and applications
Optimization problems arise in a wide variety of scientific disciplines. In many practical problems, a global optimum is desired, yet the objective function has multiple local optima. A number of techniques aimed at solving the global optimization problem have emerged in the last 30 years of research. This thesis first reviews techniques for local optimization and then discusses many of the stochastic and deterministic methods for global optimization that are in use today. Finally, this thesis shows how to apply global optimization techniques to two practical problems: the image segmentation problem (from imaging science) and the 3-D registration problem (from computer vision)
ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning
We introduce ADAHESSIAN, a second order stochastic optimization algorithm
which dynamically incorporates the curvature of the loss function via ADAptive
estimates of the HESSIAN. Second order algorithms are among the most powerful
optimization algorithms with superior convergence properties as compared to
first order methods such as SGD and Adam. The main disadvantage of traditional
second order methods is their heavier per iteration computation and poor
accuracy as compared to first order methods. To address these, we incorporate
several novel approaches in ADAHESSIAN, including: (i) a fast Hutchinson based
method to approximate the curvature matrix with low computational overhead;
(ii) a root-mean-square exponential moving average to smooth out variations of
the Hessian diagonal across different iterations; and (iii) a block diagonal
averaging to reduce the variance of Hessian diagonal elements. We show that
ADAHESSIAN achieves new state-of-the-art results by a large margin as compared
to other adaptive optimization methods, including variants of Adam. In
particular, we perform extensive tests on CV, NLP, and recommendation system
tasks and find that ADAHESSIAN: (i) achieves 1.80%/1.45% higher accuracy on
ResNets20/32 on Cifar10, and 5.55% higher accuracy on ImageNet as compared to
Adam; (ii) outperforms AdamW for transformers by 0.13/0.33 BLEU score on
IWSLT14/WMT14 and 2.7/1.0 PPL on PTB/Wikitext-103; (iii) outperforms AdamW for
SqueezeBert by 0.41 points on GLUE; and (iv) achieves 0.032% better score than
Adagrad for DLRM on the Criteo Ad Kaggle dataset. Importantly, we show that the
cost per iteration of ADAHESSIAN is comparable to first order methods, and that
it exhibits robustness towards its hyperparameters
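Item (i) of this abstract relies on a Hutchinson-type estimate of the Hessian diagonal. A minimal sketch of that estimator is given below, assuming access to a Hessian-vector product callable `hvp` (a hypothetical name; in a deep-learning framework it would come from automatic differentiation). It uses the identity diag(H) = E[z * (H z)] for Rademacher vectors z. The moving average and block averaging of items (ii) and (iii) are not shown, and this is not the ADAHESSIAN reference code.

```python
import numpy as np

def hutchinson_diag(hvp, dim, num_samples=100, seed=0):
    """Estimate diag(H) from Hessian-vector products only."""
    rng = np.random.default_rng(seed)
    est = np.zeros(dim)
    for _ in range(num_samples):
        z = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe vector
        est += z * hvp(z)                      # elementwise z * (H z)
    return est / num_samples
```

For a small explicit matrix `H`, the call would be `hutchinson_diag(lambda v: H @ v, H.shape[0])`. The estimate is exact for a diagonal `H` and has variance driven by the off-diagonal entries otherwise, which is the motivation for the smoothing steps the abstract lists.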
Adaptive sampling trust-region methods for derivative-based and derivative-free simulation optimization problems
We consider unconstrained optimization problems where only “stochastic” estimates of the objective function are observable as replicates from a Monte Carlo simulation oracle. In the first study we assume that the function gradients are directly observable through the Monte Carlo simulation. We propose ASTRO, which is an adaptive sampling based trust-region optimization method where a stochastic local model is constructed, optimized, and updated iteratively. ASTRO is a derivative-based algorithm and provides almost sure convergence to a first-order critical point with good practical performance. In the second study the Monte Carlo simulation is assumed to provide no direct observations of the function gradient. We present ASTRO-DF, which is a class of derivative-free trust-region algorithms, where the stochastic local model is obtained through interpolation. Function estimation (as well as gradient estimation) and model construction within ASTRO and ASTRO-DF are adaptive in the sense that the extent of Monte Carlo sampling is determined by continuously monitoring and balancing metrics of sampling and structural errors within ASTRO and ASTRO-DF. Such error balancing is designed to ensure that the Monte Carlo effort within ASTRO and ASTRO-DF is sensitive to algorithm trajectory, sampling more whenever an iterate is inferred to be close to a critical point and less when far away. We demonstrate the almost-sure convergence of ASTRO-DF's iterates to a first-order critical point when using quadratic stochastic interpolation models. The question of using more complicated models, e.g., regression or stochastic kriging, in combination with adaptive sampling is worth further investigation and will benefit from the methods of proof we present.
We investigate the implementation of ASTRO and ASTRO-DF along with the heuristics that enhance the implementation of ASTRO-DF, and report their finite-time performance on a series of low-to-moderate dimensional problems in the CUTEr framework. We speculate that the iterates of both ASTRO and ASTRO-DF achieve the canonical Monte Carlo convergence rate, although a proof remains elusive
Embedded and validated control algorithms for the spacecraft rendezvous
Autonomy is one of the major concerns during the planning of a space mission, whether its objective is scientific (interplanetary exploration, observations, etc.) or commercial (in-orbit servicing). For space rendezvous, this autonomy depends on the on-board capacity to control the relative motion between two spacecraft. In the context of satellite servicing (troubleshooting, propellant refueling, orbit correction, end-of-life deorbit, etc.), the feasibility of such missions is also strongly linked to the ability of the guidance and control algorithms to account for all operational constraints (for example, thruster saturation or restrictions on the relative positioning between the vehicles) while maximizing the life of the vehicle (minimizing propellant consumption). The literature shows that this problem has been intensively studied since the early 2000s. However, the proposed algorithms are not entirely satisfactory. Some approaches, for example, degrade the constraints in order to base the control algorithm on an efficient optimization problem. Other methods, which account for the whole set of constraints of the problem, are too computationally heavy to be embedded on the computers actually available on board spacecraft.
The main objective of this thesis is the development of new efficient and validated algorithms for the impulsive guidance and control of spacecraft in the context of the so-called "hovering" phases of the orbital rendezvous, i.e. the stages in which a secondary vessel must maintain its position within a bounded area of space relative to another main vessel. The first contribution presented in this manuscript uses a new mathematical formulation of the space constraints for the relative motion between spacecraft for the design of control algorithms with more efficient computational processing compared to traditional approaches. The second and main contribution is a predictive control strategy that has been formally demonstrated to ensure the convergence of relative trajectories towards the "hovering" zone, even in the presence of disturbances or saturation of the actuators. Specific computational developments have demonstrated the embeddability of these control algorithms on a board containing an FPGA-synthesized LEON3 microprocessor certified for space flight, reproducing the performance of the devices usually used in flight. Finally, tools for rigorous approximation of functions were used to obtain validated solutions of the equations describing the linearized relative motion, allowing a simple certified propagation of the relative trajectories via polynomials and the verification that the constraints of the problem are satisfied.
Nonlinear Preconditioning Methods for Optimization and Parallel-In-Time Methods for 1D Scalar Hyperbolic Partial Differential Equations
This thesis consists of two main parts, part one addressing problems from nonlinear optimization and part two based on solving systems of time dependent differential equations, with both parts describing strategies for accelerating the convergence of iterative methods.
In part one we present a nonlinear preconditioning framework for use with nonlinear solvers applied to nonlinear optimization problems, motivated by a generalization of linear left preconditioning and linear preconditioning via a change of variables for minimizing quadratic objective functions. In the optimization context nonlinear preconditioning is used to generate a preconditioner direction that either replaces or supplements the gradient vector throughout the optimization algorithm. This framework is used to discuss previously developed nonlinearly preconditioned nonlinear GMRES and nonlinear conjugate gradients (NCG) algorithms, as well as to develop two new nonlinearly preconditioned quasi-Newton methods based on the limited memory Broyden and limited memory BFGS (L-BFGS) updates. We show how all of the above methods can be implemented in a manifold optimization context, with a particular emphasis on Grassmann matrix manifolds.
These methods are compared by solving the optimization problems defining the canonical polyadic (CP) decomposition and Tucker higher order singular value decomposition (HOSVD) for tensors, which are formulated as minimizing approximation error in the Frobenius norm. Both of these decompositions have alternating least squares (ALS) type fixed point iterations derived from their optimization problem definitions. While these ALS type iterations may be slow to converge in practice, they can serve as efficient nonlinear preconditioners for the other optimization methods. As the Tucker HOSVD problem involves orthonormality constraints and lacks unique minimizers, the optimization algorithms are extended from Euclidean space to the manifold setting, where optimization on Grassmann manifolds can resolve both of the issues present in the HOSVD problem.
The nonlinearly preconditioned methods are compared to the ALS type preconditioners and non-preconditioned NCG, L-BFGS, and a trust region algorithm using both synthetic and real life tensor data with varying noise level, the real data arising from applications in computer vision and handwritten digit recognition. Numerical results show that the nonlinearly preconditioned methods offer substantial improvements in terms of time-to-solution and robustness over state-of-the-art methods for large tensors, in cases where there are significant amounts of noise in the data, and when high accuracy results are required.
In part two we apply a multigrid reduction-in-time (MGRIT) algorithm to scalar one-dimensional hyperbolic partial differential equations. This study is motivated by the observation that sequential time-stepping is an obvious computational bottleneck when attempting to implement highly concurrent algorithms, thus parallel-in-time methods are particularly desirable. Existing parallel-in-time methods have produced significant speedups for parabolic or sufficiently diffusive problems, but can have stability and convergence issues for hyperbolic or advection dominated problems. Being a multigrid method, MGRIT primarily uses temporal coarsening, but spatial coarsening can also be incorporated to produce cheaper multigrid cycles and to ensure stability conditions are satisfied on all levels for explicit time-stepping methods.
We compare convergence results for the linear advection and diffusion equations, which illustrate the increased difficulty associated with solving hyperbolic problems via parallel-in-time methods. A particular issue that we address is the fact that uniform factor-two spatial coarsening may negatively affect the convergence rate for MGRIT, resulting in extremely slow convergence when the wave speed is near zero, even if only locally. This is due to a sort of anisotropy in the nodal connections, with small wave speeds resulting in spatial connections being weaker than temporal connections. Through the use of semi-algebraic mode analysis applied to the combined advection-diffusion equation we illustrate how the norm of the iteration matrix, and hence an upper bound on the rate of convergence, varies for different choices of wave speed, diffusivity coefficient, space-time grid spacing, and the inclusion or exclusion of spatial coarsening.
The use of waveform relaxation multigrid on intermediate, temporally semi-coarsened grids is identified as a potential remedy for the issues introduced by spatial coarsening, with the downside of creating a more intrusive algorithm that cannot be easily combined with existing time-stepping routines for different problems. As a second, less intrusive, alternative we present an adaptive spatial coarsening strategy that prevents the slowdown observed for small local wave speeds, which is applicable for solving the variable coefficient linear advection equation and the inviscid Burgers equation using first-order explicit or implicit time-stepping methods. Serial numerical results show this method offers significant improvements over uniform coarsening and is convergent for inviscid Burgers' equation with and without shocks. Parallel scaling tests indicate that improvements over serial time-stepping strategies are possible when spatial parallelism alone saturates, and that scalability is robust for oscillatory solutions that change on the scale of the grid spacing
Detecting a topologically ordered phase from unbiased infinite projected entangled-pair state simulations
We present an approach to identify topological order based on unbiased
infinite projected entangled-pair states (iPEPS) simulations, i.e. where we do
not impose a virtual symmetry on the tensors during the optimization of the
tensor network ansatz. As an example we consider the ground state of the toric
code model in a magnetic field exhibiting topological order. The
optimization is done by an efficient energy minimization approach based on a
summation of tensor environments to compute the gradient. We show that the
optimized tensors, when brought into the right gauge, are approximately
symmetric, and they can be fully symmetrized a posteriori to generate a stable
topologically ordered state, yielding the correct topological entanglement
entropy and modular S and U matrices. To compute the latter we develop a
variant of the corner-transfer matrix method which is computationally more
efficient than previous approaches based on the tensor renormalization group.