
    Beyond Dynamic Programming

    In this paper, we present Score-life programming, a novel theoretical approach for solving reinforcement learning problems. In contrast with classical dynamic programming-based methods, our method can search over non-stationary policy functions and can directly compute optimal infinite horizon action sequences from a given state. The central idea in our method is the construction of a mapping between infinite horizon action sequences and real numbers in a bounded interval. This construction enables us to formulate an optimization problem for directly computing optimal infinite horizon action sequences, without requiring a policy function. We demonstrate the effectiveness of our approach by applying it to nonlinear optimal control problems. Overall, our contributions provide a novel theoretical framework for formulating and solving reinforcement learning problems. Comment: 17 pages. Colab notebook: https://colab.research.google.com/drive/1GKIMieKrYLX_YXnUOFuEvHwk8CH26zVu?usp=sharing; GitHub repo/code: https://github.com/Abhinav-Muraleedharan/Beyond_Dynamic_Programming.gi
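    The central construction named in the abstract, identifying an infinite-horizon action sequence with a real number in a bounded interval, can be sketched in a few lines. The following minimal sketch assumes a binary action space and a user-supplied transition/reward function `step`; the rollout truncation and the commented-out grid search are illustrative stand-ins, not the paper's actual optimization procedure.

```python
def real_to_sequence(x, horizon, base=2):
    """Decode x in [0, 1) into its first `horizon` base-`base` digits,
    interpreted here as a truncated infinite-horizon action sequence."""
    actions = []
    for _ in range(horizon):
        x *= base
        digit = int(x)
        actions.append(digit)
        x -= digit
    return actions

def sequence_to_real(actions, base=2):
    """Inverse map: encode an action sequence as a real number in [0, 1)."""
    return sum(a * base ** -(t + 1) for t, a in enumerate(actions))

def score(x, step, s0, gamma=0.95, horizon=60):
    """Discounted return of the action sequence encoded by x, rolled out
    from state s0 with a (hypothetical) transition/reward function `step`
    that returns (next_state, reward)."""
    s, total = s0, 0.0
    for t, a in enumerate(real_to_sequence(x, horizon)):
        s, r = step(s, a)
        total += gamma ** t * r
    return total

# Optimizing over action sequences then becomes a one-dimensional search,
# e.g. a crude grid over [0, 1) (a stand-in for the paper's method):
# best_x = max((i / 4096 for i in range(4096)), key=lambda x: score(x, step, s0))
```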

    A computational method for solving time-delay optimal control problems with free terminal time

    This paper considers a class of optimal control problems for general nonlinear time-delay systems with free terminal time. We first show that for this class of problems, the well-known time-scaling transformation for mapping the free time horizon into a fixed time interval yields a new time-delay system in which the time delays are variable. Then, we introduce a control parameterization scheme to approximate the control variables in the new system by piecewise-constant functions. This yields an approximate finite-dimensional optimization problem with three types of decision variables: the control heights, the control switching times, and the terminal time in the original system (which influences the variable time delays in the new system). We develop a gradient-based optimization approach for solving this approximate problem. Simulation results are also provided to demonstrate the effectiveness of the proposed approach
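    The two ingredients named in the abstract, the time-scaling transformation and piecewise-constant control parameterization, can be illustrated with a small sketch. The dynamics function `f`, the delay `tau`, and the history handling below are assumptions for illustration; a real implementation would pair this with a delay-differential-equation solver.

```python
import numpy as np

def piecewise_constant(heights, switch_times):
    """Control parameterization: u(s) equals heights[i] on the i-th
    subinterval of [0, 1] delimited by increasing switch_times in (0, 1).
    len(heights) must be len(switch_times) + 1."""
    heights, switch_times = np.asarray(heights), np.asarray(switch_times)
    def u(s):
        return heights[np.searchsorted(switch_times, s, side='right')]
    return u

def scaled_delay_rhs(f, T, tau):
    """Time-scaling transformation: with s = t / T the free horizon [0, T]
    becomes the fixed interval [0, 1] and the dynamics pick up a factor T:
        dx/ds = T * f(x(s), x(s - tau / T), u(s)).
    The constant delay tau in original time becomes the terminal-time-
    dependent delay tau / T in scaled time, as the abstract describes."""
    def rhs(s, x_of, u):
        # For s < tau / T the true formulation evaluates the given history
        # function; clamping to 0 here is a simplification.
        return T * f(x_of(s), x_of(max(s - tau / T, 0.0)), u(s))
    return rhs

# Example parameterization with two switching times:
# u = piecewise_constant(heights=[0.0, 1.5, -0.5], switch_times=[0.3, 0.7])
```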

    On the Karush–Kuhn–Tucker Reformulation of the Bi-Level Geometric Programming Problem with Interval Coefficients as Multiple Parameters

    This paper presents a new approach for solving a special class of bi-level nonlinear programming (NLP) problems with interval coefficients as multiple parameters. Geometric programming (GP) is a powerful technique for solving NLP problems and is useful in the study of a variety of optimization problems; GP has been applied in many fields of science and engineering to solve complex decision-making problems. A new mathematical formulation is presented for a class of nonlinear optimization models called bi-level geometric programming (BLGP) problems. These problems are not necessarily convex and are thus not solvable by standard nonlinear programming techniques. The proposed method solves BLGP problems in which the coefficients of the objective function and of the constraints are multiple parameters, taken from an interval as the arithmetic mean (A.M.), geometric mean (G.M.), and harmonic mean (H.M.) of the interval's endpoints. The objective function values obtained for the A.M., G.M., and H.M. parameter choices preserve the same ordering relationship. Moreover, the BLGP problem can be converted to a single-level problem using the classical Karush–Kuhn–Tucker (KKT) reformulation, and the ability to calculate bounds on the objective value within the KKT reformulation is presented, which may help researchers construct more realistic optimization models. Finally, a numerical example is given to illustrate the efficiency of the method
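    The three representative parameter values drawn from an interval are standard means of its endpoints; a small worked example (function name hypothetical) makes the ordering concrete.

```python
import math

def interval_parameter_values(lo, hi):
    """Arithmetic, geometric, and harmonic means of the endpoints of an
    interval [lo, hi] with 0 < lo <= hi."""
    am = (lo + hi) / 2.0
    gm = math.sqrt(lo * hi)
    hm = 2.0 * lo * hi / (lo + hi)
    return am, gm, hm

# For the interval [2, 8]: A.M. = 5.0, G.M. = 4.0, H.M. = 3.2, and the
# classical ordering H.M. <= G.M. <= A.M. always holds -- the same ordering
# the paper reports being preserved by the corresponding objective values.
print(interval_parameter_values(2, 8))   # (5.0, 4.0, 3.2)
```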

    Special Bilevel Quadratic Problems for Construction of Worst-Case Feedback Control in Linear-Quadratic Optimal Control Problems under Uncertainties

    Almost all mathematical models describing processes in industry, engineering, or the natural sciences contain uncertainties arising from different sources, and these uncertainties must be taken into account when solving optimal control problems for such processes. There are two popular approaches: on the one hand, closed-loop feedback controls, where the nominal optimal control is updated as soon as the actual state and parameter estimates of the process become available; on the other hand, robust optimization, for example worst-case optimization, which searches for a solution that is good for all possible realizations of the uncertain parameters. For optimal control problems of dynamic processes with unknown but bounded uncertainties, we are interested in a combination of feedback control and robust optimization. Computing such a closed-loop worst-case feedback optimal control is difficult because of its high dimensional complexity, and it is often too expensive or too slow for complex optimal control problems, especially those that must be solved in real time. A further difficulty is that the process trajectory corresponding to the worst-case optimal control might be infeasible. We therefore suggest solving the problems successively: dividing the time interval at intermediate time points, computing feedback controls on the smaller intervals, and allowing the controls to be corrected at these fixed intermediate time points. With this approach we can guarantee that, for all admissible uncertainties, the terminal state lies in a prescribed neighborhood of a given state at the given final time, and that the value of the cost function does not exceed a given estimate. In this thesis we introduce special bilevel programming problems whose solutions can be used to construct the feedback controls; these bilevel problems can be solved explicitly. Based on these bilevel problems, we present efficient methods and approximations for different control policies combining feedback control and robust optimization that can be implemented online, compare these approaches, and demonstrate their application to linear-quadratic control problems
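    The successive-correction structure described in the abstract can be reduced to a short control-loop skeleton. All four argument names below are hypothetical placeholders; in particular, `solve_interval_problem` stands in for the explicit bilevel subproblem solver developed in the thesis.

```python
def successive_worst_case_control(x0, time_points, solve_interval_problem, run_process):
    """Skeleton of the successive scheme: split the horizon at fixed
    intermediate time points, compute a worst-case feedback control on each
    subinterval, apply it to the real (uncertain) process, and correct at
    the next time point from the measured state."""
    x = x0
    for t0, t1 in zip(time_points[:-1], time_points[1:]):
        u = solve_interval_problem(x, t0, t1)  # bilevel worst-case subproblem
        x = run_process(u, x, t0, t1)          # actual process under disturbance
    return x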

    Limited Memory Steepest Descent Methods for Nonlinear Optimization

    This dissertation concerns the development of limited memory steepest descent (LMSD) methods for solving unconstrained nonlinear optimization problems. In particular, we focus on the class of LMSD methods recently proposed by Fletcher, which he has shown to be competitive with well-known quasi-Newton methods such as L-BFGS. However, much work remains to be done in the design of such methods. First, Fletcher showed a convergence result for LMSD methods when minimizing strongly convex quadratics, but no convergence rate result. In addition, his method focused mainly on minimizing strongly convex quadratics and general convex objectives; for nonconvex objectives, open questions remain about how to deal effectively with nonpositive curvature. Furthermore, Fletcher's method relies on access to exact gradients, which is a limitation when computing exact gradients is too expensive. The focus of this dissertation is the design and analysis of algorithms intended to address these issues.

    In the first part of the new results in this dissertation, a convergence rate result for an LMSD method is proved. For context, we note that a basic LMSD method is an extension of the Barzilai-Borwein "two-point stepsize" strategy for steepest descent methods for solving unconstrained optimization problems. It is known that the Barzilai-Borwein strategy yields a method with an R-linear rate of convergence when employed to minimize a strongly convex quadratic. Our contribution is to extend this analysis to LMSD, also for strongly convex quadratics. In particular, it is shown that, under reasonable assumptions, the method is R-linearly convergent for any choice of the history length parameter. Results of numerical experiments are also provided to illustrate behaviors of the method revealed by the theoretical analysis.

    The second part proposes an LMSD method for solving unconstrained nonconvex optimization problems. As a steepest descent method, the step computation in each iteration requires only the evaluation of a gradient of the objective function and the calculation of a scalar stepsize. When employed to solve certain convex problems, our method reduces to a variant of the LMSD method proposed by Fletcher; when the history length parameter is set to one, it reduces to a steepest descent method inspired by that of Barzilai and Borwein. However, our method is novel in that we propose new algorithmic features for cases in which nonpositive curvature is encountered, making it particularly suited for solving nonconvex problems. With a nonmonotone line search, we ensure global convergence for a variant of our method. We also illustrate with numerical experiments that our approach often yields superior performance when employed to solve nonconvex problems.

    In the third part, we propose a limited memory stochastic gradient (LMSG) method for solving optimization problems arising in machine learning, focusing initially on problems that are strongly convex. When the dataset is so large that the computation of full gradients is too expensive, our method computes stepsizes and iterates based on (mini-batch) stochastic gradients. Although a best-tuned fixed stepsize or a diminishing stepsize is most widely used in stochastic gradient (SG) methods, these choices can be inefficient in practice. Our method adopts a cubic model and always guarantees a positive, meaningful stepsize, even when nonpositive curvature is encountered (which can happen when using stochastic gradients, even when the problem is convex). Our approach is based on the LMSD method with cubic regularization proposed in the second part of this dissertation. With a projection of stepsizes, we ensure convergence to a neighborhood of the optimal solution when the projection interval is fixed, and convergence to the optimal solution when the interval is diminishing. We also illustrate with numerical experiments that our approach can outperform an SG method with a fixed stepsize
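    Since the history-length-one case of a basic LMSD method is the Barzilai-Borwein "two-point stepsize" iteration mentioned above, a compact sketch of that special case may help fix ideas. This is the textbook BB1 iteration, not the dissertation's algorithm; the fallback when the curvature estimate s^T y is nonpositive is a naive placeholder for the nonpositive-curvature handling the dissertation develops.

```python
import numpy as np

def bb_gradient_descent(grad, x0, alpha0=1e-3, max_iter=500, tol=1e-8):
    """Steepest descent with the Barzilai-Borwein 'two-point' stepsize,
    the history-length-one special case of an LMSD method."""
    x = x0.copy()
    g = grad(x)
    alpha = alpha0
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        x_new = x - alpha * g
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        # BB1 stepsize: alpha = s^T s / s^T y; positive for strongly convex
        # problems, but s^T y <= 0 can occur otherwise (the nonconvex case).
        sy = s @ y
        alpha = (s @ s) / sy if sy > 0 else alpha0
        x, g = x_new, g_new
    return x

# Example: minimize the strongly convex quadratic 0.5 * sum(i * x_i^2):
# x_star = bb_gradient_descent(lambda x: np.arange(1, 6) * x, np.ones(5))
```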

    A Distributed Linear Quadratic Discrete-Time Game Approach to Formation Control with Collision Avoidance

    Formation control problems can be expressed as linear quadratic discrete-time games (LQDTG) for which Nash equilibrium solutions are sought. However, solving such problems requires solving coupled Riccati equations, which cannot be done in a distributed manner. A recent study showed that a distributed implementation is possible for a consensus problem when fictitious agents are associated with the edges of the network graph rather than with the nodes. This paper proposes an extension of that approach to formation control with collision avoidance, where collisions are precluded by including appropriate penalty terms on the edges. Because the collision avoidance term in the cost function leads to a state-dependent weight matrix, a state-dependent Riccati equation must be solved. Its solution provides relative control inputs associated with the edges of the network graph, which then need to be mapped to the physical control inputs applied at the nodes; this can be done in a distributed manner by iterating a gradient descent search between neighbors within each sampling interval. Unlike the inter-sample iteration frequently used in distributed MPC, each iteration step here requires only a matrix-vector multiplication rather than the solution of an optimization problem. The approach can be implemented in a receding horizon manner, as demonstrated through a numerical example
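    The edge-to-node mapping step can be illustrated with a small sketch. Assuming scalar relative inputs v on the edges and physical inputs u on the nodes of a graph with node-edge incidence matrix B (notation assumed here, not taken from the paper), a least-squares fit of u to v by gradient descent needs only one matrix-vector product per iteration; and since B @ B.T is the graph Laplacian, each node's update involves only its neighbors, which is what permits a distributed implementation.

```python
import numpy as np

def edge_to_node_inputs(B, v, steps=50, eta=0.2, u0=None):
    """Map relative (edge) inputs v to physical (node) inputs u by gradient
    descent on 0.5 * ||B.T @ u - v||^2, where B is the node-edge incidence
    matrix. Each iteration is one matrix-vector multiplication; the step
    size eta must satisfy eta < 2 / lambda_max(B @ B.T) for convergence."""
    u = np.zeros(B.shape[0]) if u0 is None else u0.copy()
    for _ in range(steps):
        # Laplacian-weighted update: node i only needs its neighbors' values
        u -= eta * (B @ (B.T @ u - v))
    return u

# Path graph 1-2-3 (two edges), with desired relative inputs v on the edges:
# B = np.array([[1.0, 0.0], [-1.0, 1.0], [0.0, -1.0]])
# u = edge_to_node_inputs(B, np.array([0.5, -0.2]))
```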