Lyapunov design of a simple step-size adaptation strategy based on success
A simple success-based step-size adaptation rule for single-parent Evolution Strategies is formulated, and the setting of the corresponding parameters is considered. Theoretical convergence on the class of strictly unimodal functions of one variable that are symmetric around the optimum is investigated using a stochastic Lyapunov function method developed by Semenov and Terkel [5] in the context of martingale theory. General expressions for the conditional expectations of the next values of the step size and the distance to the optimum under (1 +, λ)-selection are derived analytically, and an appropriate Lyapunov function is constructed. Upper bounds on the convergence rate, as well as adaptation parameter values, are obtained through numerical optimization for increasing values of λ. By selecting the number of offspring that minimizes the bound on the convergence rate with respect to the number of function evaluations, all strategy parameter values result from the analysis.
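The kind of rule described above can be sketched in Python as a generic (1+λ)-ES with a multiplicative success-based step-size update, run on f(x) = |x|, which is strictly unimodal and symmetric around its optimum at 0. This is a minimal illustration under assumed settings: the function name `one_plus_lambda_es` and the values λ = 4, α = 1.5 (expansion after a success) and β = 0.9 (contraction after a failure) are hypothetical, they are not the parameter values derived in the paper, and only plus-selection is shown.

```python
import random


def one_plus_lambda_es(f, x0, sigma0, lam=4, alpha=1.5, beta=0.9,
                       iterations=200, seed=0):
    """Minimise a one-dimensional f with a (1+lambda)-ES and a success-based step size.

    The multipliers alpha (> 1, applied after a successful generation) and
    beta (< 1, applied after a failed one) are illustrative guesses, not the
    values derived in the paper.
    """
    rng = random.Random(seed)
    x, sigma = x0, sigma0
    fx = f(x)
    for _ in range(iterations):
        # Sample lam offspring by Gaussian mutation of the single parent.
        offspring = [x + sigma * rng.gauss(0.0, 1.0) for _ in range(lam)]
        f_best, best = min((f(y), y) for y in offspring)
        if f_best < fx:
            # Success: plus-selection replaces the parent and the step grows.
            x, fx = best, f_best
            sigma *= alpha
        else:
            # Failure: the parent survives and the step shrinks.
            sigma *= beta
    return x, sigma


x_final, sigma_final = one_plus_lambda_es(abs, x0=10.0, sigma0=1.0)
print(abs(x_final), sigma_final)
```

With these assumed multipliers the step size tends to settle where roughly one generation in five is successful (since 0.2·ln 1.5 is close to 0.8·|ln 0.9|), which keeps σ roughly proportional to the distance to the optimum and gives steady convergence.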
Understanding Model-Based Reinforcement Learning and its Application in Safe Reinforcement Learning
Model-based reinforcement learning algorithms have been shown to achieve successful results on various continuous control benchmarks, but understanding of model-based methods is limited. We try to interpret how model-based methods work through novel experiments on state-of-the-art algorithms, with an emphasis on the model-learning part. We evaluate the role of model learning in policy optimization and propose methods to learn a more accurate model. With a better understanding of model-based reinforcement learning, we then apply model-based methods to solve safe reinforcement learning (RL) problems with near-zero violation of hard constraints throughout training. Drawing an analogy with how humans and animals learn to perform safe actions, we break the safe RL problem down into three stages. First, we train agents in a constraint-free environment to learn a performant policy that reaches high rewards, and simultaneously learn a model of the dynamics. Second, we use model-based methods to plan safe actions and train a safeguarding policy from these actions through imitation. Finally, we propose a factored framework to train an overall policy that mixes the performant policy and the safeguarding policy. This three-step curriculum ensures near-zero violation of safety constraints at all times. An advantage of the model-based approach is that the sample complexity required in the second and third steps is significantly lower than that of model-free methods, which can enable online safe learning. We demonstrate the effectiveness of our methods on various continuous control problems and analyze their advantages over state-of-the-art approaches.
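The first two stages of the curriculum (learning a dynamics model without constraints, then using it to plan safe actions) can be illustrated with a deliberately tiny example. Everything here is an assumption for illustration: a noise-free scalar linear plant x' = 0.9x + 0.5u, a hard constraint |x| ≤ 1, a reward-seeking policy that always pushes with u = 1, and a one-step safety filter over a discretised action grid. The helper names are hypothetical, and the imitation and policy-mixing stages are not shown.

```python
import random

TRUE_A, TRUE_B = 0.9, 0.5  # hypothetical plant x' = a*x + b*u, unknown to the agent
LIMIT = 1.0                # hypothetical hard constraint |x| <= LIMIT
CANDIDATES = [i / 10 for i in range(-10, 11)]  # discretised action set


def step(x, u):
    """True (hidden) dynamics used only to generate data and roll out."""
    return TRUE_A * x + TRUE_B * u


def fit_linear_model(transitions):
    """Least-squares estimate of (a, b) in x' = a*x + b*u from (x, u, x') tuples."""
    sxx = sum(x * x for x, _, _ in transitions)
    sxu = sum(x * u for x, u, _ in transitions)
    suu = sum(u * u for _, u, _ in transitions)
    sxy = sum(x * y for x, _, y in transitions)
    suy = sum(u * y for _, u, y in transitions)
    det = sxx * suu - sxu * sxu  # solve the 2x2 normal equations by Cramer's rule
    return (sxy * suu - sxu * suy) / det, (sxx * suy - sxu * sxy) / det


def safeguard(x, u_proposed, model):
    """Return the candidate action closest to u_proposed whose predicted next state is safe."""
    a, b = model
    safe = [u for u in CANDIDATES if abs(a * x + b * u) <= LIMIT]
    if not safe:  # no predicted-safe action: fall back to the least-violating one
        return min(CANDIDATES, key=lambda u: abs(a * x + b * u))
    return min(safe, key=lambda u: abs(u - u_proposed))


def rollout(model, use_safeguard, steps=50):
    """Run the reward-seeking policy (always u = 1), optionally filtered by the model."""
    x, states = 0.0, []
    for _ in range(steps):
        u = 1.0
        if use_safeguard:
            u = safeguard(x, u, model)
        x = step(x, u)
        states.append(x)
    return states


# Stage 1 (sketch): explore without constraints and learn the dynamics model.
rng = random.Random(0)
data = []
for _ in range(200):
    x0, u0 = rng.uniform(-2, 2), rng.uniform(-1, 1)
    data.append((x0, u0, step(x0, u0)))
model = fit_linear_model(data)

# Stage 2 (sketch): compare the unfiltered and model-filtered policies.
print(model, max(map(abs, rollout(model, True))), max(map(abs, rollout(model, False))))
```

Without the filter the state drifts toward 0.5 / (1 - 0.9) = 5 and violates the constraint; with it, the model's one-step predictions keep every visited state within the bound. A practical version would use a richer learned model and a multi-step planner.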
Event-triggered coordination for formation tracking control in constrained space with limited communication
In this paper, formation tracking control is studied for a multi-agent system (MAS) with communication limitations. The objective is to control a group of agents to track a desired trajectory while maintaining a given formation in a non-omniscient constrained space. Role switching, triggered by the detection of unexpected spatial constraints, makes event-triggered control more efficient in terms of communication bandwidth, energy consumption and processor usage. A coordination mechanism is proposed based on a novel ‘coordinator’ role that indirectly spreads environmental information across the whole communication network and forms a feedback link from the followers to the leader to guarantee that the formation is kept. A formation scaling factor is introduced to scale the given formation size up or down when a region is impassable for the MAS at the original formation size. Controllers for the leader and followers are designed, and an adaptation law is developed for the formation scaling factor. The conditions for asymptotic stability of the MAS are discussed based on Lyapunov theory. Simulation results are presented to illustrate the performance of the proposed approaches.
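Two ingredients of the abstract, the formation scaling factor and event-triggered communication, can be sketched with simple geometry. This is a hypothetical static illustration, not the paper's controllers or adaptation law. It makes three assumptions. Follower targets are taken as p_i* = p_L + s·d_i for fixed offsets d_i. The scaling factor s is chosen as the largest value in (0, 1] that fits the formation's lateral extent into a gap of known positive width. An agent broadcasts its state only when it has moved more than a threshold since its last broadcast. The names `scale_to_fit` and `EventTriggeredSender` and all numbers used are assumptions.

```python
import math


def desired_positions(leader, offsets, scale):
    """Follower targets p_i* = p_L + s * d_i for a planar formation."""
    lx, ly = leader
    return [(lx + scale * dx, ly + scale * dy) for dx, dy in offsets]


def scale_to_fit(offsets, gap_width):
    """Largest s <= 1 whose lateral (y) extent, leader included, fits a gap of positive width."""
    ys = [dy for _, dy in offsets] + [0.0]  # the leader sits at offset (0, 0)
    width = max(ys) - min(ys)
    return 1.0 if width == 0 else min(1.0, gap_width / width)


class EventTriggeredSender:
    """Broadcast a state only when it has drifted more than `threshold` since the last broadcast."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.last_sent = None
        self.messages = 0

    def update(self, state):
        if self.last_sent is None or math.dist(state, self.last_sent) > self.threshold:
            self.last_sent = state
            self.messages += 1
            return True
        return False


offsets = [(-1.0, 1.0), (-1.0, -1.0)]  # hypothetical V-shaped formation behind the leader
s = scale_to_fit(offsets, gap_width=1.0)
print(s, desired_positions((0.0, 0.0), offsets, s))

sender = EventTriggeredSender(threshold=0.35)
for i in range(20):
    sender.update((0.1 * i, 0.0))  # leader moving along the x-axis
print(sender.messages)
```

Here a gap 1.0 wide halves a formation that is 2.0 wide, and a leader moving 0.1 per step sends 5 messages over 20 steps rather than 20.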
A Survey on Delay-Aware Resource Control for Wireless Systems --- Large Deviation Theory, Stochastic Lyapunov Drift and Distributed Stochastic Learning
In this tutorial paper, a comprehensive survey is given of several major systematic approaches to delay-aware control problems, namely the equivalent rate constraint approach, the Lyapunov stability drift approach and the approximate Markov Decision Process (MDP) approach using stochastic learning. Together, these approaches cover most of the existing literature on delay-aware resource control in wireless systems. Each has its relative pros and cons in terms of performance, complexity and implementation issues. For each approach, the problem setup, the general solution and the design methodology are discussed. Applications of these approaches to delay-aware resource allocation are illustrated with examples in single-hop wireless networks. Furthermore, recent results on delay-aware multi-hop routing designs in general multi-hop networks are elaborated. Finally, the delay performance of the various approaches is compared through simulations, using uplink OFDMA systems as an example.
Comment: 58 pages, 8 figures; IEEE Transactions on Information Theory, 201
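The Lyapunov drift approach can be illustrated with the textbook max-weight scheduler, which may differ from the formulations treated in the survey. Take queue dynamics Q_i(t+1) = max(Q_i(t) - μ_i(t), 0) + A_i(t) and the quadratic Lyapunov function L(Q) = ½ Σ_i Q_i². When arrivals and service are bounded, the one-slot drift satisfies L(Q(t+1)) - L(Q(t)) ≤ B + Σ_i Q_i(t)(A_i(t) - μ_i(t)) for a constant B. So when only one user can be served per slot, serving the user with the largest Q_i·r_i minimises the bound. The two-user single-hop setup, the Bernoulli arrivals and the channel rates drawn uniformly from {0, 1, 2} are hypothetical.

```python
import random


def max_weight_schedule(queues, rates):
    """Index of the user maximising Q_i * r_i, the choice that minimises the drift bound."""
    return max(range(len(queues)), key=lambda i: queues[i] * rates[i])


def queue_update(q, served, arrivals):
    """Q(t+1) = max(Q(t) - served, 0) + arrivals."""
    return max(q - served, 0) + arrivals


def simulate(slots=10_000, arrival_prob=(0.4, 0.4), seed=1):
    """Average total backlog under max-weight scheduling with random channel rates."""
    rng = random.Random(seed)
    queues = [0] * len(arrival_prob)
    total = 0
    for _ in range(slots):
        rates = [rng.choice((0, 1, 2)) for _ in queues]  # hypothetical i.i.d. channel states
        chosen = max_weight_schedule(queues, rates)
        for i, p in enumerate(arrival_prob):
            served = rates[i] if i == chosen else 0
            arrivals = 1 if rng.random() < p else 0  # Bernoulli arrivals
            queues[i] = queue_update(queues[i], served, arrivals)
        total += sum(queues)
    return total / slots


print(max_weight_schedule([3, 5], [2, 1]))  # 3*2 = 6 beats 5*1 = 5, so user 0 is served
print(simulate())
```

Because the total arrival rate (0.8 packets per slot) lies inside the capacity region of this toy system, the scheduler keeps the average backlog small. The weighting by Q_i is what couples the scheduling decision to the delay objective.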