
    A survey on policy search algorithms for learning robot controllers in a handful of trials

    Most policy search algorithms require thousands of training episodes to find an effective policy, which is often infeasible with a physical robot. This survey focuses on the other extreme of the spectrum: how can a robot adapt with only a handful of trials (a dozen) and a few minutes? By analogy with the term "big data", we refer to this challenge as "micro-data reinforcement learning". We show that a first strategy is to leverage prior knowledge on the policy structure (e.g., dynamic movement primitives), on the policy parameters (e.g., demonstrations), or on the dynamics (e.g., simulators). A second strategy is to create data-driven surrogate models of the expected reward (e.g., Bayesian optimization) or of the dynamical model (e.g., model-based policy search), so that the policy optimizer queries the model instead of the real system. Overall, all successful micro-data algorithms combine these two strategies by varying the kind of model and prior knowledge. The current scientific challenges essentially revolve around scaling up to complex robots (e.g., humanoids), designing generic priors, and optimizing the computing time. Comment: 21 pages, 3 figures, 4 algorithms; accepted at IEEE Transactions on Robotics
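    As an illustration of the second strategy, the policy optimizer can query a cheap surrogate of the expected reward instead of the real robot, which is evaluated only once per trial. The sketch below is a minimal, hypothetical example: the one-dimensional policy parameter, the kernel-regression surrogate, and the UCB-style acquisition are illustrative assumptions, not the survey's algorithms (which use, e.g., Gaussian-process Bayesian optimization).

```python
import math
import random

def expensive_episode(theta):
    # Stand-in for a costly robot trial; the true optimum is theta = 0.3.
    return -(theta - 0.3) ** 2

def surrogate(theta, trials, bandwidth=0.2):
    # Kernel-regression mean plus a distance-based uncertainty bonus.
    weights = [math.exp(-((theta - t) / bandwidth) ** 2) for t, _ in trials]
    total = sum(weights)
    mean = sum(w * r for w, (_, r) in zip(weights, trials)) / total
    bonus = 1.0 / (1.0 + total)  # large far away from past trials
    return mean, bonus

def micro_data_search(n_trials=12, n_candidates=200, kappa=1.0):
    # The optimizer queries the cheap surrogate many times, but the
    # real system only a dozen times in total.
    random.seed(0)
    theta = random.uniform(0.0, 1.0)
    trials = [(theta, expensive_episode(theta))]
    for _ in range(n_trials - 1):
        candidates = [random.uniform(0.0, 1.0) for _ in range(n_candidates)]

        def ucb(c):
            mean, bonus = surrogate(c, trials)
            return mean + kappa * bonus  # explore where the bonus is high

        best_candidate = max(candidates, key=ucb)
        trials.append((best_candidate, expensive_episode(best_candidate)))
    return max(trials, key=lambda tr: tr[1])

best_theta, best_reward = micro_data_search()
```

With only twelve episodes, the search concentrates its few real evaluations near the surrogate's predicted optimum, which is the essence of the micro-data setting.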

    Performance Improvement of Low-Cost Iterative Learning-Based Fuzzy Control Systems for Tower Crane Systems

    This paper is dedicated to the memory of Prof. Ioan Dzitac, one of the fathers of this journal and its founding Editor-in-Chief until 2021. The paper addresses the performance improvement of three Single Input-Single Output (SISO) fuzzy control systems that separately control the positions of interest of tower crane systems, namely the cart position, the arm angular position and the payload position. Three separate low-cost SISO fuzzy controllers are employed, in the form of first-order discrete-time intelligent Proportional-Integral (PI) controllers with Takagi-Sugeno-Kang Proportional-Derivative (PD) fuzzy terms. Iterative Learning Control (ILC) structures with PD learning functions are embedded in the current-iteration SISO ILC structures. Optimization problems are defined to tune the parameters of the learning functions; the objective functions are the sums of squared control errors, and the problems are solved in the iteration domain using the recent Slime Mould Algorithm (SMA) metaheuristic. The experimental results prove the performance improvement of the SISO control systems after ten iterations of SMA.
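    The core of such an ILC scheme is the iteration-domain update, in which a PD learning function maps the previous iteration's tracking error to a corrected control signal. Below is a minimal sketch under assumed conditions: a toy first-order plant and hand-picked learning gains stand in for the crane dynamics and the SMA-tuned parameters, and the sum of squared control errors is logged per iteration as in the objective functions above.

```python
def simulate(u, a=0.3, b=1.0):
    # Toy first-order discrete-time plant: y[t+1] = a*y[t] + b*u[t], y[0] = 0.
    y, out = 0.0, []
    for ut in u:
        out.append(y)
        y = a * y + b * ut
    return out

def ilc_update(u, e, kp=0.7, kd=0.2):
    # PD-type learning function acting on the one-step-shifted error,
    # since u[t] only influences the output from t+1 onward.
    u_next = list(u)
    for t in range(len(u) - 1):
        u_next[t] = u[t] + kp * e[t + 1] + kd * (e[t + 1] - e[t])
    return u_next

reference = [0.0] + [1.0] * 49  # step reference (the plant starts at rest)
u = [0.0] * 50
costs = []  # sum of squared control errors, one value per ILC iteration
for _ in range(10):  # ten iterations, as in the experiments above
    e = [r - y for r, y in zip(reference, simulate(u))]
    costs.append(sum(err * err for err in e))
    u = ilc_update(u, e)
```

For these assumed gains the learning operator is a contraction, so the cost shrinks monotonically across iterations; in the paper that convergence is instead obtained by letting SMA tune the learning-function parameters.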

    Adaptive and learning-based formation control of swarm robots

    Autonomous aerial and wheeled mobile robots play a major role in tasks such as search and rescue, transportation, monitoring, and inspection. However, these operations face several open challenges, including robust autonomy and adaptive coordination based on the environment and operating conditions, particularly in robot swarms with limited communication and perception capabilities. Furthermore, the computational complexity increases exponentially with the number of robots in the swarm. This thesis examines two different aspects of the formation control problem. On the one hand, we investigate how formation control could be performed by swarm robots with limited communication and perception (e.g., the Crazyflie nano quadrotor). On the other hand, we explore human-swarm interaction (HSI) and different shared-control mechanisms between humans and swarm robots (e.g., the BristleBot) for artistic creation. In particular, we combine bio-inspired techniques (e.g., flocking, foraging) with learning-based control strategies (using artificial neural networks) for the adaptive control of multi-robot systems. We first review how learning-based control and networked dynamical systems can be used to assign distributed and decentralized policies to individual robots such that the desired formation emerges from their collective behavior. We then present a novel flocking controller for UAV swarms using deep reinforcement learning. We formulate the flocking formation problem as a partially observable Markov decision process (POMDP) and consider a leader-follower configuration, where consensus among all UAVs is used to train a shared control policy and each UAV acts on the local information it collects. In addition, to avoid collisions among UAVs and to guarantee flocking and navigation, the reward function combines a global flocking-maintenance term, a mutual reward, and a collision penalty.
We adapt the deep deterministic policy gradient (DDPG) algorithm, with centralized training and decentralized execution, to obtain the flocking control policy using actor-critic networks and a global state-space matrix. In the context of swarm robotics in the arts, we investigate how the formation paradigm can serve as an interaction modality for artists to aesthetically utilize swarms. In particular, we explore particle swarm optimization (PSO) and random walks to control the communication within a team of robots with swarming behavior for musical creation.
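    The reward shaping described above can be sketched as follows; the distance thresholds, the weights, and the two-robot example are illustrative assumptions, not values from the thesis.

```python
import math

def flocking_reward(positions, velocities, leader_velocity,
                    d_ref=1.0, d_safe=0.3,
                    w_flock=1.0, w_align=0.5, w_collide=5.0):
    # Flocking maintenance: penalize deviation from the reference spacing.
    # Mutual reward: reward velocity alignment with the leader.
    # Collision penalty: large fixed cost below the safety distance.
    reward = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(positions[i], positions[j])
            reward -= w_flock * abs(d - d_ref)
            if d < d_safe:
                reward -= w_collide
    for v in velocities:
        reward += w_align * sum(a * b for a, b in zip(v, leader_velocity))
    return reward

leader_v = (1.0, 0.0)
# Well-spaced, aligned pair vs. a near-collision at the same velocities.
spread = flocking_reward([(0.0, 0.0), (1.0, 0.0)],
                         [(1.0, 0.0), (1.0, 0.0)], leader_v)
colliding = flocking_reward([(0.0, 0.0), (0.1, 0.0)],
                            [(1.0, 0.0), (1.0, 0.0)], leader_v)
```

Under centralized training this global reward is shared by all agents, while at execution time each UAV still acts only on its local observations.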

    Optimal control and approximations


    Analyzing the Improvements of Energy Management Systems for Hybrid Electric Vehicles Using a Systematic Literature Review: How Far Are These Controls from Rule-Based Controls Used in Commercial Vehicles?

    Featured Application: This work is useful for researchers interested in the study of energy management systems for hybrid electric vehicles, and for institutions related to the market for this type of vehicle. The hybridization of vehicles is a viable step toward meeting the challenge of reducing emissions related to road transport all over the world. To take advantage of the emission-reduction potential of hybrid electric vehicles (HEVs), the appropriate design of their energy management systems (EMSs), which control the power flow between the engine and the battery, is essential. This work presents a systematic literature review (SLR) of recent works that developed EMSs for HEVs. The review is guided by the following observation: although the development of novel EMSs that seek the optimum performance of HEVs is booming, in the real world HEVs continue to rely on well-known rule-based (RB) strategies. The contribution of this work is a quantitative comparison of the selected works. Since several studies do not compare their models against commercial RB strategies, it is proposed, as another contribution, to complete their results using simulations. From these results, it is concluded that the analyzed EMSs improve roughly between 5% and 10% on commercial RB EMSs, and that they are nearer to the optimum than commercial RB EMSs.
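    For readers unfamiliar with the commercial baselines, a charge-sustaining rule-based EMS can be as simple as a few state-of-charge (SOC) thresholds deciding the engine/battery power split. The sketch below is a generic illustration; the thresholds and the engine operating point are assumptions, not values taken from any reviewed work.

```python
def rule_based_ems(power_demand, soc,
                   soc_low=0.4, soc_high=0.7, p_engine_opt=30.0):
    """Illustrative charge-sustaining rule-based split (all powers in kW)."""
    if soc < soc_low:
        # Battery depleted: engine covers demand and recharges the battery.
        engine = max(power_demand, p_engine_opt)
    elif soc > soc_high and power_demand < p_engine_opt:
        # Battery full and load light: drive electric-only.
        engine = 0.0
    else:
        # Blended mode: keep the engine near its efficient operating point.
        engine = min(power_demand, p_engine_opt)
    battery = power_demand - engine  # negative => battery is charging
    return engine, battery
```

Strategies of this kind are cheap, robust, and calibration-friendly, which is precisely why they dominate commercial vehicles despite the 5-10% margin the optimized EMSs report over them.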

    New Energy Management Systems for Battery Electric Vehicles with Supercapacitor

    Recently, the Battery Electric Vehicle (BEV) has been considered a promising candidate for eliminating the problems associated with fuel-based vehicles, and the development and enhancement of BEVs has become an attractive field of study. One of the significant challenges in commercializing BEVs is overcoming the battery drawbacks that limit BEV performance. One promising solution is to hybridize the BEV with a supercapacitor (SC), so that the battery is the primary source of energy while the SC handles sudden fluctuations in power demand. To exploit the full benefit of this hybrid system, an intelligent Energy Management System (EMS) is required. In this thesis, several EMSs are developed. The first is a Nonlinear Model Predictive Controller (NMPC) based on the Newton Generalized Minimum Residual (Newton/GMRES) method. The NMPC effectively optimizes the power distribution between the battery and the supercapacitor, thanks to its ability to handle multi-input, multi-output problems and to use past information to predict future power demand. However, real-time application of the NMPC is challenging due to its high computational cost; therefore Newton/GMRES, a fast real-time optimizer, is implemented at the heart of the NMPC. Simulation results demonstrate that the Newton/GMRES NMPC successfully protects the battery during high power peaks and dips. On the other hand, future power demand is inherently probabilistic. Consequently, Stochastic Dynamic Programming (SDP) is employed to maximize the battery lifespan while accounting for the uncertain nature of power demand. The next power demand is predicted by a Markov chain, and the SDP approach determines the optimal policy using the policy iteration algorithm. The SDP is inexpensive to deploy, since it requires no additional equipment; furthermore, it is an offline approach, so computational cost is not an issue.
Its simulation results compare favorably with those of rival approaches. Finally, recent success stories of applying bio-inspired techniques such as Particle Swarm Optimization (PSO) to control problems motivated the author to investigate the potential of this algorithm for the problem at hand. PSO is a population-based method that effectively seeks the best answer in the solution space without solving complex equations. Simulation results indicate that PSO is successful in terms of optimality, but it faces some difficulties in real-time application.
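    A minimal PSO of the kind that could tune EMS parameters offline is sketched below; the two-dimensional sphere objective stands in for the (expensive) vehicle-simulation cost, and the swarm size, inertia, and acceleration coefficients are standard illustrative choices, not values from the thesis.

```python
import random

def pso(objective, dim, n_particles=20, iters=80,
        w=0.7, c1=1.5, c2=1.5, bounds=(-5.0, 5.0)):
    """Minimize `objective` over a box with a basic global-best PSO."""
    random.seed(1)
    lo, hi = bounds
    pos = [[random.uniform(lo, hi) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [list(p) for p in pos]              # personal bests
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = list(pbest[g]), pbest_val[g]  # global best
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                # Inertia + cognitive pull + social pull.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = list(pos[i]), val
                if val < gbest_val:
                    gbest, gbest_val = list(pos[i]), val
    return gbest, gbest_val

# E.g. tune two hypothetical EMS gains against a surrogate cost.
best, cost = pso(lambda x: sum(v * v for v in x), dim=2)
```

The loop only evaluates the objective, never its gradient, which is why PSO needs no complex equations; the same property makes each iteration expensive when the objective is a full vehicle simulation, matching the real-time difficulties noted above.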