
    Weighted SPSA-based Consensus Algorithm for Distributed Cooperative Target Tracking

    In this paper, a new algorithm for distributed multi-target tracking in a sensor network is proposed. The main feature of this algorithm, which combines SPSA techniques with iterative averaging (a "consensus algorithm"), is its ability to solve distributed optimization problems in the presence of signals with fully uncertain distributions; the only assumption is the signals' boundedness. As an example, we consider the multi-target tracking problem, in which the unknown signals include measurement errors and unpredictable target maneuvers; the statistical properties of these signals are unknown. A special choice of weights in the algorithm enables its application to targets exhibiting different behaviors. An explicit estimate of the residual's covariance matrix is obtained, which may be regarded as a performance index of the algorithm. The theoretical results are illustrated by numerical simulations.
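    The abstract stops short of pseudocode, so the following minimal Python sketch only illustrates the two named ingredients: an SPSA gradient estimate computed from each node's local loss, followed by weighted iterative averaging across neighbors. The function names, shapes, and the weight matrix W are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

def spsa_consensus_step(theta, losses, W, c=0.1, a=0.05):
    """One iteration of local SPSA descent followed by consensus averaging.

    theta  : (n_nodes, dim) local estimates of the target state
    losses : list of callables f_i(theta_i) -> float (noisy local costs)
    W      : (n_nodes, n_nodes) row-stochastic consensus weight matrix
    """
    n, d = theta.shape
    stepped = np.empty_like(theta)
    for i in range(n):
        delta = rng.choice([-1.0, 1.0], size=d)   # Rademacher perturbation
        # Two-measurement SPSA estimate; since delta_j is +/-1,
        # multiplying by delta equals the usual division by delta_j.
        g = (losses[i](theta[i] + c * delta)
             - losses[i](theta[i] - c * delta)) / (2 * c) * delta
        stepped[i] = theta[i] - a * g             # local SPSA descent step
    return W @ stepped                            # weighted iterative averaging
```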

    Optimized state feedback regulation of 3DOF helicopter system via extremum seeking

    In this paper, an optimized state feedback regulation scheme for a 3-degree-of-freedom (3DOF) helicopter is designed via the extremum seeking (ES) technique. Multi-parameter ES is applied to optimize tracking performance by tuning a State Vector Feedback with Integration of the Control Error (SVFBICE) controller. A discrete multivariable version of ES is developed to minimize a cost function that measures the performance of the controller. The cost function depends on the error between the actual and desired axis positions. The controller parameters are updated online as the optimization proceeds. This method significantly reduces the time needed to obtain optimal controller parameters. Simulations were conducted for online optimization under both fixed and varying operating conditions. The results demonstrate the usefulness of ES for preserving the maximum attainable performance.
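    As a rough illustration of the discrete multivariable ES loop described above, the sketch below perturbs each controller parameter with its own dither frequency, probes the closed-loop cost, and demodulates to estimate a descent direction. All names and gains are illustrative assumptions; practical schemes usually also high-pass filter the measured cost.

```python
import numpy as np

def es_tune(cost, theta0, omegas, a=0.05, alpha=0.02, iters=2000):
    """Minimal discrete multi-parameter extremum seeking (illustrative).

    cost   : J(theta) -> float, e.g. accumulated tracking error of the
             closed loop under controller gains theta
    theta0 : initial controller gains
    omegas : distinct dither frequencies, one per parameter (rad/step)
    """
    theta = np.asarray(theta0, dtype=float)
    for k in range(iters):
        dither = a * np.sin(omegas * k)          # per-parameter perturbation
        J = cost(theta + dither)                 # probe the closed-loop cost
        theta -= alpha * J * np.sin(omegas * k)  # demodulation step: the DC
                                                 # part averages out over the
                                                 # dither period for small alpha
    return theta
```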

    Modeling, Control, and Impact Analysis of The Next Generation Transportation System

    This dissertation aims to develop a systematic tool for connected and autonomous vehicles, integrating the simulation of traffic dynamics, traffic control strategies, and impact analysis at the network level. The first part of the dissertation is devoted to the traffic flow modeling of connected vehicles. This task is the foundation for transportation planning, optimized network design, efficient traffic control strategies, etc., of the next-generation transportation system. Chapter 2 proposes a cell-based simulation approach to model the proactive driving behavior of connected vehicles. First, a state variable is introduced to track the trajectories of connected vehicles. Then the exit flow of cells containing connected vehicles is adjusted to simulate proactive driving behavior, such that the traffic light is green when the connected vehicle arrives at the signalized intersection. Extensive numerical simulation results consistently show that the presence of connected vehicles contributes significantly to smoothing traffic flow and reducing vehicular emissions in the network. Chapter 3 proposes an optimal estimation approach to calibrate connected vehicles' car-following behavior in a mixed traffic environment. In particular, the state-space system dynamics is captured by a simplified car-following model with disturbances, where the trajectories of non-connected vehicles are treated as unknown states and the trajectories of connected vehicles as measurements with errors. The objective of the reformulation is to obtain an optimal estimate of the states and the model parameters simultaneously. It is shown that the customized state-space model is identifiable under the mild assumption that the disturbance covariance of the state update process is diagonal. A modified Expectation-Maximization (EM) algorithm based on a Kalman smoother is then developed to solve the optimal estimation problem.

    The second part of the dissertation concerns traffic control strategies. This task drives the next-generation transportation system to a better performance state in terms of safety, mobility, travel time savings, vehicular emission reduction, etc. Chapter 4 develops a novel reinforcement learning algorithm for the challenging coordinated signal control problem. Traffic signals are modeled as intelligent agents interacting with the stochastic traffic environment. The model is built on the framework of coordinated reinforcement learning. A Junction Tree Algorithm based reinforcement learning method is proposed to obtain an exact inference of the best joint actions for all coordinated intersections. The algorithm is implemented and tested in a microscopic traffic simulator on a network containing 18 signalized intersections. Chapter 5 develops a novel linear programming formulation for autonomous intersection control (LPAIC) accounting for traffic dynamics within a connected vehicle environment. First, a lane-based bi-level optimization model is introduced to propagate traffic flows in the network. The bi-level optimization model is then transformed into a linear programming formulation by relaxing the nonlinear constraints with a set of linear inequalities. One special feature of the LPAIC formulation is that the entries of the constraint matrix take values only in {-1, 0, 1}. Moreover, it is proved that the constraint matrix is totally unimodular, so an optimal solution exists and contains only integer values (a worked illustration of this property is sketched after this abstract). Further, it is shown that traffic flows from different lanes pass through the conflict points of the intersection safely and that there are no holding flows in the solution. Three numerical case studies are conducted to demonstrate the properties and effectiveness of the LPAIC formulation for autonomous intersection control.

    The third part of the dissertation moves on to the impact analysis of connected and autonomous vehicles at the network level. This task assesses the positive and negative impacts of the system and provides guidance on transportation planning, traffic control, transportation budget spending, etc. In this part, the impact of different penetration rates of connected and autonomous vehicles on the network efficiency of a transportation system is revealed. Chapter 6 sets out to model an efficient and fair transportation system accounting for both departure time choice and route choice on a general multi-OD network within a dynamic traffic assignment environment. First, a bi-level optimization formulation is introduced based on a link-based traffic flow model. The upper level of the formulation minimizes the total system travel time, whereas the lower level captures traffic flow propagation and the user equilibrium constraint. The bi-level formulation is then relaxed to a linear programming formulation that produces a lower bound of the efficient and fair system state. An efficient iterative algorithm is proposed to obtain the exact solution. It is shown that the number of iterations is bounded and that the output traffic flow solution is efficient and fair. Finally, two numerical cases (a single-OD network and a multi-OD network) are conducted to demonstrate the performance of the algorithm. The results consistently show that the travel times of different departure rates of the same OD pair are identical and that the algorithm converges within two iterations across all test scenarios.
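    The integrality claim in Chapter 5 rests on a standard fact: if the constraint matrix is totally unimodular and the right-hand side is integral, every basic optimal solution of the LP is integral. The snippet below demonstrates that fact on a tiny made-up min-cost flow instance (node-arc incidence matrices of directed graphs are totally unimodular); it is not the LPAIC model itself.

```python
import numpy as np
from scipy.optimize import linprog

# Node-arc incidence matrix of a directed path 0 -> 1 -> 2 -> 3.
# Entries lie in {-1, 0, 1} and the matrix is totally unimodular.
A_eq = np.array([[ 1,  0,  0],
                 [-1,  1,  0],
                 [ 0, -1,  1],
                 [ 0,  0, -1]], dtype=float)
b_eq = np.array([2, 0, 0, -2])       # send 2 units from node 0 to node 3
c = np.ones(3)                       # unit cost on every arc

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 3)
print(res.x)                         # integral optimum: [2. 2. 2.]
```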

    Many-agent Reinforcement Learning

    Multi-agent reinforcement learning (RL) solves the problem of how each agent should behave optimally in a stochastic environment in which multiple agents are learning simultaneously. It is an interdisciplinary domain with a long history that lies at the intersection of psychology, control theory, game theory, reinforcement learning, and deep learning. Following the remarkable success of the AlphaGo series in single-agent RL, 2019 was a booming year that witnessed significant advances in multi-agent RL techniques; impressive breakthroughs have been made in developing AIs that outperform humans on many challenging tasks, especially multi-player video games. Nonetheless, one of the key challenges of multi-agent RL techniques is scalability: it is still non-trivial to design efficient learning algorithms that can solve tasks involving far more than two agents (N ≫ 2), which I term many-agent reinforcement learning (MARL; I use "MARL" to denote multi-agent reinforcement learning with a particular focus on the many-agent case, and "Multi-Agent RL" otherwise). In this thesis, I contribute to tackling MARL problems from four aspects. Firstly, I offer a self-contained overview of multi-agent RL techniques from a game-theoretical perspective. This overview fills the research gap that most existing work either fails to cover the recent advances since 2010 or does not pay adequate attention to game theory, which I believe is the cornerstone of solving many-agent learning problems. Secondly, I develop a tractable policy evaluation algorithm, α^α-Rank, for many-agent systems. The critical advantage of α^α-Rank is that it can compute the solution concept of α-Rank tractably in multi-player general-sum games with no need to store the entire payoff matrix. This is in contrast to classical solution concepts such as Nash equilibrium, which is known to be PPAD-hard to compute even in the two-player case. α^α-Rank allows us, for the first time, to practically conduct large-scale multi-agent evaluations. Thirdly, I introduce a scalable policy learning algorithm, mean-field MARL, for many-agent systems. The mean-field MARL method takes advantage of the mean-field approximation from physics, and it is the first provably convergent algorithm that tries to break the curse of dimensionality for MARL tasks. With the proposed algorithm, I report the first result of solving the Ising model and multi-agent battle games through a MARL approach. Fourthly, I investigate the many-agent learning problem in open-ended meta-games (i.e., the game of a game in the policy space). Specifically, I focus on modelling the behavioural diversity in meta-games and on developing algorithms that are guaranteed to enlarge diversity during training. The proposed metric, based on determinantal point processes, serves as the first mathematically rigorous definition of diversity. Importantly, the diversity-aware learning algorithms beat the existing state-of-the-art game solvers in terms of exploitability by a large margin. On top of the algorithmic developments, I also contribute two real-world applications of MARL techniques. Specifically, I demonstrate the great potential of applying MARL to study the emergent population dynamics in nature, and to model diverse and realistic interactions in autonomous driving. Both applications embody the prospect that MARL techniques could achieve huge impact in the real physical world, beyond purely video games.
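    To make the mean-field idea concrete, here is a hedged tabular sketch: each agent's Q-function is conditioned on its own action and a summary of its neighbours' actions, so the table no longer grows exponentially with the number of agents. The discretised mean-action bin and all shapes are simplifying assumptions; the thesis' mean-field MARL works with the neighbours' empirical action distribution and comes with a convergence analysis.

```python
import numpy as np

n_states, n_actions = 10, 4
# Q[s, own action, neighbours' mean-action bin] -- its size is independent
# of the number of agents, unlike a joint-action Q-table.
Q = np.zeros((n_states, n_actions, n_actions))

def mean_action_bin(neighbour_actions):
    """Crude discretisation of the neighbours' empirical mean action."""
    return int(round(float(np.mean(neighbour_actions))))

def mf_q_update(Q, s, a, k, r, s_next, k_next, lr=0.1, gamma=0.95):
    """One mean-field temporal-difference update for a single agent.

    k, k_next : mean-action bins of the neighbours before/after the step
    """
    target = r + gamma * Q[s_next, :, k_next].max()
    Q[s, a, k] += lr * (target - Q[s, a, k])
```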

    Reinforcement learning applied to the real world: uncertainty, sample efficiency, and multi-agent coordination

    The immense potential of deep reinforcement learning (DRL) approaches to build autonomous agents has been proven repeatedly in the last decade. Its application to embodied agents, such as robots or automated power systems, however, faces several challenges. Among them, their sample inefficiency, combined with the cost and the risk of gathering experience in the real world, can deter any idea of training embodied agents. In this thesis, I focus on the application of DRL to embodied agents. I first propose a probabilistic framework to improve sample efficiency in DRL. In the first article, I present batch inverse-variance (BIV) weighting, a loss function accounting for label noise variance in heteroscedastic noisy regression. BIV is a key element of the second article, where it is combined with state-of-the-art uncertainty prediction methods for deep neural networks in a Bayesian pipeline for temporal-difference DRL algorithms. This approach, named inverse-variance reinforcement learning (IV-RL), leads to significantly faster training as well as better performance in control tasks. In the third article, multi-agent reinforcement learning (MARL) is applied to the problem of fast-timescale demand response, a promising approach to managing the introduction of intermittent renewable energy sources in power grids. As MARL agents control the coordination of multiple air conditioners, they achieve significantly better performance than rule-based approaches. These results underline the potential role that DRL-trained embodied agents could play in the energy transition and the fight against global warming.
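    The BIV idea, as described, is simple enough to sketch: down-weight regression targets in proportion to their estimated label-noise variance, normalised over the mini-batch. The PyTorch snippet below is an assumed form for illustration; the paper's exact weighting and normalisation may differ.

```python
import torch

def biv_loss(pred, target, noise_var, xi=1e-2):
    """Batch inverse-variance weighted regression loss (illustrative).

    noise_var : per-sample estimate of the label noise variance
    xi        : regulariser capping the weight of near-noiseless labels
    """
    w = 1.0 / (noise_var + xi)   # inverse-variance weights
    w = w / w.sum()              # normalise over the mini-batch
    return (w * (pred - target) ** 2).sum()
```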

    Multi-Agent Systems

    A multi-agent system (MAS) is a system composed of multiple interacting intelligent agents. Multi-agent systems can be used to solve problems that are difficult or impossible for an individual agent or a monolithic system to solve. Agent systems are open and extensible systems that allow for the deployment of autonomous and proactive software components. Multi-agent systems have been developed and applied in several application domains.

    Adapt-to-learn policy transfer in reinforcement learning and deep model reference adaptive control

    Adaptation and learning from exploration have been key to biological learning: humans and animals do not learn every task in isolation; rather, they are able to quickly adapt learned behaviors between similar tasks and learn new skills when presented with new situations. Inspired by this, adaptation has been an important direction of research in control in the form of adaptive controllers. However, adaptive controllers such as Model Reference Adaptive Control are mainly model-based and do not rely on exploration; instead, they make informed decisions by exploiting the model's structure. Such controllers are therefore characterized by high sample efficiency and stability guarantees, making them suitable for safety-critical systems. On the other hand, we have learning-based optimal control algorithms like reinforcement learning. Reinforcement learning is a trial-and-error method, where an agent explores the environment by taking random actions and maximizing the likelihood of those particular actions that result in a higher return. However, these exploration techniques are expected to fail many times before discovering the optimal policy. They are therefore highly sample-expensive and lack stability guarantees, and hence are not suitable for safety-critical systems. This thesis presents control algorithms for robotics where the best of both worlds, "adaptation" and "learning from exploration", are brought together to propose new algorithms that can perform better than their conventional counterparts.

    In this effort, we first present an Adapt-to-Learn policy transfer algorithm, where we use control-theoretic ideas of adaptation to transfer a policy between two related but different tasks using the policy gradient method of reinforcement learning. Efficient and robust policy transfer remains a key challenge in reinforcement learning. Policy transfer through warm initialization, imitation, or interaction over a large set of agents with randomized instances has been commonly applied to solve a variety of Reinforcement Learning (RL) tasks. However, this is far from how behavior transfer happens in the biological world. Here, we seek to answer the question: will learning to combine an adaptation reward with the environmental reward lead to a more efficient transfer of policies between domains? We introduce a principled mechanism that can "Adapt-to-Learn", that is, adapt the source policy to learn to solve a target task with significant transition differences and uncertainties. Through theory and experiments, we show that our method leads to a significantly reduced sample complexity of transferring policies between tasks.

    In the second part of this thesis, information-enabled learning-based adaptive controllers, namely the Gaussian Process adaptive controller using a Model Reference Generative Network (GP-MRGeN) and the Deep Model Reference Adaptive Controller (DMRAC), are presented. Model reference adaptive control (MRAC) is a widely studied adaptive control methodology that aims to ensure that a nonlinear plant with significant model uncertainty behaves like a chosen reference model. MRAC methods adapt to changes by representing the system uncertainties as weighted combinations of known nonlinear functions and using a weight update law that moves the network weights in the direction of minimizing the instantaneous tracking error (a scalar illustration of this classical update law is sketched after this abstract). However, most MRAC adaptive controllers use a shallow network and only instantaneous data for adaptation, restricting their representation capability and limiting their performance under fast-changing uncertainties and faults in the system.

    In this thesis, we propose a Gaussian process based adaptive controller called GP-MRGeN. We present a new approach to the online supervised training of GP models using a new architecture termed the Model Reference Generative Network (MRGeN). Our architecture is loosely inspired by the recent success of generative neural network models, and our contributions ensure that the inclusion of such a model in closed-loop control does not affect the stability properties. The GP-MRGeN controller, by using a generative network, is capable of achieving higher adaptation rates without losing the robustness properties of the controller, and is hence suitable for mitigating faults in fast-evolving systems.

    Further, we present a new neuroadaptive architecture: Deep Neural Network based Model Reference Adaptive Control (DMRAC). This architecture utilizes deep neural network representations for modeling significant nonlinearities while marrying them with the boundedness guarantees that characterize MRAC-based controllers. We demonstrate through simulations and analysis that DMRAC can subsume previously studied learning-based MRAC methods, such as concurrent learning and GP-MRAC, which makes DMRAC a powerful architecture for high-performance control of nonlinear systems with long-term learning properties. Theoretical proofs of the controller's generalization capability over unseen data points and the boundedness properties of the tracking error are also presented. Experiments with a quadrotor vehicle demonstrate the controller's performance in achieving reference model tracking in the presence of significant matched uncertainties. A software and communication architecture is designed to ensure online real-time inference of the deep network on a high-bandwidth, computation-limited platform. These results demonstrate the efficacy of deep networks for high-bandwidth closed-loop attitude control of unstable and nonlinear robots operating in adverse situations. We expect that this work will benefit other closed-loop deep-learning control architectures for robotics.
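    As a scalar illustration of the classical MRAC machinery this thesis builds on, the sketch below represents a matched uncertainty as W·φ(x) and drives W with the standard Lyapunov-based update law so the plant tracks a stable reference model. The plant, basis functions, and gains are illustrative assumptions, not the DMRAC architecture.

```python
import numpy as np

a_m, b_m = -2.0, 2.0                  # stable reference model dynamics
gamma, p, dt = 50.0, 0.25, 1e-3       # adaptation gain, Lyapunov scalar, step
phi = lambda x: np.array([x, abs(x) * x, 1.0])   # known regressor features

x = x_m = 0.0
W = np.zeros(3)                       # adaptive weight estimate
for k in range(30000):
    r = np.sin(2 * np.pi * 0.2 * k * dt)      # reference command
    delta = 0.8 * x + 0.5 * abs(x) * x - 0.2  # unknown matched uncertainty
    u = a_m * x + b_m * r - W @ phi(x)        # cancel uncertainty via W.phi(x)
    e = x - x_m                               # instantaneous tracking error
    W += gamma * p * e * phi(x) * dt          # MRAC weight-update law
    x += (u + delta) * dt                     # plant: integrator + uncertainty
    x_m += (a_m * x_m + b_m * r) * dt         # reference model state
print(abs(e))                                 # small residual tracking error
```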