59 research outputs found

    Safe Sequential Path Planning Under Disturbances and Imperfect Information

    Multi-UAV systems are safety-critical, and guarantees must be made to ensure no unsafe configurations occur. Hamilton-Jacobi (HJ) reachability is ideal for analyzing such safety-critical systems; however, its direct application is limited to small-scale systems of no more than two vehicles due to an exponentially scaling computational complexity. Previously, the sequential path planning (SPP) method, which assigns strict priorities to vehicles, was proposed; SPP allows multi-vehicle path planning to be done with a linearly scaling computational complexity. However, the previous formulation assumed that there are no disturbances and that every vehicle has perfect knowledge of higher-priority vehicles' positions. In this paper, we make SPP more practical by providing three different methods to account for disturbances in dynamics and imperfect knowledge of higher-priority vehicles' states. Each method makes different assumptions about information sharing. We demonstrate our proposed methods in simulations.
    Comment: American Control Conference, 201
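    To make the priority structure concrete, here is a minimal sketch of the sequential planning loop under the simplest information model: each vehicle plans in priority order against the committed trajectories of all higher-priority vehicles, inflated by a disturbance bound. The single-vehicle planner plan_single_vehicle and the bound eps are hypothetical placeholders, and the geometric inflation merely stands in for the paper's HJ-reachability-based safety analysis.

```python
# A sketch of priority-ordered sequential path planning (SPP). Lower list
# index = higher priority. Complexity scales linearly in the number of
# vehicles because each vehicle runs the single-vehicle planner once.

def inflate(trajectory, eps):
    """Grow each (x, y) waypoint into a disc of radius eps, a crude
    proxy for disturbance-induced deviation from the nominal path."""
    return [(x, y, eps) for (x, y) in trajectory]

def sequential_path_planning(vehicles, plan_single_vehicle, eps):
    """vehicles: list of (start, goal) pairs, highest priority first.
    plan_single_vehicle(start, goal, obstacles) -> trajectory."""
    committed = []  # trajectories already fixed by higher-priority vehicles
    for start, goal in vehicles:
        obstacles = [inflate(traj, eps) for traj in committed]
        committed.append(plan_single_vehicle(start, goal, obstacles))
    return committed
```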

    Design and Implementation of Distributed Resource Management for Time Sensitive Applications

    In this paper, we address distributed convergence to fair allocations of CPU resources for time-sensitive applications. We propose a novel resource management framework where a centralized objective for fair allocations is decomposed into a pair of performance-driven recursive processes for updating: (a) the allocation of computing bandwidth to the applications (resource adaptation), executed by the resource manager, and (b) the service level of each application (service-level adaptation), executed by each application independently. We provide conditions under which the distributed recursive scheme converges to solutions of the centralized objective (i.e., fair allocations). Contrary to prior work on centralized optimization schemes, the proposed framework exhibits adaptivity and robustness to changes in both the number and nature of the applications, while requiring minimal information to be available to either the applications or the resource manager. We finally validate our framework with simulations using the TrueTime toolbox in MATLAB/Simulink.
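    As a rough illustration of the decomposition, the sketch below pairs a fairness-seeking resource-adaptation recursion (run by the resource manager) with an independent service-level recursion per application. The concrete update maps and the noisy performance model are illustrative assumptions, not the paper's exact scheme; only the two-recursion structure is taken from the abstract.

```python
import random

def resource_adaptation(shares, perf, step):
    """Resource manager: shift bandwidth toward applications whose
    performance lags the average (a fairness-seeking recursion)."""
    avg = sum(perf) / len(perf)
    shares = [s + step * (avg - p) for s, p in zip(shares, perf)]
    total = sum(max(s, 1e-6) for s in shares)
    return [max(s, 1e-6) / total for s in shares]  # renormalize to a simplex

def service_level_adaptation(level, share, step):
    """Each application independently: move its service level toward
    what the allocated bandwidth can sustain."""
    return min(1.0, max(0.0, level + step * (share - level)))

def measure_performance(level, share):
    """Hypothetical noisy performance metric for one application."""
    return share / max(level, 1e-6) + random.gauss(0, 0.01)

shares = [1.0 / 3] * 3          # initial bandwidth allocation
levels = [0.5, 0.5, 0.5]        # initial service levels
for k in range(1, 1000):        # diminishing step sizes, as in
    step = 1.0 / k              # stochastic-approximation schemes
    perf = [measure_performance(l, s) for l, s in zip(levels, shares)]
    shares = resource_adaptation(shares, perf, step)
    levels = [service_level_adaptation(l, s, step)
              for l, s in zip(levels, shares)]
```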

    Distributed dynamic reinforcement of efficient outcomes in multiagent coordination and network formation

    We analyze reinforcement learning under so-called "dynamic reinforcement". In reinforcement learning, each agent repeatedly interacts with an unknown environment (i.e., other agents), receives a reward, and updates the probabilities of its next action based on its own previous actions and received rewards. Unlike standard reinforcement learning, dynamic reinforcement uses a combination of long-term rewards and recent rewards to construct myopically forward-looking action-selection probabilities. We analyze the long-term stability of the learning dynamics for general games with pure-strategy Nash equilibria and specialize the results for coordination games and distributed network formation. In this class of problems, more than one stable equilibrium (i.e., coordination configuration) may exist. We demonstrate equilibrium selection under dynamic reinforcement. In particular, we show how a single agent is able to destabilize an equilibrium in favor of another by appropriately adjusting its dynamic reinforcement parameters. We contrast these conclusions with prior game-theoretic results, according to which the risk-dominant equilibrium is the only robust equilibrium when agents' decisions are subject to small randomized perturbations. The analysis throughout is based on the ODE method for stochastic approximations, where a special form of perturbation in the learning dynamics allows for analyzing its behavior at the boundary points of the state space.
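    The following sketch illustrates the dynamic-reinforcement idea described above: action-selection propensities are a convex combination of a slowly updated long-term reward average and the most recent reward per action. The mixing weight lam, the step size, and the softmax choice rule are assumptions made for illustration, not the paper's exact dynamics.

```python
import math
import random

def softmax_choice(propensities, temperature=0.1):
    """Sample an action with probability increasing in its propensity."""
    weights = [math.exp(p / temperature) for p in propensities]
    r, acc = random.random() * sum(weights), 0.0
    for action, w in enumerate(weights):
        acc += w
        if r <= acc:
            return action
    return len(weights) - 1

class DynamicReinforcementAgent:
    def __init__(self, n_actions, step=0.1, lam=0.7):
        self.long_term = [0.0] * n_actions  # slowly updated reward averages
        self.recent = [0.0] * n_actions     # most recent reward per action
        self.step, self.lam = step, lam

    def act(self):
        # Myopically forward-looking propensities: a convex combination
        # of long-term and recent rewards for each action.
        mixed = [self.lam * lt + (1 - self.lam) * rc
                 for lt, rc in zip(self.long_term, self.recent)]
        return softmax_choice(mixed)

    def update(self, action, reward):
        self.long_term[action] += self.step * (reward - self.long_term[action])
        self.recent[action] = reward
```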

    Straked force database extraction & transient response analysis

    Thesis (S.M. in Mechanical Engineering and Naval Architecture and Marine Engineering)--Massachusetts Institute of Technology, Dept. of Mechanical Engineering, 2009. Includes bibliographical references (p. 55-58).
    In part I of the thesis, we extend a methodology to extract a VIV hydrodynamic database from field data to accommodate partially straked cylinders. There are two databases, each consisting of the lift and added-mass coefficients as functions of reduced velocity and amplitude of response: the first for the bare part of the riser, and the second for the straked part. First, the program VIVA, together with nominal force databases obtained from laboratory hydrodynamic experiments, is used to obtain an initial prediction of the riser response under a particular flow profile. The nominal databases are then altered in a systematic way until the new VIVA-predicted response best matches the measured field response, thus resulting in optimal databases.
    In part II of the thesis, we show using experimental data on a model riser that lock-in of long flexible risers placed in sheared or uniform cross-flows is a much richer phenomenon than lock-in of flexibly mounted rigid cylinders under similar conditions. In particular, we find that the frequency content of the riser response may be either narrow-banded around a single dominant frequency (Type I response) or distributed along a relatively broad range of frequencies (Type II response). Distinct transitions from Type I to Type II response, and vice versa, can occur several times within a single experimental record. Type I responses reveal features of a quasi-periodic oscillation, often accompanied by large 3rd-harmonic components in the acceleration and strain signals, increased correlation length, stable riser trajectories, and monochromatic traveling or standing waves. Type II responses, on the other hand, are characterized by features of chaotic oscillation with small or non-existent 3rd-harmonic components in the acceleration and strain signals, reduced correlation length, and a continuous spectrum.
    by Filippos Chasparis. S.M. in Mechanical Engineering and Naval Architecture and Marine Engineering
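    The database-calibration step in part I can be pictured as a loop that perturbs the nominal coefficient tables and keeps any perturbation that brings the predicted response closer to the field measurement. The sketch below shows one such loop under strong simplifying assumptions: run_viva is a hypothetical stand-in for the VIVA prediction code, the tables are flattened to lists of coefficients, and the randomized multiplicative perturbation is only illustrative of the thesis's systematic alteration procedure.

```python
import random

def calibrate_databases(nominal_db, measured_response, run_viva,
                        n_iters=200, scale=0.05):
    """Randomized local search over database perturbations.

    nominal_db        -- dict of coefficient tables, e.g. {'bare': [...],
                         'straked': [...]} (flattened for simplicity)
    measured_response -- field measurement to match (sequence of floats)
    run_viva          -- callable: database -> predicted response
    """
    def misfit(db):
        pred = run_viva(db)
        return sum((p - m) ** 2 for p, m in zip(pred, measured_response))

    best_db, best_err = nominal_db, misfit(nominal_db)
    for _ in range(n_iters):
        # Perturb every coefficient multiplicatively by a small factor.
        trial = {part: [c * (1 + random.gauss(0, scale)) for c in table]
                 for part, table in best_db.items()}
        err = misfit(trial)
        if err < best_err:  # keep only perturbations that improve the match
            best_db, best_err = trial, err
    return best_db
```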

    Perturbed Learning Automata in Potential Games

    This paper presents a reinforcement learning algorithm and provides conditions for global convergence to Nash equilibria. For several reinforcement learning schemes, including the ones proposed here, excluding convergence to action profiles that are not Nash equilibria may not be trivial unless the step-size sequence is appropriately tailored to the specifics of the game. In this paper, we sidestep these issues by introducing a new class of reinforcement learning schemes where the strategy of each agent is perturbed by a state-dependent perturbation function. Contrary to prior work on equilibrium selection in games, where perturbation functions are globally state dependent, the perturbation function here is assumed to be local, i.e., it depends only on the strategy of the agent itself. We provide conditions under which the strategies of the agents converge to an arbitrarily small neighborhood of the set of Nash equilibria almost surely. We further specialize the results to a class of potential games.
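    A minimal sketch of such a perturbed scheme, assuming a linear reward-inaction style update and a uniform-mixing perturbation: the perturbation depends only on the agent's own strategy (local in the sense above) and keeps the strategy in the interior of the probability simplex. The perturbation shape, step size, and reward scaling are illustrative assumptions.

```python
import random

def perturbed_update(strategy, action, reward, step, delta=0.01):
    """One reinforcement step followed by a local perturbation.

    strategy -- current mixed strategy (probabilities over actions)
    action   -- action just played; reward assumed in [0, 1]
    delta    -- perturbation size mixing in the uniform strategy
    """
    n = len(strategy)
    # Linear reward-inaction style update toward the played action.
    updated = [p + step * reward * ((1.0 if a == action else 0.0) - p)
               for a, p in enumerate(strategy)]
    # Local perturbation: a function of this agent's own strategy only,
    # nudging it toward the interior of the simplex.
    return [(1 - delta) * p + delta / n for p in updated]

def play(strategy):
    """Sample an action from a mixed strategy."""
    r, acc = random.random(), 0.0
    for a, p in enumerate(strategy):
        acc += p
        if r <= acc:
            return a
    return len(strategy) - 1
```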

    Deep Residual Policy Reinforcement Learning as a Corrective Term in Process Control for Alarm Reduction: A Preliminary Report

    Conventional process controllers (such as proportional-integral-derivative (PID) controllers and model predictive controllers) are simple and effective once they have been calibrated for a given system. However, it is difficult and costly to re-tune these controllers if the system deviates from its normal conditions and starts to deteriorate. Recently, reinforcement learning has shown significant promise for learning process-control policies through direct interaction with a system, without the need for a process model or knowledge of the system characteristics. However, developing such a black-box system is a challenge when the system is complex, and it may not be possible to capture the complete dynamics of the system with just a single reinforcement learning agent. Therefore, in this paper, we propose a simple architecture that does not replace the conventional PID controllers but instead augments the control input to the system with a reinforcement learning agent. The agent adds a correction factor to the output provided by the conventional controller to maintain optimal process control even when the system is not operating under its normal conditions.
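    The proposed architecture can be sketched as follows: the PID controller remains the base controller, and the learned agent contributes a bounded additive correction to its output. The PID gains, the clipping bound, and the agent.correction interface are illustrative assumptions; the paper's correction term comes from a deep residual policy trained by reinforcement learning.

```python
class PID:
    """Textbook discrete-time PID controller."""
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral, self.prev_error = 0.0, 0.0

    def control(self, error):
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def residual_control(pid, agent, setpoint, measurement, max_correction=0.2):
    """Combine the PID output with a clipped correction from the agent."""
    u_pid = pid.control(setpoint - measurement)
    # The agent observes the tracking state and proposes a correction
    # (hypothetical interface); clipping keeps the learned term from
    # overriding the base controller.
    u_rl = agent.correction(setpoint, measurement)
    u_rl = max(-max_correction, min(max_correction, u_rl))
    return u_pid + u_rl
```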