
    Utilising Assured Multi-Agent Reinforcement Learning within safety-critical scenarios

    Multi-agent reinforcement learning allows a team of agents to learn how to work together to solve complex decision-making problems in a shared environment. However, this learning process relies on stochastic mechanisms, which makes its use in safety-critical domains problematic. To overcome this issue, we propose an Assured Multi-Agent Reinforcement Learning (AMARL) approach that uses a model checking technique called quantitative verification to provide formal guarantees of agent compliance with safety, performance, and other non-functional requirements during and after the reinforcement learning process. We demonstrate the applicability of our AMARL approach in three different patrolling navigation domains in which multi-agent systems must learn to visit key areas using different types of reinforcement learning algorithms (temporal difference learning, game theory, and direct policy search). Furthermore, we compare the effectiveness of these algorithms when used with and without our approach. Our extensive experiments with both homogeneous and heterogeneous multi-agent systems of different sizes show that the use of AMARL leads to safety requirements being consistently satisfied and to better overall results than standard reinforcement learning.
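    The general idea of constraining an agent's action choices with verified safety guarantees can be illustrated with a minimal sketch. This is not the paper's actual AMARL pipeline: the violation probabilities, threshold, and epsilon-greedy learner below are hypothetical stand-ins for values that quantitative verification (e.g. probabilistic model checking) would supply.

```python
import random

SAFETY_THRESHOLD = 0.05  # max tolerated probability of violating a safety requirement

def safe_actions(state, actions, violation_prob):
    """Keep only actions whose (assumed pre-verified) violation probability is acceptable."""
    return [a for a in actions if violation_prob[(state, a)] <= SAFETY_THRESHOLD]

def epsilon_greedy(state, q, actions, violation_prob, eps=0.1):
    """Standard epsilon-greedy selection, restricted to the verified-safe action set."""
    allowed = safe_actions(state, actions, violation_prob)
    if not allowed:
        # No action meets the threshold: fall back to the least-risky one.
        return min(actions, key=lambda a: violation_prob[(state, a)])
    if random.random() < eps:
        return random.choice(allowed)   # explore within the safe set
    return max(allowed, key=lambda a: q[(state, a)])  # exploit within the safe set
```

    Because exploration is confined to the verified-safe set, safety requirements hold during learning, not just for the final policy.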

    Quantum inspired algorithms for learning and control of stochastic systems

    Motivated by the limitations of current reinforcement learning and optimal control techniques, this dissertation proposes quantum theory inspired algorithms for learning and control of both single-agent and multi-agent stochastic systems. A common problem encountered in traditional reinforcement learning techniques is the exploration-exploitation trade-off. To address this issue, an action selection procedure inspired by a quantum search algorithm called Grover's iteration is developed. This procedure does not require an explicit design parameter to specify the relative frequency of explorative/exploitative actions. The second part of this dissertation extends the powerful adaptive critic design methodology to solve finite horizon stochastic optimal control problems. To numerically solve the stochastic Hamilton-Jacobi-Bellman equation, which characterizes the optimal expected cost function, a large number of trajectory samples is required. The proposed methodology overcomes this difficulty by using the path integral control formulation to adaptively sample trajectories of importance. The third part of this dissertation presents two quantum inspired coordination models to dynamically assign targets to agents operating in a stochastic environment. The first approach uses a quantum decision theory model that explains irrational action choices in human decision making. The second approach uses a quantum game theory model that exploits the quantum mechanical phenomenon of 'entanglement' to increase individual pay-off in multi-player games. The efficiency and scalability of the proposed coordination models are demonstrated through simulations of a large-scale multi-agent system. --Abstract, page iii
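    The Grover-inspired action selection idea can be sketched as follows. This is a simplified reading of the concept (oracle phase flip plus inversion about the mean, then Born-rule sampling), not the dissertation's exact procedure; the iteration count plays the role the abstract describes, steering exploitation strength without a separate epsilon-style parameter.

```python
import math
import random

def grover_amplify(n_actions, target, iterations):
    """Amplify the amplitude of `target` among n equally weighted actions."""
    amps = [1.0 / math.sqrt(n_actions)] * n_actions
    for _ in range(iterations):
        amps[target] = -amps[target]            # oracle: flip the target's phase
        mean = sum(amps) / n_actions
        amps = [2.0 * mean - a for a in amps]   # diffusion: inversion about the mean
    return amps

def select_action(amps):
    """Sample an action with probability |amplitude|^2 (Born rule)."""
    probs = [a * a for a in amps]
    return random.choices(range(len(amps)), weights=probs)[0]
```

    With zero iterations the distribution stays uniform (pure exploration); more iterations concentrate probability on the reinforced action, so the degree of exploitation emerges from the amplification process itself.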

    Adaptation, coordination, and local interactions via distributed approachability

    This paper investigates the relation between cooperation, competition, and local interactions in large distributed multi-agent systems. The main contribution is the game-theoretic problem formulation and solution approach based on the new framework of distributed approachability, and the study of the convergence properties of the resulting game model. Approachability theory is the theory of two-player repeated games with vector payoffs, and distributed approachability is here presented for the first time as an extension to the case where a team of agents cooperates against a team of adversaries under a local information and interaction structure. The game model turns into a nonlinear differential inclusion which, after a proper design of the control and disturbance policies, presents a consensus term and an exogenous adversarial input. Local interactions enter the model through a graph topology and the corresponding graph-Laplacian matrix. Given the above model, we turn the original questions on cooperation, competition, and local interactions into convergence properties of the differential inclusion. In particular, we prove convergence and exponential convergence conditions around zero under general Markovian strategies. We illustrate our results in the case of decentralized organizations with multiple decision-makers.
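    The consensus term induced by the graph-Laplacian matrix can be illustrated with a minimal simulation of the disturbance-free dynamics xdot = -L x, discretized with a forward-Euler step. The path-graph topology and step size below are hypothetical choices for demonstration, not taken from the paper.

```python
def laplacian(adj):
    """Graph Laplacian L = D - A for an undirected adjacency matrix."""
    n = len(adj)
    L = [[-adj[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        L[i][i] = sum(adj[i])  # diagonal holds the node degree
    return L

def consensus_step(x, L, dt=0.1):
    """One forward-Euler step of xdot = -L x (stable for dt < 2 / lambda_max(L))."""
    n = len(x)
    return [x[i] - dt * sum(L[i][j] * x[j] for j in range(n)) for i in range(n)]
```

    Because each row (and, for undirected graphs, each column) of L sums to zero, the state average is preserved while disagreement decays, so the agents converge to the mean of their initial states; an adversarial input would enter as an extra additive term in each step.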