3,923 research outputs found

    Multi-stage stochastic optimization and reinforcement learning for forestry epidemic and covid-19 control planning

    Get PDF
    This dissertation focuses on developing new modeling and solution approaches based on multi-stage stochastic programming and reinforcement learning for tackling biological invasions in forests and human populations. Emerald Ash Borer (EAB) is the nemesis of ash trees. This research introduces a multi-stage stochastic mixed-integer programming model to assist forest agencies in managing emerald ash borer insects throughout the U.S. and maximize the public benets of preserving healthy ash trees. This work is then extended to present the first risk-averse multi-stage stochastic mixed-integer program in the invasive species management literature to account for extreme events. Significant computational achievements are obtained using a scenario dominance decomposition and cutting plane algorithm.The results of this work provide crucial insights and decision strategies for optimal resource allocation among surveillance, treatment, and removal of ash trees, leading to a better and healthier environment for future generations. This dissertation also addresses the computational difficulty of solving one of the most difficult classes of combinatorial optimization problems, the Multi-Dimensional Knapsack Problem (MKP). A novel 2-Dimensional (2D) deep reinforcement learning (DRL) framework is developed to represent and solve combinatorial optimization problems focusing on MKP. The DRL framework trains different agents for making sequential decisions and finding the optimal solution while still satisfying the resource constraints of the problem. To our knowledge, this is the first DRL model of its kind where a 2D environment is formulated, and an element of the DRL solution matrix represents an item of the MKP. Our DRL framework shows that it can solve medium-sized and large-sized instances at least 45 and 10 times faster in CPU solution time, respectively, with a maximum solution gap of 0.28% compared to the solution performance of CPLEX. Applying this methodology, yet another recent epidemic problem is tackled, that of COVID-19. This research investigates a reinforcement learning approach tailored with an agent-based simulation model to simulate the disease growth and optimize decision-making during an epidemic. This framework is validated using the COVID-19 data from the Center for Disease Control and Prevention (CDC). Research results provide important insights into government response to COVID-19 and vaccination strategies

    Effects of Anticipation in Individually Motivated Behaviour on Control and Survival in a Multi-Agent Scenario with Resource Constraints

    Get PDF
    This is an open access article distributed under the Creative Commons Attribution License CC BY 3.0 which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.Self-organization and survival are inextricably bound to an agent’s ability to control and anticipate its environment. Here we assess both skills when multiple agents compete for a scarce resource. Drawing on insights from psychology, microsociology and control theory, we examine how different assumptions about the behaviour of an agent’s peers in the anticipation process affect subjective control and survival strategies. To quantify control and drive behaviour, we use the recently developed information-theoretic quantity of empowerment with the principle of empowerment maximization. In two experiments involving extensive simulations, we show that agents develop risk-seeking, risk-averse and mixed strategies, which correspond to greedy, parsimonious and mixed behaviour. Although the principle of empowerment maximization is highly generic, the emerging strategies are consistent with what one would expect from rational individuals with dedicated utility models. Our results support empowerment maximization as a universal drive for guided self-organization in collective agent systemsPeer reviewedFinal Published versio

    Pareto Actor-Critic for Equilibrium Selection in Multi-Agent Reinforcement Learning

    Full text link
    This work focuses on equilibrium selection in no-conflict multi-agent games, where we specifically study the problem of selecting a Pareto-optimal equilibrium among several existing equilibria. It has been shown that many state-of-the-art multi-agent reinforcement learning (MARL) algorithms are prone to converging to Pareto-dominated equilibria due to the uncertainty each agent has about the policy of the other agents during training. To address sub-optimal equilibrium selection, we propose Pareto Actor-Critic (Pareto-AC), which is an actor-critic algorithm that utilises a simple property of no-conflict games (a superset of cooperative games): the Pareto-optimal equilibrium in a no-conflict game maximises the returns of all agents and therefore is the preferred outcome for all agents. We evaluate Pareto-AC in a diverse set of multi-agent games and show that it converges to higher episodic returns compared to seven state-of-the-art MARL algorithms and that it successfully converges to a Pareto-optimal equilibrium in a range of matrix games. Finally, we propose PACDCG, a graph neural network extension of Pareto-AC which is shown to efficiently scale in games with a large number of agents.Comment: 20 pages, 12 figure

    Learning Adaptable Risk-Sensitive Policies to Coordinate in Multi-Agent General-Sum Games

    Full text link
    In general-sum games, the interaction of self-interested learning agents commonly leads to socially worse outcomes, such as defect-defect in the iterated stag hunt (ISH). Previous works address this challenge by sharing rewards or shaping their opponents' learning process, which require too strong assumptions. In this paper, we demonstrate that agents trained to optimize expected returns are more likely to choose a safe action that leads to guaranteed but lower rewards. However, there typically exists a risky action that leads to higher rewards in the long run only if agents cooperate, e.g., cooperate-cooperate in ISH. To overcome this, we propose using action value distribution to characterize the decision's risk and corresponding potential payoffs. Specifically, we present Adaptable Risk-Sensitive Policy (ARSP). ARSP learns the distributions over agent's return and estimates a dynamic risk-seeking bonus to discover risky coordination strategies. Furthermore, to avoid overfitting training opponents, ARSP learns an auxiliary opponent modeling task to infer opponents' types and dynamically alter corresponding strategies during execution. Empirically, agents trained via ARSP can achieve stable coordination during training without accessing opponent's rewards or learning process, and can adapt to non-cooperative opponents during execution. To the best of our knowledge, it is the first method to learn coordination strategies between agents both in iterated prisoner's dilemma (IPD) and iterated stag hunt (ISH) without shaping opponents or rewards, and can adapt to opponents with distinct strategies during execution. Furthermore, we show that ARSP can be scaled to high-dimensional settings.Comment: arXiv admin note: substantial text overlap with arXiv:2205.1585

    Partner Selection for the Emergence of Cooperation in Multi-Agent Systems Using Reinforcement Learning

    Get PDF
    Social dilemmas have been widely studied to explain how humans are able to cooperate in society. Considerable effort has been invested in designing artificial agents for social dilemmas that incorporate explicit agent motivations that are chosen to favor coordinated or cooperative responses. The prevalence of this general approach points towards the importance of achieving an understanding of both an agent's internal design and external environment dynamics that facilitate cooperative behavior. In this paper, we investigate how partner selection can promote cooperative behavior between agents who are trained to maximize a purely selfish objective function. Our experiments reveal that agents trained with this dynamic learn a strategy that retaliates against defectors while promoting cooperation with other agents resulting in a prosocial society.Comment:

    Sim2real and Digital Twins in Autonomous Driving: A Survey

    Full text link
    Safety and cost are two important concerns for the development of autonomous driving technologies. From the academic research to commercial applications of autonomous driving vehicles, sufficient simulation and real world testing are required. In general, a large scale of testing in simulation environment is conducted and then the learned driving knowledge is transferred to the real world, so how to adapt driving knowledge learned in simulation to reality becomes a critical issue. However, the virtual simulation world differs from the real world in many aspects such as lighting, textures, vehicle dynamics, and agents' behaviors, etc., which makes it difficult to bridge the gap between the virtual and real worlds. This gap is commonly referred to as the reality gap (RG). In recent years, researchers have explored various approaches to address the reality gap issue, which can be broadly classified into two categories: transferring knowledge from simulation to reality (sim2real) and learning in digital twins (DTs). In this paper, we consider the solutions through the sim2real and DTs technologies, and review important applications and innovations in the field of autonomous driving. Meanwhile, we show the state-of-the-arts from the views of algorithms, models, and simulators, and elaborate the development process from sim2real to DTs. The presentation also illustrates the far-reaching effects of the development of sim2real and DTs in autonomous driving
    • …
    corecore