3,923 research outputs found
Multi-stage stochastic optimization and reinforcement learning for forestry epidemic and covid-19 control planning
This dissertation focuses on developing new modeling and solution approaches based on multi-stage stochastic programming and reinforcement learning for tackling biological invasions in forests and human populations. Emerald Ash Borer (EAB) is the nemesis of ash trees. This research introduces a multi-stage stochastic mixed-integer programming model to assist forest agencies in managing emerald ash borer insects throughout the U.S. and to maximize the public benefits of preserving healthy ash trees. This work is then extended to present the first risk-averse multi-stage stochastic mixed-integer program in the invasive species management literature to account for extreme events. Significant computational achievements are obtained using a scenario dominance decomposition and cutting plane algorithm. The results of this work provide crucial insights and decision strategies for optimal resource allocation among surveillance, treatment, and removal of ash trees, leading to a better and healthier environment for future generations.
This dissertation also addresses the computational difficulty of solving one of the most difficult classes of combinatorial optimization problems, the Multi-Dimensional Knapsack Problem (MKP). A novel 2-Dimensional (2D) deep reinforcement learning (DRL) framework is developed to represent and solve combinatorial optimization problems, focusing on MKP. The DRL framework trains different agents to make sequential decisions and find the optimal solution while still satisfying the resource constraints of the problem. To our knowledge, this is the first DRL model of its kind in which a 2D environment is formulated and an element of the DRL solution matrix represents an item of the MKP. Our DRL framework solves medium-sized and large-sized instances at least 45 and 10 times faster in CPU solution time, respectively, with a maximum solution gap of 0.28% compared to the solution performance of CPLEX. Applying this methodology, yet another recent epidemic problem is tackled, that of COVID-19. This research investigates a reinforcement learning approach combined with an agent-based simulation model to simulate the disease growth and optimize decision-making during an epidemic. This framework is validated using COVID-19 data from the Centers for Disease Control and Prevention (CDC). Research results provide important insights into government response to COVID-19 and vaccination strategies.
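For readers unfamiliar with the MKP mentioned in this abstract, the following is a minimal sketch of the standard problem statement: choose a binary selection of items that maximizes total profit while every resource constraint holds. It is not the dissertation's DRL framework; it only enumerates all selections of a tiny instance. The function names and the four-item instance are hypothetical and chosen for illustration.

```python
from itertools import product


def mkp_value(x, profits, weights, capacities):
    """Return the total profit of selection x, or None if x is infeasible.

    weights[i][j] is the amount of resource i consumed by item j.
    """
    for row, cap in zip(weights, capacities):
        if sum(w * xi for w, xi in zip(row, x)) > cap:
            return None  # resource constraint violated
    return sum(p * xi for p, xi in zip(profits, x))


def solve_mkp_bruteforce(profits, weights, capacities):
    """Enumerate all 2^n selections; only practical for very small n."""
    best_x, best_val = None, -1
    for x in product((0, 1), repeat=len(profits)):
        val = mkp_value(x, profits, weights, capacities)
        if val is not None and val > best_val:
            best_x, best_val = x, val
    return best_x, best_val


# Hypothetical instance: 4 items, 2 resource dimensions.
profits = [10, 13, 7, 8]
weights = [
    [3, 4, 2, 3],  # consumption of resource 1 per item
    [2, 1, 3, 2],  # consumption of resource 2 per item
]
capacities = [7, 5]

best_x, best_val = solve_mkp_bruteforce(profits, weights, capacities)
print(best_x, best_val)
```

Exhaustive enumeration grows as 2^n, which is why exact solvers and learned heuristics are compared on medium and large instances.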
Effects of Anticipation in Individually Motivated Behaviour on Control and Survival in a Multi-Agent Scenario with Resource Constraints
This is an open access article distributed under the Creative Commons Attribution License CC BY 3.0 which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Self-organization and survival are inextricably bound to an agent’s ability to control and anticipate its environment. Here we assess both skills when multiple agents compete for a scarce resource. Drawing on insights from psychology, microsociology and control theory, we examine how different assumptions about the behaviour of an agent’s peers in the anticipation process affect subjective control and survival strategies. To quantify control and drive behaviour, we use the recently developed information-theoretic quantity of empowerment with the principle of empowerment maximization. In two experiments involving extensive simulations, we show that agents develop risk-seeking, risk-averse and mixed strategies, which correspond to greedy, parsimonious and mixed behaviour. Although the principle of empowerment maximization is highly generic, the emerging strategies are consistent with what one would expect from rational individuals with dedicated utility models. Our results support empowerment maximization as a universal drive for guided self-organization in collective agent systems. Peer reviewed. Final published version.
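As a rough illustration of empowerment as used in this abstract: empowerment is the channel capacity from an agent's action sequences to its resulting sensor states, and for deterministic dynamics it reduces to the base-2 logarithm of the number of distinct states reachable within the horizon. The sketch below assumes a hypothetical one-dimensional corridor with walls. It is not the multi-agent resource scenario studied in the paper, and all names are illustrative.

```python
from itertools import product
from math import log2


def step(pos, action, size):
    """Deterministic move in a corridor of `size` cells; walls clamp the position."""
    return min(max(pos + action, 0), size - 1)


def empowerment(pos, horizon, size, actions=(-1, 0, 1)):
    """n-step empowerment under deterministic dynamics.

    Equals log2 of the number of distinct end states over all action sequences.
    """
    reachable = set()
    for seq in product(actions, repeat=horizon):
        p = pos
        for a in seq:
            p = step(p, a, size)
        reachable.add(p)
    return log2(len(reachable))


# An agent in the middle of the corridor has more options than one at a wall.
center = empowerment(pos=3, horizon=2, size=7)
wall = empowerment(pos=0, horizon=2, size=7)
print(center, wall)
```

An empowerment-maximizing agent in this toy setting would prefer the center cell, since more futures remain reachable from there.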
Pareto Actor-Critic for Equilibrium Selection in Multi-Agent Reinforcement Learning
This work focuses on equilibrium selection in no-conflict multi-agent games,
where we specifically study the problem of selecting a Pareto-optimal
equilibrium among several existing equilibria. It has been shown that many
state-of-the-art multi-agent reinforcement learning (MARL) algorithms are prone
to converging to Pareto-dominated equilibria due to the uncertainty each agent
has about the policy of the other agents during training. To address
sub-optimal equilibrium selection, we propose Pareto Actor-Critic (Pareto-AC),
which is an actor-critic algorithm that utilises a simple property of
no-conflict games (a superset of cooperative games): the Pareto-optimal
equilibrium in a no-conflict game maximises the returns of all agents and
therefore is the preferred outcome for all agents. We evaluate Pareto-AC in a
diverse set of multi-agent games and show that it converges to higher episodic
returns compared to seven state-of-the-art MARL algorithms and that it
successfully converges to a Pareto-optimal equilibrium in a range of matrix
games. Finally, we propose PACDCG, a graph neural network extension of
Pareto-AC which is shown to efficiently scale in games with a large number of
agents. Comment: 20 pages, 12 figures
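The equilibrium-selection problem described in this abstract can be made concrete with a small matrix game. The sketch below is not Pareto-AC; it only enumerates the pure-strategy Nash equilibria of a two-player game and keeps those that no other equilibrium Pareto-dominates. The payoff matrix is the common-payoff "climbing game" often used in the MARL literature, and the function names are hypothetical.

```python
def pure_nash_equilibria(payoff_a, payoff_b):
    """Return all (row, col) cells where neither player gains by deviating alone."""
    rows, cols = len(payoff_a), len(payoff_a[0])
    equilibria = []
    for i in range(rows):
        for j in range(cols):
            best_for_a = all(payoff_a[i][j] >= payoff_a[k][j] for k in range(rows))
            best_for_b = all(payoff_b[i][j] >= payoff_b[i][l] for l in range(cols))
            if best_for_a and best_for_b:
                equilibria.append((i, j))
    return equilibria


def pareto_optimal(equilibria, payoff_a, payoff_b):
    """Keep equilibria that are not Pareto-dominated by another equilibrium."""
    def dominated(cell):
        i, j = cell
        return any(
            payoff_a[k][l] >= payoff_a[i][j]
            and payoff_b[k][l] >= payoff_b[i][j]
            and (payoff_a[k][l] > payoff_a[i][j] or payoff_b[k][l] > payoff_b[i][j])
            for k, l in equilibria
        )
    return [cell for cell in equilibria if not dominated(cell)]


# Climbing game: both agents receive the same payoff (a no-conflict game).
climbing = [
    [11, -30, 0],
    [-30, 7, 6],
    [0, 0, 5],
]
equilibria = pure_nash_equilibria(climbing, climbing)
best = pareto_optimal(equilibria, climbing, climbing)
print(equilibria, best)
```

Both (0, 0) and (1, 1) are equilibria here, but the large penalty for miscoordinating around (0, 0) is what can push independent learners toward the Pareto-dominated one.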
Learning Adaptable Risk-Sensitive Policies to Coordinate in Multi-Agent General-Sum Games
In general-sum games, the interaction of self-interested learning agents
commonly leads to socially worse outcomes, such as defect-defect in the
iterated stag hunt (ISH). Previous works address this challenge by sharing
rewards or shaping their opponents' learning process, which requires overly
strong assumptions. In this paper, we demonstrate that agents trained to optimize
expected returns are more likely to choose a safe action that leads to
guaranteed but lower rewards. However, there typically exists a risky action
that leads to higher rewards in the long run only if agents cooperate, e.g.,
cooperate-cooperate in ISH. To overcome this, we propose using action value
distribution to characterize the decision's risk and corresponding potential
payoffs. Specifically, we present Adaptable Risk-Sensitive Policy (ARSP). ARSP
learns the distributions over the agent's return and estimates a dynamic
risk-seeking bonus to discover risky coordination strategies. Furthermore, to
avoid overfitting to training opponents, ARSP learns an auxiliary opponent
modeling task to infer opponents' types and dynamically alter corresponding
strategies during execution. Empirically, agents trained via ARSP can achieve
stable coordination during training without accessing opponents' rewards or
learning process, and can adapt to non-cooperative opponents during execution.
To the best of our knowledge, it is the first method to learn coordination
strategies between agents both in iterated prisoner's dilemma (IPD) and
iterated stag hunt (ISH) without shaping opponents or rewards, and can adapt to
opponents with distinct strategies during execution. Furthermore, we show that
ARSP can be scaled to high-dimensional settings. Comment: arXiv admin note: substantial text overlap with arXiv:2205.1585
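The abstract's argument that expected-return maximizers drift toward the safe action can be checked on a one-shot stag hunt. The sketch below is not ARSP: it replaces the learned return distribution and dynamic bonus with an exact two-outcome distribution and a fixed mean-plus-standard-deviation score. The payoff values, the `risk_bonus` parameter, and the function names are assumptions made for illustration.

```python
from math import sqrt

# Hypothetical payoffs for the row agent: PAYOFF[own_action][opponent_action].
PAYOFF = {
    "stag": {"stag": 4.0, "hare": 0.0},  # risky: pays off only if both cooperate
    "hare": {"stag": 3.0, "hare": 3.0},  # safe: guaranteed but lower reward
}


def return_stats(action, p_stag):
    """Mean and standard deviation of the return when the opponent plays stag w.p. p_stag."""
    outcomes = [
        (p_stag, PAYOFF[action]["stag"]),
        (1.0 - p_stag, PAYOFF[action]["hare"]),
    ]
    mu = sum(p * r for p, r in outcomes)
    var = sum(p * (r - mu) ** 2 for p, r in outcomes)
    return mu, sqrt(var)


def choose(p_stag, risk_bonus=0.0):
    """Pick the action maximizing mean + risk_bonus * std (risk_bonus = 0 is risk-neutral)."""
    def score(action):
        mu, sd = return_stats(action, p_stag)
        return mu + risk_bonus * sd
    return max(PAYOFF, key=score)


neutral_choice = choose(p_stag=0.5)                 # expected-return maximizer
seeking_choice = choose(p_stag=0.5, risk_bonus=1.0)  # risk-seeking agent
print(neutral_choice, seeking_choice)
```

With an opponent who is equally likely to play either action, the risk-neutral agent settles for hare, while the bonus on return spread makes stag attractive enough to try.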
Partner Selection for the Emergence of Cooperation in Multi-Agent Systems Using Reinforcement Learning
Social dilemmas have been widely studied to explain how humans are able to
cooperate in society. Considerable effort has been invested in designing
artificial agents for social dilemmas that incorporate explicit agent
motivations that are chosen to favor coordinated or cooperative responses. The
prevalence of this general approach points towards the importance of achieving
an understanding of both an agent's internal design and external environment
dynamics that facilitate cooperative behavior. In this paper, we investigate
how partner selection can promote cooperative behavior between agents who are
trained to maximize a purely selfish objective function. Our experiments reveal
that agents trained with this dynamic learn a strategy that retaliates against
defectors while promoting cooperation with other agents resulting in a
prosocial society.
Sim2real and Digital Twins in Autonomous Driving: A Survey
Safety and cost are two important concerns for the development of autonomous
driving technologies. From academic research to commercial applications of
autonomous vehicles, sufficient simulation and real-world testing are
required. In general, large-scale testing is conducted in simulation
environments and the learned driving knowledge is then transferred to the real
world, so how to adapt driving knowledge learned in simulation to reality
becomes a critical issue. However, the virtual simulation world differs from
the real world in many aspects such as lighting, textures, vehicle dynamics,
and agents' behaviors, etc., which makes it difficult to bridge the gap between
the virtual and real worlds. This gap is commonly referred to as the reality
gap (RG). In recent years, researchers have explored various approaches to
address the reality gap issue, which can be broadly classified into two
categories: transferring knowledge from simulation to reality (sim2real) and
learning in digital twins (DTs). In this paper, we consider the solutions
through the sim2real and DTs technologies, and review important applications
and innovations in the field of autonomous driving. We also present the
state of the art from the perspectives of algorithms, models, and simulators, and
elaborate on the development process from sim2real to DTs. The presentation also
illustrates the far-reaching effects of the development of sim2real and DTs in
autonomous driving.