Resource Abstraction for Reinforcement Learning in Multiagent Congestion Problems
Real-world congestion problems (e.g. traffic congestion) are typically very
complex and large-scale. Multiagent reinforcement learning (MARL) is a
promising candidate for dealing with this emerging complexity by providing an
autonomous and distributed solution to these problems. However, there are three
limiting factors that affect the deployability of MARL approaches to congestion
problems. These are learning time, scalability and decentralised coordination,
i.e. no communication between the learning agents. In this paper we introduce
Resource Abstraction, an approach that addresses these challenges by allocating
the available resources into abstract groups. This abstraction creates new
reward functions that provide a more informative signal to the learning agents
and aid the coordination amongst them. Experimental work is conducted on two
benchmark domains from the literature, an abstract congestion problem and a
realistic traffic congestion problem. The current state-of-the-art for solving
multiagent congestion problems is a form of reward shaping called difference
rewards. We show that the system using Resource Abstraction significantly
improves the learning speed and scalability, and achieves the highest possible
or near-highest joint performance/social welfare for both congestion problems
in large-scale scenarios involving up to 1000 reinforcement learning agents.
Comment: Keywords: congestion problems, resource management, multiagent
reinforcement learning, multiagent systems, multiagent learning, resource
abstraction. In Proceedings of the 2016 International Conference on
Autonomous Agents and Multiagent Systems (AAMAS '16)
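
The abstract above names difference rewards as the state-of-the-art baseline. As a rough illustration only, the sketch below computes a difference reward in a toy congestion setting: each agent's reward is the global utility minus the global utility obtained when that agent is removed. The utility form load * exp(-load / capacity), the capacity value, and all function names are assumptions chosen for illustration; they are not taken from the paper, and the sketch does not implement Resource Abstraction itself.

```python
import math
from collections import Counter


def local_utility(load, capacity):
    # Assumed congestion utility: rises with load, then decays once
    # the resource becomes overcrowded relative to its capacity.
    return load * math.exp(-load / capacity)


def global_utility(choices, capacity):
    # System-wide utility: sum of local utilities over occupied resources.
    loads = Counter(choices)
    return sum(local_utility(n, capacity) for n in loads.values())


def difference_reward(choices, agent, capacity):
    # D_i = G(z) - G(z without agent i): the agent's marginal
    # contribution to the global utility.
    without_agent = choices[:agent] + choices[agent + 1:]
    return global_utility(choices, capacity) - global_utility(without_agent, capacity)


# Hypothetical scenario: three agents crowd resource 0, one uses resource 1.
choices = [0, 0, 0, 1]
capacity = 2.0
rewards = [difference_reward(choices, i, capacity) for i in range(len(choices))]
print(rewards)
```

In this toy scenario the agents on the overcrowded resource receive a negative reward, while the lone agent on the other resource receives a positive one, which is the kind of per-agent signal that reward shaping of this sort is meant to provide.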
Adversarial Deep Reinforcement Learning based Adaptive Moving Target Defense
Moving target defense (MTD) is a proactive defense approach that aims to
thwart attacks by continuously changing the attack surface of a system (e.g.,
changing host or network configurations), thereby increasing the adversary's
uncertainty and attack cost. To maximize the impact of MTD, a defender must
strategically choose when and what changes to make, taking into account both
the characteristics of its system as well as the adversary's observed
activities. Finding an optimal strategy for MTD presents a significant
challenge, especially when facing a resourceful and determined adversary who
may respond to the defender's actions. In this paper, we propose a multi-agent
partially-observable Markov Decision Process model of MTD and formulate a
two-player general-sum game between the adversary and the defender. Based on an
established model of adaptive MTD, we propose a multi-agent reinforcement
learning framework based on the double oracle algorithm to solve the game. In
the experiments, we show the effectiveness of our framework in finding optimal
policies