Intent-aware Multi-agent Reinforcement Learning
This paper proposes an intent-aware multi-agent planning framework as well as
a learning algorithm. Under this framework, an agent plans in the goal space to
maximize the expected utility. The planning process takes the belief of other
agents' intents into consideration. Instead of formulating the learning problem
as a partially observable Markov decision process (POMDP), we propose a simple
but effective linear function approximation of the utility function. It is
based on the observation that, for humans, other people's intents influence
our utility for a goal. The proposed framework has several major advantages:
(i) it is computationally feasible and guaranteed to converge; (ii) it can
easily integrate existing intent prediction and low-level planning algorithms;
(iii) it does not suffer from sparse feedback in the action space. We evaluate
our algorithm on a real-world problem that is non-episodic and in which the
number of agents and goals can vary over time. Our algorithm is trained in a
scene in which aerial robots and humans interact, and tested in a novel scene
with a different environment. Experimental results show that our algorithm
achieves the best performance and that human-like behaviors emerge during the
dynamic process.
Comment: ICRA 201
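To make the utility model concrete, here is a minimal sketch of planning in goal space with a linear utility approximation; the goal features, belief vector, and weight dimension are illustrative assumptions, not the paper's design.

```python
import numpy as np

def utility(goal_features, intent_beliefs, weights):
    """Linear utility of a goal: the goal's own features concatenated with
    the belief about other agents' intents (illustrative feature choice)."""
    phi = np.concatenate([goal_features, intent_beliefs])
    return weights @ phi

# Planning in goal space: choose the goal with the highest expected utility.
goals = {"land": np.array([1.0, 0.0]), "hover": np.array([0.0, 1.0])}
beliefs = np.array([0.8, 0.2])   # belief that the other agent pursues each goal
w = np.zeros(4)                  # weights learned online, e.g. by TD updates
best_goal = max(goals, key=lambda g: utility(goals[g], beliefs, w))
print(best_goal)
```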
IntelligentCrowd: Mobile Crowdsensing via Multi-agent Reinforcement Learning
The prosperity of smart mobile devices has made mobile crowdsensing (MCS) a
promising paradigm for completing complex sensing and computation tasks. In the
past, great efforts have been made on the design of incentive mechanisms and
task allocation strategies from MCS platform's perspective to motivate mobile
users' participation. However, in practice, MCS participants face many
uncertainties coming from their sensing environment as well as other
participants' strategies, and how they interact with each other and make
sensing decisions is not well understood. In this paper, we take the MCS
participants' perspective to derive an online sensing policy to maximize their
payoffs via MCS participation. Specifically, we model the interactions of
mobile users and sensing environments as a multi-agent Markov decision process.
Each participant cannot observe others' decisions and must decide its
effort level in sensing tasks based only on local information, e.g., its own
record of sensed-signal quality. To cope with the stochastic sensing
environment, we develop an intelligent crowdsensing algorithm IntelligentCrowd
by leveraging the power of multi-agent reinforcement learning (MARL). Our
algorithm leads to the optimal sensing policy for each user to maximize the
expected payoff against stochastic sensing environments, and can be implemented
at individual participant's level in a distributed fashion. Numerical
simulations demonstrate that IntelligentCrowd significantly improves users'
payoffs in sequential MCS tasks under various sensing dynamics.
Comment: In Submission
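A minimal sketch of the decentralized, local-information decision loop follows; it uses plain independent tabular Q-learning rather than the paper's MARL algorithm, and the effort levels and observation binning are assumptions.

```python
import numpy as np

# Hypothetical discretization: each participant chooses an effort level using
# only local information (e.g., a binned record of its sensed-signal quality).
EFFORT_LEVELS = [0.0, 0.5, 1.0]
N_OBS_BINS = 10
alpha, gamma, eps = 0.1, 0.95, 0.1

Q = np.zeros((N_OBS_BINS, len(EFFORT_LEVELS)))   # one table per participant
rng = np.random.default_rng(0)

def act(obs_bin):
    """Epsilon-greedy choice of effort level from local observation only."""
    if rng.random() < eps:
        return int(rng.integers(len(EFFORT_LEVELS)))
    return int(np.argmax(Q[obs_bin]))

def update(obs_bin, action, payoff, next_obs_bin):
    """Standard Q-learning update driven by the participant's own payoff."""
    td_target = payoff + gamma * Q[next_obs_bin].max()
    Q[obs_bin, action] += alpha * (td_target - Q[obs_bin, action])
```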
Actor-Attention-Critic for Multi-Agent Reinforcement Learning
Reinforcement learning in multi-agent scenarios is important for real-world
applications but presents challenges beyond those seen in single-agent
settings. We present an actor-critic algorithm that trains decentralized
policies in multi-agent settings, using centrally computed critics that share
an attention mechanism which selects relevant information for each agent at
every timestep. This attention mechanism enables more effective and scalable
learning in complex multi-agent environments compared to recent approaches.
Our approach is applicable not only to cooperative settings with shared
rewards but also to settings with individualized rewards, including
adversarial settings, as well as to settings that do not provide global
states, and it makes no assumptions about the action spaces of the agents. As
such, it is flexible enough to be applied to most multi-agent learning
problems.
Comment: ICML 2019 camera-ready version
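The attention-based critic can be sketched roughly as follows; this single-head PyTorch version with placeholder dimensions only illustrates each agent's critic attending over the other agents' encoded observation-action pairs, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionCritic(nn.Module):
    """Sketch: each agent's Q-value is computed from its own encoding plus an
    attention-weighted summary of the other agents' encodings."""
    def __init__(self, obs_act_dim, hidden=64):
        super().__init__()
        self.encode = nn.Linear(obs_act_dim, hidden)
        self.query = nn.Linear(hidden, hidden, bias=False)
        self.key = nn.Linear(hidden, hidden, bias=False)
        self.value = nn.Linear(hidden, hidden, bias=False)
        self.q_head = nn.Linear(2 * hidden, 1)

    def forward(self, obs_acts):                 # (n_agents, obs_act_dim)
        e = torch.relu(self.encode(obs_acts))    # (n_agents, hidden)
        q_values = []
        for i in range(e.shape[0]):
            others = torch.cat([e[:i], e[i + 1:]])          # all but agent i
            scores = self.query(e[i]) @ self.key(others).T  # (n_agents - 1,)
            attn = F.softmax(scores / e.shape[1] ** 0.5, dim=-1)
            context = attn @ self.value(others)             # (hidden,)
            q_values.append(self.q_head(torch.cat([e[i], context])))
        return torch.stack(q_values)             # one Q-value per agent

critic = AttentionCritic(obs_act_dim=10)
print(critic(torch.randn(4, 10)).shape)          # torch.Size([4, 1])
```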
Optimizing Market Making using Multi-Agent Reinforcement Learning
In this paper, reinforcement learning is applied to the problem of optimizing
market making. A multi-agent reinforcement learning framework is used to
optimally place limit orders that lead to successful trades. The framework
consists of two agents: the macro-agent decides whether to buy, sell, or hold
an asset, while the micro-agent optimizes the placement of limit orders within
the limit order book. In this paper, the proposed framework is applied to and
studied on the Bitcoin cryptocurrency market. The goal of this paper is to
show that reinforcement learning is a viable strategy for complex problems
(with complex environments) such as market making.
Comment: 10 pages, 12 figures
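A minimal sketch of the macro/micro decomposition; the action sets and tick-offset grid are hypothetical, not the paper's exact design.

```python
# Hypothetical action sets; the tick-offset grid is an assumption.
MACRO_ACTIONS = ("buy", "sell", "hold")
MICRO_OFFSETS = (-2, -1, 0, 1, 2)        # ticks relative to the best quote

def step(macro_policy, micro_policy, market_state):
    """Macro-agent picks the trade direction; micro-agent, if trading,
    picks where in the limit order book to place the order."""
    side = macro_policy(market_state)
    if side == "hold":
        return None
    offset = micro_policy(market_state, side)
    return {"side": side, "price_offset_ticks": offset}

# Example with trivial stand-in policies:
order = step(lambda s: "buy", lambda s, side: -1, market_state={})
print(order)   # {'side': 'buy', 'price_offset_ticks': -1}
```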
Learning to Schedule Communication in Multi-agent Reinforcement Learning
Many real-world reinforcement learning tasks require multiple agents to make
sequential decisions while interacting with one another, and well-coordinated
actions among the agents are crucial to achieving the target goal at these
tasks. One way to accelerate coordination is to enable multiple agents to
communicate with each other in a distributed manner and behave as a group. In
this paper, we study a practical scenario in which (i) the communication
bandwidth is limited and (ii) the agents share the communication medium, so
that only a restricted number of agents can use the medium simultaneously, as
in state-of-the-art wireless networking standards. This calls for a certain
form of communication scheduling. In that regard, we propose a multi-agent
deep reinforcement learning framework, called SchedNet, in which agents learn
how to schedule themselves, how to encode messages, and how to select actions
based on received messages. SchedNet is capable of deciding which agents
should be entitled to broadcast their (encoded) messages, by learning the
importance of each agent's partially observed information. We evaluate
SchedNet against multiple baselines in two different applications, namely
cooperative communication and navigation, and predator-prey. Our experiments
show a non-negligible performance gap, ranging from 32% to 43%, between
SchedNet and other mechanisms such as those without communication or with
vanilla scheduling methods, e.g., round robin.
Comment: Accepted in ICLR 201
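The weight-based scheduling idea can be sketched as follows; the top-k rule and the random stand-in for learned importance weights are illustrative assumptions.

```python
import numpy as np

def schedule_top_k(importance_weights, k):
    """Let the k agents with the largest learned importance weights broadcast
    their encoded messages this step; everyone else stays silent."""
    order = np.argsort(importance_weights)[::-1]
    mask = np.zeros(len(importance_weights), dtype=bool)
    mask[order[:k]] = True
    return mask

rng = np.random.default_rng(0)
weights = rng.random(5)                  # stand-in for learned weights
print(schedule_top_k(weights, k=2))      # only two agents may use the medium
```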
Modeling Others using Oneself in Multi-Agent Reinforcement Learning
We consider the multi-agent reinforcement learning setting with imperfect
information in which each agent is trying to maximize its own utility. The
reward function depends on the hidden state (or goal) of both agents, so each
agent must infer the other agent's hidden goal from its observed behavior in
order to solve the task. We propose a new approach for learning in these
domains: Self Other-Modeling (SOM), in which an agent uses its own policy to
predict the other agent's actions and to update its belief about the other's
hidden state in an online manner. We evaluate this approach on three different
tasks and show that the agents are able to learn better policies using their
estimates of the other agent's hidden state, in both cooperative and
adversarial settings.
Comment: 10 pages, 16 figures, submitted to ICML 201
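A rough sketch of the SOM inference step, assuming a policy network that takes an observation and a goal-belief vector and returns action logits; the optimizer, step count, and interface are assumptions.

```python
import torch
import torch.nn.functional as F

def infer_other_goal(own_policy, other_obs, other_action, z_init,
                     steps=5, lr=0.1):
    """Use one's own policy network to explain the other agent's observed
    action: gradient ascent on the action's log-likelihood with respect to
    a belief over the other's hidden goal (interface is assumed)."""
    z = z_init.clone().requires_grad_(True)
    opt = torch.optim.SGD([z], lr=lr)
    for _ in range(steps):
        logits = own_policy(other_obs, F.softmax(z, dim=-1))
        loss = F.cross_entropy(logits.unsqueeze(0), other_action.unsqueeze(0))
        opt.zero_grad()
        loss.backward()
        opt.step()
    return F.softmax(z.detach(), dim=-1)   # updated belief over hidden goals
```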
Cooperative Multi-Agent Reinforcement Learning Framework for Scalping Trading
We explore deep reinforcement learning (RL) algorithms for scalping trading
and find that there are no appropriate trading gym and agent examples. We
therefore propose a gym and agent for finance in the style of the OpenAI Gym.
In addition, we introduce a new RL framework based on our hybrid algorithm,
which combines supervised learning and RL and uses meaningful observations
such as order book and settlement data, informed by watching scalpers trade;
this information is crucial for deciding a trader's behavior. To feed these
data into our model, we use a spatio-temporal convolution layer (Conv3D) for
order book data and a temporal CNN (Conv1D) for settlement data, both
preprocessed by an episode filter we developed. The agent consists of four
sub-agents, each with its own clearly defined goal, to make the best
decisions. We also adopt both value-based and policy-based algorithms in our
framework. With these features, we can make the agent mimic scalpers as
closely as possible. RL algorithms have already begun to transcend human
capabilities in many domains. This approach could be a starting point for
surpassing humans in the financial stock market as well, and a good reference
for anyone who wants to design RL algorithms for real-world domains. Finally,
we experiment with our framework and report the experimental progress.
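A minimal sketch of the described feature extractors in PyTorch; all tensor shapes, channel counts, and the action head are assumptions.

```python
import torch
import torch.nn as nn

class ScalperNet(nn.Module):
    """Sketch: Conv3D over order book snapshots and Conv1D over settlement
    data, fused into a small action head (all shapes are assumptions)."""
    def __init__(self, n_actions=3):
        super().__init__()
        # Order book input: (batch, channels=1, time, price levels, features)
        self.book = nn.Sequential(nn.Conv3d(1, 8, kernel_size=3, padding=1),
                                  nn.ReLU(), nn.AdaptiveAvgPool3d(1),
                                  nn.Flatten())
        # Settlement input: (batch, channels, time)
        self.settle = nn.Sequential(nn.Conv1d(4, 8, kernel_size=3, padding=1),
                                    nn.ReLU(), nn.AdaptiveAvgPool1d(1),
                                    nn.Flatten())
        self.head = nn.Linear(16, n_actions)

    def forward(self, book, settle):
        features = torch.cat([self.book(book), self.settle(settle)], dim=1)
        return self.head(features)

net = ScalperNet()
book = torch.randn(2, 1, 16, 10, 2)   # (batch, 1, time, price levels, features)
settle = torch.randn(2, 4, 16)        # (batch, channels, time)
print(net(book, settle).shape)        # torch.Size([2, 3])
```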
Negative Update Intervals in Deep Multi-Agent Reinforcement Learning
In Multi-Agent Reinforcement Learning (MA-RL), independent cooperative
learners must overcome a number of pathologies to learn optimal joint policies.
Addressing one pathology often leaves approaches vulnerable to others. For
instance, hysteretic Q-learning addresses miscoordination while leaving agents
vulnerable to misleading stochastic rewards. Other methods, such as leniency,
have proven more robust when dealing with multiple pathologies simultaneously.
However, leniency has predominantly been studied within the context of
strategic-form games (bimatrix games) and fully observable Markov games
consisting of a small number of probabilistic state transitions. This raises
the question of whether these findings scale to more complex domains. For this
purpose we implement a temporally extended version of the Climb Game,
within which agents must overcome multiple pathologies simultaneously,
including relative overgeneralisation, stochasticity, the alter-exploration and
moving target problems, while learning from a large observation space. We find
that existing lenient and hysteretic approaches fail to consistently learn near
optimal joint-policies in this environment. To address these pathologies we
introduce Negative Update Intervals-DDQN (NUI-DDQN), a Deep MA-RL algorithm
which discards episodes yielding cumulative rewards outside the range of
expanding intervals. NUI-DDQN consistently gravitates towards optimal
joint-policies in our environment, overcoming the outlined pathologies.
Comment: 11 pages, 6 figures, AAMAS 2019 Conference Proceedings
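The episode filter might look roughly like this; the paper's interval dynamics and per-action bookkeeping are more involved, so treat this only as an illustration of discarding episodes whose returns fall outside an expanding interval.

```python
class NegativeUpdateInterval:
    """Admit an episode into replay only if its cumulative reward is at least
    the interval's lower bound, which is gradually raised toward the best
    return observed so far (update rule here is illustrative)."""
    def __init__(self, lr=0.01):
        self.lower = None
        self.best = float("-inf")
        self.lr = lr

    def admit(self, episode_return):
        self.best = max(self.best, episode_return)
        if self.lower is None:
            self.lower = episode_return
        else:
            self.lower += self.lr * (self.best - self.lower)  # expand upward
        return episode_return >= self.lower
```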
Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning
Many real-world problems, such as network packet routing and urban traffic
control, are naturally modeled as multi-agent reinforcement learning (RL)
problems. However, existing multi-agent RL methods typically scale poorly
with the problem size. Therefore, a key challenge is to translate the success
of
deep learning on single-agent RL to the multi-agent setting. A major stumbling
block is that independent Q-learning, the most popular multi-agent RL method,
introduces nonstationarity that makes it incompatible with the experience
replay memory on which deep Q-learning relies. This paper proposes two methods
that address this problem: 1) using a multi-agent variant of importance
sampling to naturally decay obsolete data and 2) conditioning each agent's
value function on a fingerprint that disambiguates the age of the data sampled
from the replay memory. Results on a challenging decentralised variant of
StarCraft unit micromanagement confirm that these methods enable the successful
combination of experience replay with multi-agent RL.
Comment: Camera-ready version, International Conference on Machine Learning
2017; updated to fix print-breaking image
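A sketch of the second method: augmenting each agent's observation with a low-dimensional fingerprint, such as the training iteration and the exploration rate in effect when the data was collected (the scaling of these quantities is an implementation choice).

```python
import numpy as np

def fingerprinted_obs(obs, train_iteration, epsilon):
    """Append a fingerprint that disambiguates the age of replayed data,
    e.g. the training iteration and the exploration rate at collection
    time, so the value function can condition on data staleness."""
    return np.concatenate([obs, [float(train_iteration), float(epsilon)]])

print(fingerprinted_obs(np.zeros(3), train_iteration=1200, epsilon=0.05))
```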
Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning
When observing the actions of others, humans make inferences about why they
acted as they did, and what this implies about the world; humans also use the
fact that their actions will be interpreted in this manner, allowing them to
act informatively and thereby communicate efficiently with others. Although
learning algorithms have recently achieved superhuman performance in a number
of two-player, zero-sum games, scalable multi-agent reinforcement learning
algorithms that can discover effective strategies and conventions in complex,
partially observable settings have proven elusive. We present the Bayesian
action decoder (BAD), a new multi-agent learning method that uses an
approximate Bayesian update to obtain a public belief that conditions on the
actions taken by all agents in the environment. BAD introduces a new Markov
decision process, the public belief MDP, in which the action space consists of
all deterministic partial policies, and exploits the fact that an agent acting
only on this public belief state can still learn to use its private information
if the action space is augmented to be over all partial policies mapping
private information into environment actions. The Bayesian update is closely
related to the theory of mind reasoning that humans carry out when observing
others' actions. We first validate BAD on a proof-of-principle two-step matrix
game, where it outperforms policy gradient methods; we then evaluate BAD on the
challenging, cooperative partial-information card game Hanabi, where, in the
two-player setting, it surpasses all previously published learning and
hand-coded approaches, establishing a new state of the art.
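The public belief update can be illustrated in the special case of a known deterministic partial policy, where the Bayesian update is exact; the paper's BAD update is approximate and works with learned policies.

```python
import numpy as np

def public_belief_update(belief, partial_policy, observed_action):
    """Bayes update of the public belief over private states: keep only the
    private states in which the (deterministic) partial policy would have
    produced the observed action, then renormalize."""
    posterior = belief * (partial_policy == observed_action)
    total = posterior.sum()
    return posterior / total if total > 0 else belief

belief = np.full(4, 0.25)                # uniform over 4 private states
partial_policy = np.array([0, 1, 1, 0])  # action chosen in each private state
print(public_belief_update(belief, partial_policy, observed_action=1))
# -> [0.  0.5 0.5 0. ]: only states 1 and 2 are consistent with action 1
```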