Deep Decentralized Multi-task Multi-Agent Reinforcement Learning under Partial Observability
Many real-world tasks involve multiple agents with partial observability and
limited communication. Learning is challenging in these settings because each
agent has only a local viewpoint and perceives the world as non-stationary due
to concurrently exploring teammates. Approaches that learn specialized policies
for individual tasks face problems when applied to the real world: not only
must agents learn and store a distinct policy for each task, but in practice
task identities are often not observable, making these approaches
inapplicable. This paper formalizes and addresses the problem of multi-task
multi-agent reinforcement learning under partial observability. We introduce a
decentralized single-task learning approach that is robust to concurrent
interactions of teammates, and present an approach for distilling single-task
policies into a unified policy that performs well across multiple related
tasks, without explicit provision of task identity.
Comment: Accepted to ICML 2017.
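One common way to realize the distillation step above, sketched under assumptions: task-specific teacher Q-networks and a single student network are given, and the student matches each teacher's temperature-softened action distribution (this follows the generic policy-distillation recipe, not necessarily the paper's exact objective; all names are hypothetical).

import torch
import torch.nn.functional as F

def distill_step(student, teachers, obs_per_task, optimizer, tau=0.01):
    # Match the student's softened action distribution to each task-specific
    # teacher's; no task identity is ever fed to the student network.
    loss = 0.0
    for teacher, obs in zip(teachers, obs_per_task):
        with torch.no_grad():
            target = F.softmax(teacher(obs) / tau, dim=-1)  # soft teacher policy
        log_pred = F.log_softmax(student(obs) / tau, dim=-1)
        loss = loss + F.kl_div(log_pred, target, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)

Because the student conditions only on observations, it must infer task-appropriate behavior implicitly, which is what allows it to act without explicit provision of task identity.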
Individual specialization in multi-task environments with multiagent reinforcement learners
There is a growing interest in Multi-Agent Reinforcement Learning (MARL) as a
first step towards building general intelligent agents that learn to make low-
and high-level decisions in non-stationary, complex environments in the
presence of other agents. Previous results have identified conditions that
increase coordination, efficiency/fairness, and common-pool resource sharing.
We further study coordination in multi-task environments in which several
rewarding tasks can be performed, so agents need not perform well in all tasks
but may, under certain conditions, specialize. An
observation derived from the study is that the epsilon-greedy exploration of
value-based reinforcement learning methods is not adequate for independent
multi-agent learners, because the epsilon parameter that controls the
probability of selecting a random action synchronizes the agents artificially
and forces their policies to become deterministic at the same time. By using
policy-based methods with independent entropy-regularised exploration updates,
we achieve better and smoother convergence. Another result that warrants
further investigation is that specialization
tends to become more probable as the number of agents increases.
Comment: 5 pages, 2 figures, paper appeared in CCIA 2019.
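The synchronization observation above is easy to reproduce in a toy setting. A minimal sketch with two independent softmax-policy learners in a hypothetical 2x2 matrix game: each agent applies its own REINFORCE update with an entropy bonus, so stochasticity decays per-agent as its own policy sharpens, instead of being driven by one shared epsilon schedule (payoffs and hyperparameters are illustrative).

import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Illustrative payoffs: rewards[i][a0, a1] is agent i's reward.
rewards = [np.array([[1.0, 0.0], [0.5, 0.2]]),
           np.array([[1.0, 0.5], [0.0, 0.2]])]
logits = [np.zeros(2), np.zeros(2)]
alpha, beta = 0.1, 0.05  # learning rate, entropy weight

for t in range(5000):
    probs = [softmax(l) for l in logits]
    acts = [rng.choice(2, p=p) for p in probs]
    for i in range(2):
        r = rewards[i][acts[0], acts[1]]
        # REINFORCE gradient of expected reward w.r.t. softmax logits
        pg = (np.eye(2)[acts[i]] - probs[i]) * r
        # gradient of the policy entropy w.r.t. the logits
        H = -(probs[i] * np.log(probs[i] + 1e-8)).sum()
        eg = -probs[i] * (np.log(probs[i] + 1e-8) + H)
        logits[i] += alpha * (pg + beta * eg)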
Adaptive Mechanism Design: Learning to Promote Cooperation
In the future, artificial learning agents are likely to become increasingly
widespread in our society. They will interact with both other learning agents
and humans in a variety of complex settings including social dilemmas. We
consider the problem of how an external agent can promote cooperation between
artificial learners by distributing additional rewards and punishments based on
observing the learners' actions. We propose a rule for automatically learning
how to create the right incentives by considering the players' anticipated
parameter updates. Using this learning rule leads to cooperation with high
social welfare in matrix games in which the agents would otherwise learn to
defect with high probability. We show that the resulting cooperative outcome is
stable in certain games even if the planning agent is turned off after a given
number of episodes, while other games require ongoing intervention to maintain
mutual cooperation. However, even in the latter case, the amount of necessary
additional incentives decreases over time.
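A loose sketch of the idea of learning from anticipated parameter updates, under stated assumptions: two softmax learners play a hypothetical prisoner's dilemma, the planner's additional rewards enter each learner's expected return, and the planner optimizes post-update social welfare by differentiating through the learners' anticipated one-step gradient updates (payoffs, step sizes, and the symmetric handling of the extra rewards are all illustrative, not the paper's exact formulation).

import torch

payoff = torch.tensor([[3.0, 0.0], [4.0, 1.0]])  # row player's PD payoffs
logits = [torch.zeros(2, requires_grad=True) for _ in range(2)]
extra = torch.zeros(2, 2, requires_grad=True)    # planner's extra rewards
planner_opt = torch.optim.Adam([extra], lr=0.05)
lr = 0.5                                         # learners' step size

for step in range(300):
    p = [torch.softmax(l, 0) for l in logits]
    # each learner's expected (game + extra) reward
    ret0 = (p[0].unsqueeze(1) * p[1] * (payoff + extra)).sum()
    ret1 = (p[0].unsqueeze(1) * p[1] * (payoff.t() + extra.t())).sum()
    g = [torch.autograd.grad(ret0, logits[0], create_graph=True)[0],
         torch.autograd.grad(ret1, logits[1], create_graph=True)[0]]
    # anticipated post-update policies, differentiable w.r.t. `extra`
    q = [torch.softmax(logits[i] + lr * g[i], 0) for i in range(2)]
    welfare = (q[0].unsqueeze(1) * q[1] * (payoff + payoff.t())).sum()
    planner_opt.zero_grad()
    (-welfare).backward()         # planner maximizes post-update welfare
    planner_opt.step()
    with torch.no_grad():         # learners actually take their steps
        for i in range(2):
            logits[i] += lr * g[i]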
Hierarchical Deep Multiagent Reinforcement Learning with Temporal Abstraction
Multiagent reinforcement learning (MARL) is commonly considered to suffer
from non-stationary environments and an exponentially growing policy space. It
would be even more challenging when rewards are sparse and delayed over long
trajectories. In this paper, we study hierarchical deep MARL in cooperative
multiagent problems with sparse and delayed reward. With temporal abstraction,
we decompose the problem into a hierarchy of different time scales and
investigate how agents can learn high-level coordination based on the
independent skills learned at the low level. Three hierarchical deep MARL
architectures are proposed to learn hierarchical policies under different MARL
paradigms. In addition, we propose a new experience replay mechanism to alleviate
the issue of the sparse transitions at the high level of abstraction and the
non-stationarity of multiagent learning. We empirically demonstrate the
effectiveness of our approaches in two domains with extremely sparse feedback:
(1) a variety of Multiagent Trash Collection tasks, and (2) a challenging
online mobile game, Fever Basketball Defense.
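A minimal sketch of the temporal-abstraction loop, assuming a gym-style environment and hypothetical high_policy / low_policies callables (the paper's three architectures and its replay mechanism are not reproduced here). The point is that one high-level transition spans up to k low-level steps, shortening the horizon over which sparse, delayed rewards must propagate.

def run_hierarchical_episode(env, high_policy, low_policies, k=10):
    obs = env.reset()
    high_buffer, low_buffer = [], []
    done = False
    while not done:
        goal = high_policy(obs)             # pick a low-level skill
        start_obs, ret = obs, 0.0
        for _ in range(k):                  # temporally extended action
            act = low_policies[goal](obs)
            obs2, r, done, _ = env.step(act)
            low_buffer.append((obs, act, r, obs2, done))
            ret += r
            obs = obs2
            if done:
                break
        # one high-level transition covers up to k environment steps
        high_buffer.append((start_obs, goal, ret, obs, done))
    return high_buffer, low_buffer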
Deep Q-Network Based Multi-agent Reinforcement Learning with Binary Action Agents
Deep Q-Network (DQN) based multi-agent systems (MAS) for reinforcement
learning (RL) use various schemes in which the agents have to learn and
communicate. The learning, however, is specific to each agent, and the
communication must be suitably designed for the agents. As more complex Deep
Q-Networks come to the fore, the overall complexity of the multi-agent system
increases, leading to issues such as difficulty in training, the need for more
resources and training time, and difficulty in fine-tuning. To address these
issues we propose a simple but efficient DQN-based MAS for RL which uses a
shared state and reward, but agent-specific actions, to update the experience
replay pools of the DQNs, where each agent is a DQN. The benefits of the
approach are overall simplicity, faster convergence, and better performance
compared to conventional DQN-based approaches. The method can be extended to
any DQN variant; we use simple DQN and DDQN (Double Q-learning) on three
separate tasks: CartPole-v1 (OpenAI Gym environment), LunarLander-v2 (OpenAI
Gym environment), and Maze Traversal (a customized environment). The proposed
approach outperforms the baselines on these tasks by decent margins.
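A minimal sketch of one plausible reading of this replay scheme: all agents share the same state and scalar reward, but each agent's pool records only its own action (class and function names are hypothetical).

import random
from collections import deque

class AgentReplay:
    """Per-agent experience replay pool (each agent is its own DQN)."""
    def __init__(self, capacity=50_000):
        self.pool = deque(maxlen=capacity)

    def push(self, s, a, r, s2, done):
        self.pool.append((s, a, r, s2, done))

    def sample(self, n):
        return random.sample(self.pool, n)

def store_transition(replays, state, joint_action, reward, next_state, done):
    # Shared state and reward; only the agent-specific action differs.
    for replay, a_i in zip(replays, joint_action):
        replay.push(state, a_i, reward, next_state, done)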
Fairness in Multi-agent Reinforcement Learning for Stock Trading
Unfair stock trading strategies have been shown to be among the most negative
perceptions customers can have of a trading firm, and they may result in
long-term losses for a company. Investment banks usually place trading orders
for multiple clients with the same target assets but different order sizes and
diverse requirements, such as time frame and risk-aversion level, so total
earnings and individual earnings cannot be optimized at the same time. Orders
executed earlier affect the market price level, so late
execution usually means additional implementation cost. In this paper, we
propose a novel scheme that utilizes multi-agent reinforcement learning systems
to derive stock trading strategies for all clients which keep a balance between
revenue and fairness. First, we demonstrate that reinforcement learning (RL) is
able to learn from experience and adapt the trading strategies to the complex
market environment. Secondly, we show that the Multi-agent RL system allows
developing trading strategies for all clients individually, thus optimizing
individual revenue. Thirdly, we use the Generalized Gini Index (GGI)
aggregation function to control the fairness level of the revenue across all
clients. Lastly, we empirically demonstrate the superiority of the novel
scheme in improving fairness while maintaining revenue optimization.
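For reference, the GGI aggregation is a weighted sum of outcomes sorted in increasing order with strictly decreasing weights, so the worst-off client counts the most; maximizing it trades total revenue against fairness. A small sketch with the common geometric weights 1, 1/w, 1/w^2, ... (the paper's exact weight vector may differ).

import numpy as np

def ggi(revenues, w=2.0):
    """Generalized Gini Index: decreasing weights on ascending revenues."""
    x = np.sort(np.asarray(revenues, dtype=float))  # worst-off client first
    weights = float(w) ** -np.arange(len(x))        # 1, 1/w, 1/w^2, ...
    return float(weights @ x)

# An equal split scores higher than an unfair split with the same total:
assert ggi([5.0, 5.0]) > ggi([9.0, 1.0])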
Efficient Ridesharing Dispatch Using Multi-Agent Reinforcement Learning
With the advent of ride-sharing services, there is a huge increase in the
number of people who rely on them for various needs. Most of the earlier
approaches tackling this issue required handcrafted functions for estimating
travel times and passenger waiting times. Traditional Reinforcement Learning
(RL) based methods attempting to solve the ridesharing problem are unable to
accurately model the complex environment in which taxis operate. Prior
Multi-Agent Deep RL methods based on Independent DQN (IDQN) learn
decentralized value functions that are prone to instability due to the
concurrent learning and exploration of multiple agents. Our proposed method,
based on QMIX, is
able to achieve centralized training with decentralized execution. We show that
our model performs better than the IDQN baseline on a fixed grid size and is
able to generalize well to smaller or larger grid sizes. Also, our algorithm is
able to outperform the IDQN baseline in scenarios with a variable
number of passengers and cars in each episode. Code for our paper is publicly
available at: https://github.com/UMich-ML-Group/RL-Ridesharing
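What QMIX adds over IDQN is a state-conditioned, monotonic mixing of per-agent utilities: because the mixing weights are constrained to be non-negative, each agent's greedy action under its own Q-value stays consistent with the joint greedy action under Q_tot, which is what permits centralized training with decentralized execution. A compact sketch of such a mixer (layer sizes are illustrative).

import torch
import torch.nn as nn

class QMixer(nn.Module):
    """QMIX-style monotonic mixer: Q_tot from per-agent Qs and the state."""
    def __init__(self, n_agents, state_dim, embed=32):
        super().__init__()
        self.n_agents, self.embed = n_agents, embed
        self.w1 = nn.Linear(state_dim, n_agents * embed)  # hypernetworks
        self.b1 = nn.Linear(state_dim, embed)
        self.w2 = nn.Linear(state_dim, embed)
        self.b2 = nn.Linear(state_dim, 1)

    def forward(self, agent_qs, state):  # agent_qs: (B, n); state: (B, d)
        B = agent_qs.size(0)
        w1 = torch.abs(self.w1(state)).view(B, self.n_agents, self.embed)
        hidden = torch.relu(agent_qs.unsqueeze(1).bmm(w1)
                            + self.b1(state).unsqueeze(1))
        w2 = torch.abs(self.w2(state)).view(B, self.embed, 1)
        q_tot = hidden.bmm(w2) + self.b2(state).unsqueeze(1)
        return q_tot.view(B)  # non-negative weights => dQ_tot/dQ_i >= 0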
R-MADDPG for Partially Observable Environments and Limited Communication
There are several real-world tasks that would benefit from applying
multiagent reinforcement learning (MARL) algorithms, including the coordination
among self-driving cars. The real world has challenging conditions for
multiagent learning systems, such as its partially observable and
nonstationary nature. Moreover, if agents must share a limited resource (e.g. network
bandwidth) they must all learn how to coordinate resource use. This paper
introduces a deep recurrent multiagent actor-critic framework (R-MADDPG) for
handling multiagent coordination under partially observable settings and limited
communication. We investigate the effects of recurrency on the performance and
communication use of a team of agents. We demonstrate that the resulting
framework learns time dependencies for sharing missing observations, handling
resource limitations, and developing different communication patterns among
agents.
Comment: Reinforcement Learning for Real Life (RL4RealLife) Workshop at the
36th International Conference on Machine Learning, Long Beach, California,
USA, 2019.
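The recurrency is the essential ingredient here: a hidden state carried across timesteps lets an agent act on what it remembers when observations or messages are missing. A minimal sketch of a GRU-based recurrent actor of the kind such a framework builds on (dimensions and the tanh action head are illustrative).

import torch
import torch.nn as nn

class RecurrentActor(nn.Module):
    """Policy with memory for partially observable settings."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.gru = nn.GRUCell(obs_dim, hidden)
        self.head = nn.Linear(hidden, act_dim)

    def forward(self, obs, h):
        h = self.gru(obs, h)                  # carry memory across steps
        return torch.tanh(self.head(h)), h    # action in [-1, 1], new state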
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
We explore deep reinforcement learning methods for multi-agent domains. We
begin by analyzing the difficulty of traditional algorithms in the multi-agent
case: Q-learning is challenged by an inherent non-stationarity of the
environment, while policy gradient suffers from a variance that increases as
the number of agents grows. We then present an adaptation of actor-critic
methods that considers action policies of other agents and is able to
successfully learn policies that require complex multi-agent coordination.
Additionally, we introduce a training regimen utilizing an ensemble of policies
for each agent that leads to more robust multi-agent policies. We show the
strength of our approach compared to existing methods in cooperative as well as
competitive scenarios, where agent populations are able to discover various
physical and informational coordination strategies.
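The adaptation is conceptually simple: during training, each agent's critic conditions on all agents' observations and actions, which makes the learning target stationary from the critic's point of view, while each actor still maps only its local observation to an action, so execution remains decentralized. A minimal sketch of such a centralized critic (layer sizes are illustrative).

import torch
import torch.nn as nn

class CentralCritic(nn.Module):
    """Q_i(x, a_1, ..., a_N): sees every agent's observation and action."""
    def __init__(self, obs_dims, act_dims, hidden=64):
        super().__init__()
        in_dim = sum(obs_dims) + sum(act_dims)
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, all_obs, all_acts):  # lists of (B, d_i) tensors
        return self.net(torch.cat(list(all_obs) + list(all_acts), dim=-1))

# Actor i ascends Q_i with only its own action re-sampled from its policy:
#   loss_i = -critic(obs, acts[:i] + [actor_i(obs[i])] + acts[i+1:]).mean()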
Negative Update Intervals in Deep Multi-Agent Reinforcement Learning
In Multi-Agent Reinforcement Learning (MA-RL), independent cooperative
learners must overcome a number of pathologies to learn optimal joint policies.
Addressing one pathology often leaves approaches vulnerable towards others. For
instance, hysteretic Q-learning addresses miscoordination while leaving agents
vulnerable towards misleading stochastic rewards. Other methods, such as
leniency, have proven more robust when dealing with multiple pathologies
simultaneously. However, leniency has predominantly been studied within the
context of strategic form games (bimatrix games) and fully observable Markov
games consisting of a small number of probabilistic state transitions. This
raises the question of whether these findings scale to more complex domains.
For this purpose we implement a temporally extended version of the Climb Game,
within which agents must overcome multiple pathologies simultaneously,
including relative overgeneralisation, stochasticity, the alter-exploration and
moving target problems, while learning from a large observation space. We find
that existing lenient and hysteretic approaches fail to consistently learn near
optimal joint-policies in this environment. To address these pathologies we
introduce Negative Update Intervals-DDQN (NUI-DDQN), a Deep MA-RL algorithm
which discards episodes yielding cumulative rewards outside the range of
expanding intervals. NUI-DDQN consistently gravitates towards optimal
joint-policies in our environment, overcoming the outlined pathologies.
Comment: 11 pages, 6 figures, AAMAS 2019 conference proceedings.
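A loose, hypothetical sketch of the filtering idea (the paper's interval bookkeeping is more involved): per action, track the best return seen and a lower bound that creeps upward over time; episodes whose cumulative reward falls below the bound are treated as artifacts of exploring teammates or misleading stochastic rewards and are kept out of the replay memory.

class ReturnFilter:
    """Simplified negative-update-interval filter (illustrative only)."""
    def __init__(self, rate=0.005):
        self.r_max = {}   # best return observed per action
        self.r_min = {}   # rising lower bound of the admissible interval
        self.rate = rate

    def admit(self, action, ep_return):
        self.r_max[action] = max(self.r_max.get(action, ep_return), ep_return)
        lo = self.r_min.get(action, ep_return)
        # the lower bound slowly approaches the best return seen so far
        self.r_min[action] = lo + self.rate * (self.r_max[action] - lo)
        return ep_return >= self.r_min[action]

# Episodes failing `admit` are discarded instead of updating the DQN.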