Learning Individual Policies in Large Multi-agent Systems through Local Variance Minimization
In multi-agent systems with a large number of agents, the contribution of each individual agent to the value of other agents is typically minimal (e.g., aggregation systems such as Uber and Deliveroo). In this paper, we consider such multi-agent systems in which each agent is self-interested and takes a sequence of decisions, and we represent them as a Stochastic Non-atomic Congestion Game (SNCG). We derive key properties of equilibrium solutions in the SNCG model with non-atomic and nearly non-atomic agents. Building on these equilibrium properties, we provide a novel Multi-Agent Reinforcement Learning (MARL) mechanism that minimizes the variance across the values of agents in the same state. To demonstrate the utility of this new mechanism, we provide detailed results on a real-world taxi dataset and on a generic simulator for aggregation systems. We show that our approach reduces the variance in revenues earned by taxi drivers, while still providing higher joint revenues than leading approaches.
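The abstract does not give the concrete update rule, but the core idea of minimizing the variance of agent values within a shared state can be illustrated with a small sketch. The function name, the coefficient `beta`, and the array shapes below are illustrative assumptions, not the paper's actual mechanism.

```python
import numpy as np

# Hypothetical sketch of the variance-minimization idea: temporal-difference
# targets for agents sharing a state are shrunk toward the cross-agent mean
# value. Function name, `beta`, and shapes are assumptions for illustration.

def variance_regularized_targets(agent_values, rewards, next_values,
                                 gamma=0.99, beta=0.1):
    """One-step TD targets with a penalty on deviation from the state-mean value."""
    agent_values = np.asarray(agent_values, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    next_values = np.asarray(next_values, dtype=float)

    td_targets = rewards + gamma * next_values        # standard one-step targets
    mean_value = agent_values.mean()                  # mean value over co-located agents
    # Pull each agent's target toward the mean, reducing within-state variance.
    return td_targets - beta * (agent_values - mean_value)
```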
Transferable Curricula through Difficulty Conditioned Generators
Advancements in reinforcement learning (RL) have demonstrated superhuman performance in complex tasks such as StarCraft, Go, and Chess. However, knowledge transfer from Artificial "Experts" to humans remains a significant challenge. A promising avenue for such transfer is the use of curricula. Recent methods in curriculum generation focus on training RL agents efficiently, yet such methods rely on surrogate measures to track student progress and are not suited to training robots in the real world (or, more ambitiously, humans). In this paper, we introduce a method named Parameterized Environment Response Model (PERM) that shows promising results in training RL agents in parameterized environments. Inspired by Item Response Theory, PERM seeks to model the difficulty of environments and the ability of RL agents directly. Given that RL agents and humans are trained more efficiently within the "zone of proximal development", our method generates a curriculum by matching the difficulty of an environment to the current ability of the student. In addition, PERM can be trained offline and does not employ non-stationary measures of student ability, making it suitable for transfer between students. We demonstrate PERM's ability to represent the environment parameter space, and show that training RL agents with PERM produces strong performance in deterministic environments. Lastly, we show that our method is transferable between students without any sacrifice in training quality.
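PERM itself is a learned generative model, but its curriculum rule, matching environment difficulty to the student's current ability, can be sketched with a plain one-parameter IRT (Rasch) model. The function names and the target success probability below are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def rasch_success_prob(ability, difficulty):
    """1-parameter IRT (Rasch) model: P(success) = sigmoid(ability - difficulty)."""
    return 1.0 / (1.0 + np.exp(-(ability - difficulty)))

def select_next_level(candidate_difficulties, student_ability, target_p=0.5):
    """Return the index of the level whose predicted success rate is closest to target_p."""
    probs = rasch_success_prob(student_ability,
                               np.asarray(candidate_difficulties, dtype=float))
    return int(np.argmin(np.abs(probs - target_p)))

# Usage: given estimated difficulties for three candidate levels and a student
# ability of 0.3, the selector picks the level nearest the target success rate.
print(select_next_level([-1.0, 0.2, 1.5], student_ability=0.3))
```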
Enhancing the Hierarchical Environment Design via Generative Trajectory Modeling
Unsupervised Environment Design (UED) is a paradigm for automatically generating a curriculum of training environments, enabling agents trained in these environments to develop general capabilities, i.e., achieve good zero-shot transfer performance. However, existing UED approaches focus primarily on the random generation of environments for open-ended agent training. This is impractical in scenarios with limited resources, such as constraints on the number of environments that can be generated. In this paper, we introduce a hierarchical MDP framework for environment design under resource constraints. It consists of an upper-level RL teacher agent that generates suitable training environments for a lower-level student agent. The RL teacher can leverage previously discovered environment structures and generate environments at the frontier of the student's capabilities by observing the student policy's representation. Moreover, to reduce the time-consuming collection of experiences for the upper-level teacher, we utilize recent advances in generative modeling to synthesize a trajectory dataset for training the teacher agent. Our proposed method significantly reduces the resource-intensive interactions between agents and environments, and empirical experiments across various domains demonstrate the effectiveness of our approach.
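The abstract does not describe the generative model used to synthesize teacher experiences; a minimal sketch of the general idea, replacing costly teacher-environment interaction with synthetic samples drawn from a fitted model, is given below. The diagonal Gaussian, the function names, and the tuple layout are assumptions for illustration only.

```python
import numpy as np

# Hypothetical sketch: augmenting the teacher's experience buffer with synthetic
# samples. Real (student-embedding, environment-parameter, outcome) tuples are
# modeled here by a diagonal Gaussian; the paper's generative trajectory model
# is far richer. All names and shapes below are illustrative assumptions.

def fit_diag_gaussian(data):
    """Fit a diagonal Gaussian to rows of `data` (each row is one flattened tuple)."""
    data = np.asarray(data, dtype=float)
    return data.mean(axis=0), data.std(axis=0) + 1e-6

def synthesize(mean, std, n_samples, seed=0):
    """Sample synthetic teacher experiences from the fitted Gaussian."""
    rng = np.random.default_rng(seed)
    return rng.normal(mean, std, size=(n_samples, mean.shape[0]))

# Usage: extend a small batch of real teacher experiences with synthetic ones,
# reducing how often the costly student-training loop must be run.
real_batch = np.random.rand(32, 8)
mu, sigma = fit_diag_gaussian(real_batch)
teacher_buffer = np.concatenate([real_batch, synthesize(mu, sigma, 128)], axis=0)
```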
ZAC: A Zone pAth Construction approach for effective real-time ridesharing
National Research Foundation (NRF) Singapore under SMART and Future Mobility
Diversity Induced Environment Design via Self-Play
Recent work on designing an appropriate distribution of environments has shown promise for training effective, generally capable agents. Its success is partly due to a form of adaptive curriculum learning that generates environment instances (or levels) at the frontier of the agent's capabilities. However, such an environment design framework often struggles to find effective levels in challenging design spaces and requires costly interactions with the environment. In this paper, we aim to introduce diversity into the Unsupervised Environment Design (UED) framework. Specifically, we propose a task-agnostic method to identify observed/hidden states that are representative of a given level. The outcome of this method is then utilized to characterize the diversity between two levels, which, as we show, can be crucial to effective performance. In addition, to improve sampling efficiency, we incorporate a self-play technique that allows the environment generator to automatically produce environments of great benefit to the training agent. Quantitatively, our approach, Diversity-induced Environment Design via Self-Play (DivSP), shows compelling performance over existing methods.
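The abstract does not specify how the diversity between two levels is computed; one plausible reading, and purely an assumption here, is a distance between the sets of representative states identified for each level. The sketch below uses a mean pairwise L2 distance over illustrative state embeddings.

```python
import numpy as np

def level_diversity(states_a, states_b):
    """Mean pairwise L2 distance between representative states of two levels."""
    a = np.asarray(states_a, dtype=float)   # shape (n_a, d)
    b = np.asarray(states_b, dtype=float)   # shape (n_b, d)
    return float(np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1).mean())

# Usage: with representative states collected from two levels, a larger score
# marks the pair as more diverse and thus more valuable to the curriculum.
print(level_diversity([[0.0, 0.0], [1.0, 0.0]], [[3.0, 4.0]]))
```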