Utility Design for Distributed Resource Allocation -- Part I: Characterizing and Optimizing the Exact Price of Anarchy
Game theory has emerged as a fruitful paradigm for the design of networked
multiagent systems. A fundamental component of this approach is the design of
agents' utility functions so that their self-interested maximization results in
a desirable collective behavior. In this work we focus on a well-studied class
of distributed resource allocation problems where each agent is requested to
select a subset of resources with the goal of optimizing a given system-level
objective. Our core contribution is the development of a novel framework to
tightly characterize the worst case performance of any resulting Nash
equilibrium (price of anarchy) as a function of the chosen agents' utility
functions. Leveraging this result, we identify how to design such utilities so
as to optimize the price of anarchy through a tractable linear program. This
provides us with a priori performance certificates applicable to any existing
learning algorithm capable of driving the system to an equilibrium. Part II of
this work specializes these results to submodular and supermodular objectives,
discusses the complexity of computing Nash equilibria, and provides multiple
illustrations of the theoretical findings.
Comment: 15 pages, 5 figures
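The abstract's central point, that the choice of utility function determines the price of anarchy, can be illustrated with a toy covering game. The game, values, and both utility rules below are hypothetical stand-ins, not the paper's construction: two agents each pick one resource, welfare is the value of the covered resources, and we compare an equal-share rule against a marginal-contribution rule by enumerating pure Nash equilibria.

```python
# Toy sketch (not the paper's framework): the price of anarchy of the
# same covering game under two different agent utility functions.
from itertools import product

VALUES = {"a": 1.0, "b": 0.4}   # hypothetical resource values
ACTIONS = ["a", "b"]

def welfare(profile):
    # system-level objective: total value of covered resources
    return sum(VALUES[r] for r in set(profile))

def equal_share(profile, i):
    # agent i's utility: resource value split among agents selecting it
    r = profile[i]
    return VALUES[r] / profile.count(r)

def marginal_contribution(profile, i):
    # agent i's utility: welfare gained by i's presence
    others = profile[:i] + profile[i + 1:]
    return welfare(profile) - welfare(others)

def pure_nash(utility):
    # enumerate pure Nash equilibria by checking unilateral deviations
    equilibria = []
    for prof in product(ACTIONS, repeat=2):
        profile = list(prof)
        stable = True
        for i in range(2):
            for alt in ACTIONS:
                deviation = profile.copy()
                deviation[i] = alt
                if utility(deviation, i) > utility(profile, i) + 1e-12:
                    stable = False
        if stable:
            equilibria.append(tuple(profile))
    return equilibria

def price_of_anarchy(utility):
    opt = max(welfare(list(p)) for p in product(ACTIONS, repeat=2))
    worst = min(welfare(list(p)) for p in pure_nash(utility))
    return worst / opt

print(price_of_anarchy(equal_share))             # an inefficient equilibrium survives
print(price_of_anarchy(marginal_contribution))   # every equilibrium is optimal here
```

In this toy instance the equal-share rule admits the inefficient equilibrium where both agents sit on resource "a" (price of anarchy 5/7), while the marginal-contribution rule eliminates it; the paper's framework makes this kind of comparison systematic and optimizes over the rule itself.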
Optimizing collective fieldtaxis of swarming agents through reinforcement learning
Swarming of animal groups enthralls scientists in fields ranging from biology
to physics to engineering. Complex swarming patterns often arise from simple
interactions between individuals to the benefit of the collective whole. The
existence and success of swarming, however, nontrivially depend on microscopic
parameters governing the interactions. Here we show that a machine-learning
technique can be employed to tune these underlying parameters and optimize the
resulting performance. As a concrete example, we take an active matter model
inspired by schools of golden shiners, which collectively conduct phototaxis.
The problem of optimizing the phototaxis capability is then mapped to that of
maximizing rewards in a continuum-armed bandit game. The latter problem
admits a simple reinforcement-learning algorithm, which can tune the
continuous parameters of the model. This result suggests the utility of
machine-learning methodology in swarm-robotics applications.
Comment: 6 pages, 3 figures
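The bandit mapping can be sketched in a few lines. Everything below is a hedged stand-in, not the paper's model: the `performance` function plays the role of the swarm's measured phototaxis capability, the continuous interaction parameter is discretized into arms, and an epsilon-greedy learner tunes it.

```python
# Hedged sketch: tuning one continuous parameter as a continuum-armed
# bandit, with a discretized arm set and an epsilon-greedy policy.
import random

def performance(theta, rng):
    # hypothetical noisy performance landscape peaking at theta = 0.7
    return -(theta - 0.7) ** 2 + rng.gauss(0.0, 0.01)

def tune(trials=3000, epsilon=0.2, seed=1):
    rng = random.Random(seed)
    arms = [round(0.1 * k, 1) for k in range(11)]  # discretized [0, 1]
    counts = {a: 0 for a in arms}
    means = {a: 0.0 for a in arms}
    for _ in range(trials):
        if rng.random() < epsilon:
            arm = rng.choice(arms)                    # explore
        else:
            arm = max(arms, key=lambda a: means[a])   # exploit
        reward = performance(arm, rng)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running mean
    return max(arms, key=lambda a: means[a])

print(tune())   # settles near the hypothetical optimum 0.7
```

A finer discretization (or an adaptive one) trades exploration cost against resolution; the abstract's point is only that such bandit machinery applies once swarm performance is treated as a black-box reward.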
Optimizing evacuation flow in a two-channel exclusion process
We use a basic setup of two coupled exclusion processes to model a stylised
situation in evacuation dynamics, in which evacuees have to choose between two
escape routes. The coupling between the two processes occurs through one common
point at which particles are injected; the process can be controlled by
directing incoming individuals into either of the two escape routes. Based on a
mean-field approach we determine the phase behaviour of the model, and
analytically compute optimal control strategies, maximising the total current
through the system. Results are confirmed by numerical simulations. We also
show that dynamic intervention, exploiting fluctuations about the mean-field
stationary state, can lead to a further increase in total current.
Comment: 16 pages, 6 figures
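The setup can be sketched as a simulation. All rates, lengths, and update rules below are hypothetical choices, not the paper's mean-field analysis: two exclusion channels share one injection point, and a routing probability decides which channel each arriving particle attempts to enter.

```python
# Minimal sketch of two coupled exclusion channels with controllable
# routing at a common injection point (hypothetical rates and sizes).
import random

def simulate(p_route, L=50, alpha=0.8, beta=0.8, steps=20000, seed=0):
    rng = random.Random(seed)
    lanes = [[0] * L, [0] * L]   # two channels, 0 = empty site
    exits = [0, 0]
    for _ in range(steps):
        # inject at the common entry: route to channel 0 with prob p_route
        k = 0 if rng.random() < p_route else 1
        if rng.random() < alpha and lanes[k][0] == 0:
            lanes[k][0] = 1
        for k in (0, 1):
            lane = lanes[k]
            if lane[L - 1] and rng.random() < beta:
                lane[L - 1] = 0
                exits[k] += 1     # particle leaves at the exit site
            # right-to-left sweep: each particle tries one forward hop
            for i in range(L - 2, -1, -1):
                if lane[i] and not lane[i + 1]:
                    lane[i], lane[i + 1] = 0, 1
    return [e / steps for e in exits]   # per-channel current

j_all_one = simulate(p_route=1.0)   # everyone sent down channel 0
j_split = simulate(p_route=0.5)     # incoming flow split evenly
print(j_all_one, j_split)
```

Scanning `p_route` in such a simulation is the crude numerical analogue of the control problem the abstract solves analytically: choosing the routing that maximizes the total current `sum(exits)`.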
An Adversarial Interpretation of Information-Theoretic Bounded Rationality
Recently, there has been a growing interest in modeling planning with
information constraints. Accordingly, an agent maximizes a regularized expected
utility known as the free energy, where the regularizer is given by the
information divergence from a prior to a posterior policy. While this approach
can be justified in various ways, including from statistical mechanics and
information theory, it is still unclear how it relates to decision-making
against adversarial environments. This connection has previously been suggested
in work relating the free energy to risk-sensitive control and to extensive
form games. Here, we show that a single-agent free energy optimization is
equivalent to a game between the agent and an imaginary adversary. The
adversary can, by paying an exponential penalty, generate costs that diminish
the decision maker's payoffs. It turns out that the optimal strategy of the
adversary consists in choosing costs so as to render the decision maker
indifferent among its choices, which is a defining property of a Nash
equilibrium, thus tightening the connection between free energy optimization
and game theory.
Comment: 7 pages, 4 figures. Proceedings of AAAI-1
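The indifference property described above can be checked numerically in a toy setting. The utilities, prior, and temperature below are hypothetical: the free-energy-optimal posterior is the prior tilted by exponentiated utilities, and an adversary charging the cost `lam * log(p/prior)` leaves the decision maker with the same effective payoff on every action.

```python
# Worked toy check: the free-energy-optimal policy and the adversary's
# indifference-inducing costs (hypothetical utilities and temperature).
import math

prior = [1 / 3, 1 / 3, 1 / 3]   # prior policy over three actions
U = [1.0, 2.0, 0.5]             # hypothetical payoffs
lam = 0.5                       # temperature: weight of the KL regularizer

# free-energy-optimal posterior: p(a) proportional to prior(a) * exp(U(a)/lam)
weights = [q * math.exp(u / lam) for q, u in zip(prior, U)]
Z = sum(weights)
p = [w / Z for w in weights]

# adversarial cost c(a) = lam * log(p(a)/prior(a)); the effective payoff
# U(a) - c(a) collapses to the constant lam * log(Z) for every action
effective = [u - lam * math.log(pa / qa) for u, pa, qa in zip(U, p, prior)]
print(effective)   # all entries coincide: the decision maker is indifferent
```

Algebraically, `U(a) - lam * log(p(a)/prior(a)) = lam * log(Z)` for every action, which is exactly the indifference condition the abstract ties to Nash equilibrium.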
Automatic Curriculum Learning For Deep RL: A Short Survey
Automatic Curriculum Learning (ACL) has become a cornerstone of recent
successes in Deep Reinforcement Learning (DRL). These methods shape the learning
trajectories of agents by challenging them with tasks adapted to their
capacities. In recent years, they have been used to improve sample efficiency
and asymptotic performance, to organize exploration, to encourage
generalization or to solve sparse reward problems, among others. The ambition
of this work is dual: 1) to present a compact and accessible introduction to
the Automatic Curriculum Learning literature and 2) to draw a bigger picture of
the current state of the art in ACL to encourage the cross-breeding of existing
concepts and the emergence of new ideas.
Comment: Accepted at IJCAI202