LIPIcs, Volume 251, ITCS 2023, Complete Volume
Rethinking Adversarial Policies: A Generalized Attack Formulation and Provable Defense in Multi-Agent RL
Most existing works consider direct perturbations of victim's state/action or
the underlying transition dynamics to show vulnerability of reinforcement
learning agents under adversarial attacks. However, such direct manipulation
may not always be feasible in practice. In this paper, we consider another
common and realistic attack setup: in a multi-agent RL setting with
well-trained agents, during deployment time, the victim agent is
exploited by an attacker who controls another agent to act
adversarially against the victim using an adversarial policy. Prior
attack models under this setup do not consider that the attacker can confront
resistance and thus can only take partial control of the agent, or that the
attack may introduce perceivable "abnormal" behaviors that are easily
detectable. A provable defense against these adversarial policies is also
lacking. To resolve these issues, we introduce a more general attack
formulation that models to what extent the adversary is able to control the
agent to produce the adversarial policy. Based on such a generalized attack
framework, the attacker can also regulate the state distribution shift caused
by the attack through an attack budget, and thus produce stealthy adversarial
policies that can exploit the victim agent. Furthermore, we provide the first
provably robust defenses with convergence guarantee to the most robust victim
policy via adversarial training with timescale separation, in sharp contrast to
adversarial training in supervised learning, which may only provide
empirical defenses.
Policy Space Diversity for Non-Transitive Games
Policy-Space Response Oracles (PSRO) is an influential algorithm framework
for approximating a Nash Equilibrium (NE) in multi-agent non-transitive games.
Many previous studies have been trying to promote policy diversity in PSRO. A
major weakness in existing diversity metrics is that a more diverse (according
to their diversity metrics) population does not necessarily mean (as we proved
in the paper) a better approximation to a NE. To alleviate this problem, we
propose a new diversity metric, the improvement of which guarantees a better
approximation to a NE. Meanwhile, we develop a practical and well-justified
method to optimize our diversity metric using only state-action samples. By
incorporating our diversity regularization into the best response solving in
PSRO, we obtain a new PSRO variant, Policy Space Diversity PSRO (PSD-PSRO). We
present the convergence property of PSD-PSRO. Empirically, extensive
experiments on various games demonstrate that PSD-PSRO is more effective in
producing significantly less exploitable policies than state-of-the-art PSRO
variants.
On Tilted Losses in Machine Learning: Theory and Applications
Exponential tilting is a technique commonly used in fields such as
statistics, probability, information theory, and optimization to create
parametric distribution shifts. Despite its prevalence in related fields,
tilting has not seen widespread use in machine learning. In this work, we aim
to bridge this gap by exploring the use of tilting in risk minimization. We
study a simple extension to ERM -- tilted empirical risk minimization (TERM) --
which uses exponential tilting to flexibly tune the impact of individual
losses. The resulting framework has several useful properties: We show that
TERM can increase or decrease the influence of outliers, respectively, to
enable fairness or robustness; has variance-reduction properties that can
benefit generalization; and can be viewed as a smooth approximation to the tail
probability of losses. Our work makes rigorous connections between TERM and
related objectives, such as Value-at-Risk, Conditional Value-at-Risk, and
distributionally robust optimization (DRO). We develop batch and stochastic
first-order optimization methods for solving TERM, provide convergence
guarantees for the solvers, and show that the framework can be efficiently
solved relative to common alternatives. Finally, we demonstrate that TERM can
be used for a multitude of applications in machine learning, such as enforcing
fairness between subgroups, mitigating the effect of outliers, and handling
class imbalance. Despite the straightforward modification TERM makes to
traditional ERM objectives, we find that the framework can consistently
outperform ERM and deliver competitive performance with state-of-the-art,
problem-specific approaches. Comment: arXiv admin note: substantial text overlap with arXiv:2007.0116
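The tilting idea described above can be illustrated with a minimal sketch. It assumes the standard tilted objective (1/t) * log(mean(exp(t * loss_i))); the function name `tilted_loss` and the sample losses are hypothetical and chosen only for illustration, and this is not the authors' implementation.

```python
import math

def tilted_loss(losses, t):
    """Tilted aggregate of per-sample losses: (1/t) * log(mean(exp(t * l_i))).

    t > 0 emphasizes large losses (approaches the max as t grows),
    t < 0 suppresses them (approaches the min as t decreases),
    t == 0 is treated as the ordinary average used by ERM.
    """
    n = len(losses)
    if t == 0:
        return sum(losses) / n
    # Shift by the largest exponent before exponentiating (log-sum-exp trick)
    shift = max(t * l for l in losses)
    total = sum(math.exp(t * l - shift) for l in losses)
    return (shift + math.log(total / n)) / t

# Hypothetical per-sample losses with one outlier
sample_losses = [0.1, 0.2, 5.0]
mean_loss = tilted_loss(sample_losses, 0)
outlier_focused = tilted_loss(sample_losses, 50)
outlier_robust = tilted_loss(sample_losses, -50)
```

With a large positive t the aggregate is dominated by the outlier, while a large negative t pulls it toward the smallest loss, which is the sense in which a single parameter trades off fairness-style and robustness-style behavior.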
Ditransitives in Germanic languages. Synchronic and diachronic aspects
This volume brings together twelve empirical studies on ditransitive constructions in Germanic languages and their varieties, past and present. Specifically, the volume includes contributions on a wide variety of Germanic languages, including English, Dutch, and German, but also Danish, Swedish, and Norwegian, as well as lesser-studied ones such as Faroese. While the first part of the volume focuses on diachronic aspects, the second part showcases a variety of synchronic aspects relating to ditransitive patterns. Methodologically, the volume covers both experimental and corpus-based studies. Questions addressed by the papers in the volume include the cross-linguistic pervasiveness and cognitive reality of factors involved in the choice between different ditransitive constructions, and differences and similarities in the diachronic development of ditransitives. The volume’s broad scope and comparative perspective offer comprehensive insights into well-known phenomena and further our understanding of variation across languages of the same family.
Oracle-free Reinforcement Learning in Mean-Field Games along a Single Sample Path
We consider online reinforcement learning in Mean-Field Games (MFGs). Unlike
traditional approaches, we alleviate the need for a mean-field oracle by
developing an algorithm that approximates the Mean-Field Equilibrium (MFE)
using the single sample path of the generic agent. We call this Sandbox
Learning, as it can be used as a warm-start for any agent learning in a
multi-agent non-cooperative setting. We adopt a two time-scale approach in
which an online fixed-point recursion for the mean-field operates on a slower
time-scale, in tandem with a control policy update on a faster time-scale for
the generic agent. Given that the underlying Markov Decision Process (MDP) of
the agent is communicating, we provide finite sample convergence guarantees in
terms of convergence of the mean-field and control policy to the mean-field
equilibrium. The sample complexity of the Sandbox learning algorithm is
where is the MFE approximation
error. This is similar to works which assume access to an oracle. Finally, we
empirically demonstrate the effectiveness of the sandbox learning algorithm in
diverse scenarios, including those where the MDP does not necessarily have a
single communicating class. Comment: Accepted for publication in AISTATS 202
Mean-field games among teams
In this paper, we present a model of a game among teams. Each team consists
of a homogeneous population of agents. Agents within a team are cooperative
while the teams compete with other teams. The dynamics and the costs are
coupled through the empirical distribution (or the mean field) of the state of
agents in each team. This mean-field is assumed to be observed by all agents.
Agents have asymmetric information (also called a non-classical information
structure). We propose a mean-field based refinement of the Team-Nash
equilibrium of the game, which we call mean-field Markov perfect equilibrium
(MF-MPE). We identify a dynamic programming decomposition to characterize
MF-MPE. We then consider the case where each team has a large number of players
and present a mean-field approximation which approximates the game among
large-population teams as a game among infinite-population teams. We show that
MF-MPE of the game among teams of infinite population is easier to compute and
is an ε-approximate MF-MPE of the game among teams of finite
population. Comment: 20 page
LIPIcs, Volume 261, ICALP 2023, Complete Volume
Projection-Free Methods for Solving Nonconvex-Concave Saddle Point Problems
In this paper, we investigate a class of constrained saddle point (SP)
problems where the objective function is nonconvex-concave and smooth. This
class of problems has wide applicability in machine learning, including robust
multi-class classification and dictionary learning. Several projection-based
primal-dual methods have been developed for tackling this problem; however, the
availability of methods with projection-free oracles remains limited. To
address this gap, we propose efficient single-loop projection-free methods
reliant on first-order information. In particular, using regularization and
nested approximation techniques, we propose a primal-dual conditional gradient
method that solely employs linear minimization oracles to handle constraints.
Assuming that the constraint set in the maximization is strongly convex, our
method achieves an -stationary solution within
iterations. When the projection onto the
constraint set of maximization is easy to compute, we propose a one-sided
projection-free method that achieves an -stationary solution within
iterations. Moreover, we present improved
iteration complexities of our methods under a strong concavity assumption. To
the best of our knowledge, our proposed algorithms are among the first
projection-free methods with convergence guarantees for solving
nonconvex-concave SP problems. Comment: Additional experiments have been adde
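To make the notion of a linear minimization oracle concrete, the following is a minimal sketch of the classical conditional gradient (Frank-Wolfe) method for plain minimization over the probability simplex. It is not the primal-dual method of the paper; the constraint set, the quadratic objective, the target point `c`, and all function names are assumptions chosen for illustration.

```python
def lmo_simplex(grad):
    """Linear minimization oracle over the probability simplex.

    argmin over the simplex of <grad, s> is attained at the vertex
    corresponding to the smallest gradient coordinate.
    """
    best = min(range(len(grad)), key=lambda k: grad[k])
    return [1.0 if k == best else 0.0 for k in range(len(grad))]

def frank_wolfe(grad_fn, x0, steps=2000):
    """Conditional gradient iterations: no projection, only LMO calls."""
    x = list(x0)
    for k in range(steps):
        s = lmo_simplex(grad_fn(x))
        gamma = 2.0 / (k + 2)  # classical open-loop step size
        # Convex combination keeps the iterate inside the simplex
        x = [(1 - gamma) * xi + gamma * si for xi, si in zip(x, s)]
    return x

# Hypothetical problem: minimize ||x - c||^2 over the simplex, with c in the simplex
c = [0.2, 0.3, 0.5]

def quadratic_grad(x):
    return [2.0 * (xi - ci) for xi, ci in zip(x, c)]

x_fw = frank_wolfe(quadratic_grad, [1.0, 0.0, 0.0])
squared_error = sum((xi - ci) ** 2 for xi, ci in zip(x_fw, c))
```

Each iterate is a convex combination of simplex vertices, so feasibility is maintained without ever computing a projection, which is the property that makes such oracles attractive when projections are expensive.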
A regularized variance-reduced modified extragradient method for stochastic hierarchical games
The theory of learning in games has so far focused mainly on games with
simultaneous moves. Recently, researchers in machine learning have started
investigating learning dynamics in games involving hierarchical
decision-making. We consider an -player hierarchical game in which the th
player's objective comprises an expectation-valued term, parametrized by
rival decisions, and a hierarchical term. Such a framework allows for capturing
a broad range of stochastic hierarchical optimization problems, Stackelberg
equilibrium problems, and leader-follower games. We develop an iteratively
regularized and smoothed variance-reduced modified extragradient framework for
learning hierarchical equilibria in a stochastic setting. We equip our analysis
with rate statements, complexity guarantees, and almost-sure convergence
claims. We then extend these statements to settings where the lower-level
problem is solved inexactly and provide the corresponding rate and complexity
statements.
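For readers unfamiliar with extragradient schemes, here is a minimal sketch of the basic deterministic extragradient step on the toy bilinear saddle problem min over x, max over y of x*y. It omits the regularization, smoothing, variance reduction, and hierarchical structure of the framework described above; the toy problem, step size, and function names are assumptions chosen for illustration.

```python
import math

def operator(z):
    """Gradient operator of the saddle problem min_x max_y x*y: F(x, y) = (y, -x)."""
    x, y = z
    return (y, -x)

def extragradient(z0, eta=0.5, steps=200):
    """Extrapolate with F(z), then update z using F at the extrapolated point."""
    z = z0
    for _ in range(steps):
        g = operator(z)
        z_half = (z[0] - eta * g[0], z[1] - eta * g[1])
        g_half = operator(z_half)
        z = (z[0] - eta * g_half[0], z[1] - eta * g_half[1])
    return z

def gradient_descent_ascent(z0, eta=0.5, steps=200):
    """Plain simultaneous update, shown only for comparison."""
    z = z0
    for _ in range(steps):
        g = operator(z)
        z = (z[0] - eta * g[0], z[1] - eta * g[1])
    return z

start = (1.0, 1.0)
eg_norm = math.hypot(*extragradient(start))
gda_norm = math.hypot(*gradient_descent_ascent(start))
```

On this toy problem the extragradient iterates contract toward the equilibrium (0, 0), whereas the plain simultaneous update spirals outward, which is the usual motivation for the extra evaluation at the extrapolated point.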