Natural and Technological Hazards in Urban Areas
Natural hazard events and technological accidents are separate causes of environmental impacts. Natural hazards are physical phenomena active over geological time, whereas technological hazards result from actions or facilities created by humans. In our time, combined natural and man-made hazards have been induced. Overpopulation and urban development in areas prone to natural hazards increase the impact of natural disasters worldwide. Additionally, urban areas are frequently characterized by intense industrial activity and rapid, poorly planned growth that threatens the environment and degrades the quality of life. Therefore, proper urban planning is crucial to minimize fatalities and reduce the environmental and economic impacts that accompany both natural and technological hazardous events.
LIPIcs, Volume 251, ITCS 2023, Complete Volume
Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning
Stackelberg equilibria arise naturally in a range of popular learning
problems, such as in security games or indirect mechanism design, and have
received increasing attention in the reinforcement learning literature. We
present a general framework for implementing Stackelberg equilibria search as a
multi-agent RL problem, allowing a wide range of algorithmic design choices. We
discuss how previous approaches can be seen as specific instantiations of this
framework. As a key insight, we note that the design space allows for
approaches not previously seen in the literature, for instance by leveraging
multitask and meta-RL techniques for follower convergence. We propose one such
approach using contextual policies, and evaluate it experimentally on both
standard and novel benchmark domains, showing greatly improved sample
efficiency compared to previous approaches. Finally, we explore the effect of
adopting algorithm designs outside the borders of our framework.
For One and All: Individual and Group Fairness in the Allocation of Indivisible Goods
Fair allocation of indivisible goods is a well-explored problem.
Traditionally, research focused on individual fairness - are individual agents
satisfied with their allotted share? - and group fairness - are groups of
agents treated fairly? In this paper, we explore the coexistence of individual
envy-freeness (i-EF) and its group counterpart, group weighted envy-freeness
(g-WEF), in the allocation of indivisible goods. We propose several
polynomial-time algorithms that provably achieve i-EF and g-WEF simultaneously
in various degrees of approximation under three different conditions on the
agents' valuations: (i) when agents have identical additive valuation functions, i-EFX and
i-WEF1 can be achieved simultaneously; (ii) when agents within a group share a
common valuation function, an allocation satisfying both i-EF1 and g-WEF1
exists; and (iii) when agents' valuations for goods within a group differ, we
show that while maintaining i-EF1, we can achieve a 1/3-approximation to
ex-ante g-WEF1. Our results thus provide a first step towards connecting
individual and group fairness in the allocation of indivisible goods, in hopes
of its useful application to domains requiring the reconciliation of diversity
with individual demands.
Comment: Appears in the 22nd International Conference on Autonomous Agents and
Multiagent Systems (AAMAS), 202
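To make the individual fairness notion concrete, the following is a minimal sketch of a check for envy-freeness up to one good (EF1) under additive valuations. It is not one of the paper's algorithms, and it does not cover the weighted group variant (g-WEF1). The function name `is_ef1` and the dictionary-based data layout are assumptions chosen for illustration.

```python
from typing import Dict, Hashable, List


def is_ef1(
    valuations: Dict[Hashable, Dict[Hashable, float]],
    bundles: Dict[Hashable, List[Hashable]],
) -> bool:
    """Return True if the allocation is envy-free up to one good (EF1).

    valuations[i][g] is the (additive) value agent i assigns to good g.
    bundles[i] is the list of goods allocated to agent i.
    Hypothetical helper for illustration only.
    """
    for i in bundles:
        own_value = sum(valuations[i][g] for g in bundles[i])
        for j in bundles:
            if i == j or not bundles[j]:
                continue  # no envy towards oneself or towards an empty bundle
            other_value = sum(valuations[i][g] for g in bundles[j])
            # Removing the good that agent i values most is the best case.
            best_removed = max(valuations[i][g] for g in bundles[j])
            if own_value < other_value - best_removed:
                return False
    return True
```

With two agents who both value goods `x`, `y`, `z` at 3, 1, 1, giving `x` to one agent and `y`, `z` to the other passes the check, while giving all three goods to one agent fails it.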
Data-assisted modeling of complex chemical and biological systems
Complex systems are abundant in chemistry and biology; they can be multiscale, possibly high-dimensional or stochastic, with nonlinear dynamics and interacting components. It is often nontrivial (and sometimes impossible) to determine and study the macroscopic quantities of interest and the equations they obey. One can only (judiciously or randomly) probe the system, gather observations and study trends. In this thesis, Machine Learning is used as a complement to traditional modeling and numerical methods to enable data-assisted (or data-driven) dynamical systems. As case studies, three complex systems are sourced from diverse fields: The first one is a high-dimensional computational neuroscience model of the Suprachiasmatic Nucleus of the human brain, where bifurcation analysis is performed by simply probing the system. Then, manifold learning is employed to discover a latent space of neuronal heterogeneity. Second, Machine Learning surrogate models are used to optimize dynamically operated catalytic reactors. An algorithmic pipeline is presented through which it is possible to program catalysts with active learning. Third, Machine Learning is employed to extract laws of Partial Differential Equations describing bacterial Chemotaxis. It is demonstrated how Machine Learning manages to capture the rules of bacterial motility at the macroscopic level, starting from diverse data sources (including real-world experimental data). More importantly, a framework is constructed through which already existing, partial knowledge of the system can be exploited.
These applications showcase how Machine Learning can be used synergistically with traditional simulations in different scenarios: (i) Equations are available but the overall system is so high-dimensional that efficiency and explainability suffer, (ii) Equations are available but lead to highly nonlinear black-box responses, (iii) Only data are available (of varying source and quality) and equations need to be discovered. For such data-assisted dynamical systems, we can perform fundamental tasks, such as integration, steady-state location, continuation and optimization. This work aims to unify traditional scientific computing and Machine Learning, in an efficient, data-economical, generalizable way, where both the physical system and the algorithm matter.
Rethinking Adversarial Policies: A Generalized Attack Formulation and Provable Defense in Multi-Agent RL
Most existing works consider direct perturbations of the victim's state/action or
the underlying transition dynamics to show vulnerability of reinforcement
learning agents under adversarial attacks. However, such direct manipulation
may not always be feasible in practice. In this paper, we consider another
common and realistic attack setup: in a multi-agent RL setting with
well-trained agents, during deployment time, the victim agent is
exploited by an attacker who controls another agent to act
adversarially against the victim using an adversarial policy. Prior
attack models under such a setup do not consider that the attacker can confront
resistance and thus can only take partial control of the agent; they
also introduce perceivable "abnormal" behaviors that are easily
detectable. A provable defense against these adversarial policies is also
lacking. To resolve these issues, we introduce a more general attack
formulation that models to what extent the adversary is able to control the
agent to produce the adversarial policy. Based on such a generalized attack
framework, the attacker can also regulate the state distribution shift caused
by the attack through an attack budget, and thus produce stealthy adversarial
policies that can exploit the victim agent. Furthermore, we provide the first
provably robust defenses with a convergence guarantee to the most robust victim
policy via adversarial training with timescale separation, in sharp contrast to
adversarial training in supervised learning, which may only provide
empirical defenses.
Online Game with Time-Varying Coupled Inequality Constraints
In this paper, an online game is studied, where at each time, a group of players
aim at selfishly minimizing their own time-varying cost function simultaneously
subject to time-varying coupled constraints and local feasible set constraints.
Only local cost functions and local constraints are available to individual
players, who can share limited information with their neighbors through a fixed
and connected graph. In addition, players have no prior knowledge of future
cost functions and future local constraint functions. In this setting, a novel
decentralized online learning algorithm is devised based on mirror descent and
a primal-dual strategy. The proposed algorithm can achieve sublinearly bounded
regrets and constraint violation by appropriately choosing decaying stepsizes.
Furthermore, it is shown that the generated sequence of play by the designed
algorithm can converge to the variational GNE of a strongly monotone game, to
which the online game converges. Additionally, a payoff-based case, i.e., in a
bandit feedback setting, is also considered and a new payoff-based learning
policy is devised to generate sublinear regrets and constraint violation.
Finally, the obtained theoretical results are corroborated by numerical
simulations.
Comment: arXiv admin note: text overlap with arXiv:2105.0620
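As a rough illustration of the mirror descent building block mentioned in this abstract, the sketch below performs a single entropic mirror descent step on the probability simplex. The abstract does not say which mirror map is used, so the entropic choice is an assumption, and the primal-dual handling of the coupled constraints and the communication between neighbors are omitted. The function name `entropic_mirror_descent_step` is hypothetical.

```python
import math
from typing import List


def entropic_mirror_descent_step(
    x: List[float], grad: List[float], step_size: float
) -> List[float]:
    """One mirror descent step on the simplex with the negative-entropy mirror map.

    The update multiplies each coordinate by exp(-step_size * gradient)
    and renormalizes so that the result sums to one.
    Illustrative sketch only; not the algorithm of the paper.
    """
    weights = [xi * math.exp(-step_size * gi) for xi, gi in zip(x, grad)]
    total = sum(weights)
    return [w / total for w in weights]
```

Starting from the uniform point `[0.5, 0.5]` with gradient `[1.0, 0.0]`, the step shifts probability mass towards the second coordinate, which has the lower cost gradient. A decaying `step_size` would correspond to the decaying stepsizes the abstract refers to.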
Policy Space Diversity for Non-Transitive Games
Policy-Space Response Oracles (PSRO) is an influential algorithm framework
for approximating a Nash Equilibrium (NE) in multi-agent non-transitive games.
Many previous studies have been trying to promote policy diversity in PSRO. A
major weakness in existing diversity metrics is that a more diverse (according
to their diversity metrics) population does not necessarily mean (as we proved
in the paper) a better approximation to a NE. To alleviate this problem, we
propose a new diversity metric, the improvement of which guarantees a better
approximation to a NE. Meanwhile, we develop a practical and well-justified
method to optimize our diversity metric using only state-action samples. By
incorporating our diversity regularization into the best response solving in
PSRO, we obtain a new PSRO variant, Policy Space Diversity PSRO (PSD-PSRO). We
present the convergence property of PSD-PSRO. Empirically, extensive
experiments on various games demonstrate that PSD-PSRO is more effective in
producing significantly less exploitable policies than state-of-the-art PSRO
variants.
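The notion of "less exploitable" can be illustrated on a small two-player zero-sum matrix game. The sketch below computes the total gain both players could obtain by switching to a best response (often called NashConv; some authors divide it by two and call that exploitability). It is not the PSD-PSRO method, and the function name `nash_conv` and the rock-paper-scissors payoff matrix are assumptions for illustration.

```python
from typing import List


def nash_conv(payoff: List[List[float]], x: List[float], y: List[float]) -> float:
    """Sum of best-response gains in a zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player receives
    the negative. x and y are the row and column mixed strategies.
    The result is zero exactly when (x, y) is a Nash equilibrium.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    # Row player's payoff for each pure action against y.
    row_values = [sum(payoff[i][j] * y[j] for j in range(n_cols)) for i in range(n_rows)]
    # Row player's payoff for each pure column action against x.
    col_values = [sum(x[i] * payoff[i][j] for i in range(n_rows)) for j in range(n_cols)]
    # The value of the current profile cancels out in the sum of both gains.
    return max(row_values) - min(col_values)


# Rock-paper-scissors payoffs for the row player (assumed example game).
RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
```

For rock-paper-scissors, the uniform strategy pair gives a value of zero, while two players who always play rock give a value of 2, since each could gain 1 by switching to paper.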
Reinforcement learning in large state action spaces
Reinforcement learning (RL) is a promising framework for training intelligent agents which learn to optimize long term utility by directly interacting with the environment. Creating RL methods which scale to large state-action spaces is a critical problem towards ensuring real world deployment of RL systems. However, several challenges limit the applicability of RL to large scale settings. These include difficulties with exploration, low sample efficiency, computational intractability, task constraints like decentralization and lack of guarantees about important properties like performance, generalization and robustness in potentially unseen scenarios.
This thesis is motivated towards bridging the aforementioned gap. We propose several principled algorithms and frameworks for studying and addressing the above challenges in RL. The proposed methods cover a wide range of RL settings (single and multi-agent systems (MAS) with all the variations in the latter, prediction and control, model-based and model-free methods, value-based and policy-based methods). In this work we propose the first results on several different problems: e.g. tensorization of the Bellman equation which allows exponential sample efficiency gains (Chapter 4), provable suboptimality arising from structural constraints in MAS (Chapter 3), combinatorial generalization results in cooperative MAS (Chapter 5), generalization results on observation shifts (Chapter 7), learning deterministic policies in a probabilistic RL framework (Chapter 6). Our algorithms exhibit provably enhanced performance and sample efficiency along with better scalability. Additionally, we also shed light on generalization aspects of the agents under different frameworks. These properties have been driven by the use of several advanced tools (e.g. statistical machine learning, state abstraction, variational inference, tensor theory).
In summary, the contributions in this thesis significantly advance progress towards making RL agents ready for large scale, real world applications.