TempLe: Learning Template of Transitions for Sample Efficient Multi-task RL
Transferring knowledge among various environments is important to efficiently
learn multiple tasks online. Most existing methods directly use the previously
learned models or previously learned optimal policies to learn new tasks.
However, these methods may be inefficient when the underlying models or optimal
policies are substantially different across tasks. In this paper, we propose
Template Learning (TempLe), the first PAC-MDP method for multi-task
reinforcement learning that can be applied to tasks with varying state/action
spaces. TempLe generates transition dynamics templates, abstractions of the
transition dynamics across tasks, to gain sample efficiency by extracting
similarities between tasks even when their underlying models or optimal
policies have limited commonalities. We present two algorithms, for an "online"
setting and a "finite-model" setting, respectively. We prove that our proposed TempLe
algorithms achieve much lower sample complexity than single-task learners or
state-of-the-art multi-task methods. We show via systematically designed
experiments that our TempLe method universally outperforms the state-of-the-art
multi-task methods (PAC-MDP or not) in various settings and regimes.
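The abstract does not define a template precisely. As a rough illustration only, the sketch below assumes that a template groups state-action pairs whose next-state distributions match up to a relabeling of the next states, so pairs from tasks with different state spaces can still be grouped. The helper names `template_key` and `group_into_templates` are hypothetical, and TempLe's actual definition and its PAC-MDP machinery (confidence intervals, sample thresholds) are not reproduced here.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def template_key(probs: Sequence[float], decimals: int = 2) -> Tuple[float, ...]:
    """Hypothetical template signature for one state-action pair.

    Sorting the next-state probabilities discards *which* states are reached
    and keeps only the *shape* of the distribution; dropping zeros lets tasks
    with different state-space sizes share a signature. Rounding is a crude
    stand-in for a similarity threshold.
    """
    rounded = (round(p, decimals) for p in sorted(probs, reverse=True))
    return tuple(p for p in rounded if p > 0)


def group_into_templates(
    dynamics: Dict[Hashable, Sequence[float]], decimals: int = 2
) -> Dict[Tuple[float, ...], List[Hashable]]:
    """Group (task, state, action) pairs that share a template signature."""
    groups: Dict[Tuple[float, ...], List[Hashable]] = {}
    for pair, probs in dynamics.items():
        groups.setdefault(template_key(probs, decimals), []).append(pair)
    return groups


if __name__ == "__main__":
    # Toy dynamics (invented) from two tasks with different numbers of states.
    dynamics = {
        ("task1", "s0", "left"): [0.8, 0.2, 0.0],
        ("task2", "s3", "up"): [0.0, 0.2, 0.0, 0.8],
        ("task2", "s1", "down"): [0.5, 0.5],
    }
    for key, members in group_into_templates(dynamics).items():
        print(key, members)
```

In this toy input the first two pairs share the signature (0.8, 0.2). Under the stated assumption, samples observed at either pair could be pooled to estimate the shared shape, which is the intuition behind gaining sample efficiency across tasks.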
Variation of poplar sap flow and its response to meteorological factors
This paper analyzes the variation of poplar sap flow, identifies the meteorological
factors that affect it, and characterizes how sap flow responds to them, so as to
provide a theoretical basis for follow-up studies on improving living poplar trees
and on the mechanism by which liquid medicine rises in the tree. The sap flow rate of
poplar was measured with a Flow 32A-1K wrapped sap flow meter, and meteorological
factors were measured simultaneously with a solar radiation meter and a temperature
and humidity meter. The results showed that stem sap flow velocity was significantly
positively correlated with solar radiation intensity and air temperature, and
negatively correlated with air relative humidity. These results indicate that
different meteorological factors influence the poplar sap flow rate to different degrees.
Rethinking Adversarial Policies: A Generalized Attack Formulation and Provable Defense in Multi-Agent RL
Most existing works consider direct perturbations of the victim's state/action or
the underlying transition dynamics to show vulnerability of reinforcement
learning agents under adversarial attacks. However, such direct manipulation
may not always be feasible in practice. In this paper, we consider another
common and realistic attack setup: in a multi-agent RL setting with
well-trained agents, during deployment time, the victim agent is
exploited by an attacker who controls another agent to act
adversarially against the victim using an adversarial policy. Prior
attack models under this setup do not consider that the attacker may face
resistance and thus can only take partial control of the agent; they also
introduce perceivable "abnormal" behaviors that are easily
detectable. A provable defense against these adversarial policies is also
lacking. To resolve these issues, we introduce a more general attack
formulation that models to what extent the adversary is able to control the
agent to produce the adversarial policy. Based on such a generalized attack
framework, the attacker can also regulate the state distribution shift caused
by the attack through an attack budget, and thus produce stealthy adversarial
policies that can exploit the victim agent. Furthermore, we provide the first
provably robust defenses with convergence guarantee to the most robust victim
policy via adversarial training with timescale separation, in sharp contrast to
adversarial training in supervised learning, which may only provide empirical
defenses.
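The abstract does not spell out how partial control is modeled. One simple formulation, assumed here and possibly different from the paper's exact definition, lets the attacker-controlled agent follow the adversarial policy with probability equal to the attack budget and its original policy otherwise. Under this assumption the per-step shift in that agent's action distribution, measured in total variation, equals the budget times the distance between the two policies, so it never exceeds the budget. How this translates into a bound on state-distribution shift is not modeled. The function names are hypothetical.

```python
from typing import List, Sequence


def partially_controlled_policy(
    original: Sequence[float], adversarial: Sequence[float], budget: float
) -> List[float]:
    """Action distribution of an agent the attacker controls only partially.

    Assumed model: with probability `budget` the agent follows the attacker's
    adversarial policy, otherwise its own original policy.
    """
    if not 0.0 <= budget <= 1.0:
        raise ValueError("budget must lie in [0, 1]")
    if len(original) != len(adversarial):
        raise ValueError("policies must cover the same action set")
    return [(1.0 - budget) * p + budget * q for p, q in zip(original, adversarial)]


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


if __name__ == "__main__":
    original = [0.7, 0.2, 0.1]     # hypothetical action probabilities in one state
    adversarial = [0.0, 0.0, 1.0]  # hypothetical attacker's preferred action
    mixed = partially_controlled_policy(original, adversarial, budget=0.3)
    # Since mixed - original = budget * (adversarial - original),
    # TV(original, mixed) = budget * TV(original, adversarial) <= budget.
    print(mixed, total_variation(original, mixed))
```

A smaller budget keeps the controlled agent's behavior closer to its normal policy, which is the intuition behind producing stealthier attacks.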
Robustness to Multi-Modal Environment Uncertainty in MARL using Curriculum Learning
Multi-agent reinforcement learning (MARL) plays a pivotal role in tackling
real-world challenges. However, the seamless transition of trained policies
from simulation to the real world requires them to be robust to various
environmental uncertainties. Existing works focus on finding a Nash equilibrium
or the optimal policy under uncertainty in a single environment variable (i.e.,
action, state, or reward), because a multi-agent system is itself highly
complex and non-stationary. However, in real-world situations, uncertainty can
occur in multiple environment variables simultaneously. This work is the first
to formulate the generalised problem of robustness to multi-modal environment
uncertainty in MARL. To this end, we propose a general robust training approach
for multi-modal uncertainty based on curriculum learning techniques. We handle
two distinct environmental uncertainties simultaneously and present extensive
results across both cooperative and competitive MARL environments,
demonstrating that our approach achieves state-of-the-art levels of robustness.
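The abstract does not describe the curriculum schedule itself. A minimal sketch of a curriculum over two uncertainty sources at once, under assumptions of our own, is shown below: both perturbation magnitudes grow linearly with a stage index, and training advances a stage only once an evaluation return clears a threshold. The class `TwoModalCurriculum`, its fields, and the gating rule are hypothetical. Applying the noise to observations and actions inside the environment is not shown.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class TwoModalCurriculum:
    """Hypothetical curriculum over two uncertainty sources at once.

    Both perturbation magnitudes grow linearly with the stage index; the
    learner advances a stage only after its evaluation return clears
    `promote_threshold` under the current perturbation levels.
    """

    max_state_noise: float
    max_action_noise: float
    num_stages: int
    promote_threshold: float
    stage: int = 0

    def levels(self) -> Tuple[float, float]:
        """Current (state-noise, action-noise) magnitudes."""
        frac = self.stage / self.num_stages
        return frac * self.max_state_noise, frac * self.max_action_noise

    def update(self, eval_return: float) -> bool:
        """Advance one stage if performance is good enough; report whether it did."""
        if self.stage < self.num_stages and eval_return >= self.promote_threshold:
            self.stage += 1
            return True
        return False

    @property
    def finished(self) -> bool:
        return self.stage == self.num_stages


if __name__ == "__main__":
    curriculum = TwoModalCurriculum(0.2, 0.4, num_stages=4, promote_threshold=10.0)
    for eval_return in [12.0, 8.0, 11.0, 15.0, 13.0]:  # invented evaluation returns
        curriculum.update(eval_return)
        print(curriculum.stage, curriculum.levels())
```

Raising both magnitudes together is only one option; increasing one modality at a time is equally plausible, and the abstract does not say which approach the authors use.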
Safe and Robust Multi-Agent Reinforcement Learning for Connected Autonomous Vehicles under State Perturbations
Sensing and communication technologies have enhanced learning-based decision
making methodologies for multi-agent systems such as connected autonomous
vehicles (CAV). However, most existing safe reinforcement learning based
methods assume accurate state information. It remains challenging to achieve
safety requirements under state uncertainties for CAVs, given the noisy
sensor measurements and the vulnerability of communication channels. In this
work, we propose a Robust Multi-Agent Proximal Policy Optimization with robust
Safety Shield (SR-MAPPO) for CAVs in various driving scenarios. Our approach
combines a robust MARL algorithm with a control barrier function (CBF)-based
safety shield to cope with perturbed or uncertain state inputs. In the former,
the robust policy is trained with a worst-case Q-function regularization module
that pursues a higher lower bound on reward; the latter, the robust CBF safety
shield, enforces CAVs' collision-free constraints in complicated driving
scenarios even when vehicle state information is perturbed. We validate the
advantages of SR-MAPPO in robustness and safety and compare it with baselines
under different driving and state perturbation scenarios in the CARLA
simulator. The SR-MAPPO policy is verified to maintain
higher safety rates and efficiency (reward) when threatened by both state
perturbations and unconnected vehicles' dangerous behaviors.
Comment: 6 pages, 5 figures
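The robust CBF safety shield in the SR-MAPPO abstract is not specified in detail. The sketch below illustrates the general idea of a robust discrete-time CBF filter on a deliberately simple model of our own: one-dimensional car following, where the ego speed is the control input and the errors in the observed gap and lead-vehicle speed are bounded. The model, the function names, and all parameters are assumptions and do not represent the paper's formulation.

```python
def robust_cbf_speed_limit(
    gap_obs: float,
    lead_speed_obs: float,
    gap_err: float,
    speed_err: float,
    min_gap: float,
    gamma: float,
    dt: float,
) -> float:
    """Largest ego speed satisfying a discrete-time CBF condition for every
    true state consistent with the bounded observation errors.

    Toy model (assumption): gap[t+1] = gap[t] + (v_lead - u) * dt,
    barrier h = gap - min_gap, condition h[t+1] >= (1 - gamma) * h[t].
    Solving for u gives u <= v_lead + gamma * (gap - min_gap) / dt; the
    right-hand side increases with both gap and v_lead, so plugging in
    their worst-case (smallest) values keeps the condition valid for the
    true state.
    """
    if not 0.0 < gamma <= 1.0 or dt <= 0.0:
        raise ValueError("need 0 < gamma <= 1 and dt > 0")
    worst_gap = gap_obs - gap_err
    worst_lead_speed = lead_speed_obs - speed_err
    return worst_lead_speed + gamma * (worst_gap - min_gap) / dt


def shield(u_rl: float, speed_limit: float) -> float:
    """Minimally modify the RL action: keep it if safe, otherwise clip it.

    With a single linear constraint, the CBF quadratic program
    min (u - u_rl)^2 s.t. u <= speed_limit reduces to this clip. Speeds
    are floored at 0 (no reversing); if speed_limit < 0 the condition
    cannot be met and the vehicle is simply commanded to stop.
    """
    return max(0.0, min(u_rl, speed_limit))


if __name__ == "__main__":
    limit = robust_cbf_speed_limit(
        gap_obs=20.0, lead_speed_obs=10.0, gap_err=1.0,
        speed_err=0.5, min_gap=5.0, gamma=0.5, dt=1.0,
    )
    print(limit, shield(20.0, limit), shield(12.0, limit))
```

Shrinking the observed gap and lead speed by their error bounds is what makes this toy filter robust: the resulting speed limit is conservative for every true state within the assumed perturbation set.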
- …