Adversarial jamming attacks and defense strategies via adaptive deep reinforcement learning
As the applications of deep reinforcement learning (DRL) in wireless
communications grow, the sensitivity of DRL-based wireless communication
strategies to adversarial attacks has started to draw increasing attention. To
address this sensitivity and alleviate the resulting security concerns, in
this paper we consider a victim user that performs DRL-based dynamic channel
access and an attacker that executes DRL-based jamming attacks to disrupt the
victim. Hence, both the victim and the attacker are DRL agents that can
interact with each other, retrain their models, and adapt to the opponent's
policies. In this setting, we first develop an adversarial jamming attack
policy that aims at minimizing the accuracy of the victim's decision making on
dynamic channel access. We then devise defenses against such an attacker and
propose three strategies: diversified defense with
proportional-integral-derivative (PID) control, diversified defense with an
imitation attacker, and defense via orthogonal policies. We design these
strategies to maximize the attacked victim's accuracy and evaluate their
performance.
Comment: 13 pages, 24 figures
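To make the setting concrete, here is a minimal sketch of DRL-based dynamic channel access under jamming. It is not the paper's implementation: it assumes a tabular Q-learning victim and a fixed channel-sweeping jammer, whereas the paper's attacker is itself an adaptive DRL agent. All names and parameters below are hypothetical.

```python
# Minimal sketch (not the paper's method): tabular Q-learning victim choosing
# a channel each step, against a simple sweeping jammer.
import numpy as np

N_CHANNELS = 4
EPISODES = 5000
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1  # hypothetical hyperparameters

rng = np.random.default_rng(0)
# State: channel jammed on the previous step; action: channel to transmit on.
Q = np.zeros((N_CHANNELS, N_CHANNELS))

jammed = 0
for _ in range(EPISODES):
    # Victim picks a channel epsilon-greedily given the last jammed channel.
    if rng.random() < EPS:
        action = int(rng.integers(N_CHANNELS))
    else:
        action = int(np.argmax(Q[jammed]))
    # Illustrative attacker: sweeps channels deterministically. The paper's
    # attacker instead learns a DRL jamming policy and adapts to the victim.
    next_jammed = (jammed + 1) % N_CHANNELS
    reward = 0.0 if action == next_jammed else 1.0  # 1 for a clean transmission
    Q[jammed, action] += ALPHA * (
        reward + GAMMA * Q[next_jammed].max() - Q[jammed, action]
    )
    jammed = next_jammed

print("Learned channel per observed jammed channel:", Q.argmax(axis=1))
```

Against this predictable jammer the victim quickly learns to avoid the next jammed channel; the paper's point is that once the jammer is also a learning agent, both sides keep retraining against each other, which is what motivates the diversified and orthogonal defense policies.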
One Risk to Rule Them All: Addressing Distributional Shift in Offline Reinforcement Learning via Risk-Aversion
Offline reinforcement learning (RL) is suitable for safety-critical domains
where online exploration is not feasible. In such domains, decision-making
should take into consideration the risk of catastrophic outcomes. In other
words, decision-making should be risk-averse. An additional challenge of
offline RL is avoiding distributional shift, i.e., ensuring that state-action
pairs visited by the policy remain near those in the dataset. Previous works
on risk in offline RL combine offline RL techniques (to avoid distributional
shift) with risk-sensitive RL algorithms (to achieve risk-aversion). In this
work, we propose risk-aversion as a mechanism to jointly address both of these
issues. We propose a model-based approach, and use an ensemble of models to
estimate epistemic uncertainty, in addition to aleatoric uncertainty. We train
a policy that is risk-averse, and avoids high uncertainty actions.
Risk-aversion to epistemic uncertainty prevents distributional shift, as areas
not covered by the dataset have high epistemic uncertainty. Risk-aversion to
aleatoric uncertainty discourages actions that are inherently risky due to
environment stochasticity. Thus, by only introducing risk-aversion, we avoid
distributional shift in addition to achieving risk-aversion to aleatoric risk.
Our algorithm, 1R2R, achieves strong performance on deterministic benchmarks
and outperforms existing approaches for risk-sensitive objectives in
stochastic domains.
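The following is a minimal sketch of the core idea, not the 1R2R algorithm itself: sample returns from an ensemble of models and select actions by a risk measure (here CVaR) over the pooled samples, so that both ensemble disagreement (epistemic uncertainty) and per-model spread (aleatoric uncertainty) are penalized. The toy "models" and all parameters are assumptions for illustration.

```python
# Minimal sketch: risk-averse action scoring under a model ensemble.
# Each toy 'model' is a Gaussian whose mean drifts with the model index,
# mimicking disagreement for actions outside the dataset's support.
import numpy as np

rng = np.random.default_rng(0)

def ensemble_returns(action, n_models=5, n_samples=20):
    """Sample returns for an action from each model in a toy ensemble."""
    samples = []
    for m in range(n_models):
        mean = -abs(action - 0.5) + 0.1 * m * abs(action)  # models disagree more for large |action|
        std = 0.2 + 0.5 * abs(action)                      # aleatoric noise grows with |action|
        samples.append(rng.normal(mean, std, size=n_samples))
    return np.concatenate(samples)

def cvar(samples, alpha=0.1):
    """Conditional value-at-risk: mean of the worst alpha-fraction of returns."""
    k = max(1, int(alpha * len(samples)))
    return np.sort(samples)[:k].mean()

# Maximizing CVaR over the pooled ensemble samples is risk-averse to both
# uncertainty types, so out-of-distribution actions score poorly even when
# their mean predicted return looks attractive.
candidates = np.linspace(-2.0, 2.0, 41)
best = max(candidates, key=lambda a: cvar(ensemble_returns(a)))
print(f"risk-averse action: {best:.2f}")
```

In this toy, the risk-averse criterion prefers actions near the dataset-supported region (small disagreement and noise) over actions with higher average return but larger uncertainty, which is exactly the mechanism the abstract describes for jointly handling distributional shift and stochasticity.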
- …