Implicit imitation in multiagent reinforcement learning
Imitation is actively being studied as an effective means of learning in multi-agent environments. It allows an agent to learn how to act well (perhaps optimally) by passively observing the actions of cooperative teachers or other more experienced agents in its environment. We propose a straightforward imitation mechanism called model extraction that can be integrated easily into standard model-based reinforcement learning algorithms. Roughly, by observing a mentor with similar capabilities, an agent can extract information about its own capabilities in unvisited parts of the state space. The extracted information can accelerate learning dramatically. We illustrate the benefits of model extraction by integrating it with prioritized sweeping and demonstrating improved performance and convergence through observation of single and multiple mentors. Though we make some stringent assumptions regarding observability, possible interactions, and common abilities, we briefly comment on extensions of the model that relax these assumptions.
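Based only on this abstract (not the authors' implementation), the following Python sketch shows one way mentor observations might be folded into a prioritized-sweeping learner: transitions observed from a mentor update the same tabular model as the agent's own experience, assuming, as the abstract stipulates, that the mentor's actions are observable and capabilities are shared. All names and constants (update_model, GAMMA, THETA, N_SWEEPS) are illustrative.

```python
# Hedged sketch of "model extraction" grafted onto prioritized sweeping.
import heapq
import itertools
from collections import defaultdict

GAMMA = 0.95     # discount factor (illustrative)
THETA = 1e-4     # minimum priority worth queueing
N_SWEEPS = 10    # backups performed per real or observed step

counts = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s2: count}
rsum = defaultdict(float)                       # (s, a) -> summed reward
Q = defaultdict(float)                          # (s, a) -> value estimate
preds = defaultdict(set)                        # s2 -> {(s, a)} predecessors
pqueue = []                                     # max-heap via negated priority
_tie = itertools.count()                        # tiebreaker for the heap

def update_model(s, a, r, s2):
    """Record a transition -- the agent's own, or one extracted from a
    mentor under the shared-capabilities assumption in the abstract."""
    counts[(s, a)][s2] += 1
    rsum[(s, a)] += r
    preds[s2].add((s, a))
    push_priority(s, a)

def max_q(s):
    return max((Q[(s, b)] for (st, b) in list(counts) if st == s), default=0.0)

def q_backup(s, a):
    """Full model-based backup from the estimated dynamics and rewards."""
    n = sum(counts[(s, a)].values())
    r_hat = rsum[(s, a)] / n
    ev = sum(c / n * max_q(s2) for s2, c in counts[(s, a)].items())
    return r_hat + GAMMA * ev

def push_priority(s, a):
    p = abs(q_backup(s, a) - Q[(s, a)])
    if p > THETA:
        heapq.heappush(pqueue, (-p, next(_tie), s, a))

def prioritized_sweep():
    """Pop the highest-priority pair, back it up, and queue predecessors."""
    for _ in range(N_SWEEPS):
        if not pqueue:
            break
        _, _, s, a = heapq.heappop(pqueue)
        Q[(s, a)] = q_backup(s, a)
        for ps, pa in preds[s]:
            push_priority(ps, pa)
```

In use, both the agent's own steps and the mentor's observed transitions would be passed through update_model, with prioritized_sweep called after each; the mentor's transitions seed the model in parts of the state space the agent has never visited, which is where the abstract attributes the speed-up.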
Attention Loss Adjusted Prioritized Experience Replay
Prioritized Experience Replay (PER) is a technique in deep reinforcement learning that improves the training efficiency of a neural network by preferentially selecting more informative experience samples. However, the non-uniform sampling used in PER inevitably shifts the state-action distribution and introduces estimation error into the Q-value function. In this paper, an Attention Loss Adjusted Prioritized (ALAP) Experience Replay algorithm is proposed, which integrates an improved self-attention network with a double-sampling mechanism to fit the hyperparameter that regulates the importance-sampling weights, eliminating the estimation error caused by PER. To verify the effectiveness and generality of the algorithm, ALAP is tested with value-function-based, policy-gradient-based, and multi-agent reinforcement learning algorithms in OpenAI Gym, and comparison studies verify the advantage and efficiency of the proposed training framework.
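For context, here is a minimal proportional PER buffer with the standard importance-sampling correction (Schaul et al., 2016). The attention-fitted adjustment described in the abstract is not reproduced; this sketch shows only the fixed-beta baseline mechanism that ALAP replaces, and all hyperparameter values are illustrative.

```python
# Baseline proportional PER with fixed-beta importance-sampling weights.
import numpy as np

class PERBuffer:
    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity, self.alpha, self.beta, self.eps = capacity, alpha, beta, eps
        self.data, self.prios, self.pos = [], np.zeros(capacity), 0

    def add(self, transition):
        # New samples enter at max priority so they are replayed at least once.
        max_p = self.prios.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.prios[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        p = self.prios[:len(self.data)] ** self.alpha
        p /= p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=p)
        # The IS weights undo the bias from non-uniform sampling; ALAP's
        # point is that this beta is hard to tune, so it fits the
        # correction with a self-attention network instead.
        w = (len(self.data) * p[idx]) ** (-self.beta)
        w /= w.max()
        return [self.data[i] for i in idx], idx, w

    def update_priorities(self, idx, td_errors):
        self.prios[idx] = np.abs(td_errors) + self.eps
```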
Defeating Proactive Jammers Using Deep Reinforcement Learning for Resource-Constrained IoT Networks
Traditional anti-jamming techniques such as spread spectrum, adaptive power/rate control, and cognitive radio have demonstrated effectiveness in mitigating jamming attacks. However, their robustness against the growing complexity of Internet-of-Things (IoT) networks and diverse jamming attacks is still limited. To address these challenges, machine learning (ML)-based techniques have emerged as promising solutions: by offering adaptive and intelligent anti-jamming capabilities, they can adapt to dynamic attack scenarios and overcome the limitations of traditional methods. In this paper, we propose a deep reinforcement learning (DRL)-based approach that uses state input from realistic wireless network interface cards. We train five variants of deep Q-network (DQN) agents to mitigate the effects of jamming, with the aim of identifying the most sample-efficient, lightweight, robust, and least complex agent tailored to power-constrained devices. Simulation results demonstrate the effectiveness of the proposed DRL-based anti-jamming approach against proactive jammers regardless of their jamming strategy, which eliminates the need for a pattern-recognition or jamming-strategy-detection step. Our findings present a promising solution for securing IoT networks against jamming attacks and highlight substantial opportunities for continued investigation and advancement in this field.
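To make the setup concrete, the sketch below shows a generic DQN channel-hopping agent of the kind the abstract describes: the state is assumed to be a vector of per-channel sensing features (e.g. RSSI reported by the NIC) and the action is the channel to transmit on next. The network sizes, feature set, and reward are placeholders, not the paper's configuration or any of its five agent variants.

```python
# Illustrative DQN anti-jamming agent: sense the spectrum, pick a channel.
import random
import torch
import torch.nn as nn

N_CHANNELS = 8               # actions: which channel to use next (assumed)
FEATURES = 2 * N_CHANNELS    # e.g. per-channel RSSI + busy flag (assumed)

class QNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(FEATURES, 64), nn.ReLU(),
            nn.Linear(64, N_CHANNELS))
    def forward(self, x):
        return self.net(x)

q, q_target = QNet(), QNet()
q_target.load_state_dict(q.state_dict())  # sync periodically during training
opt = torch.optim.Adam(q.parameters(), lr=1e-3)
GAMMA, EPS = 0.9, 0.1

def act(state):
    """Epsilon-greedy channel choice from sensed features."""
    if random.random() < EPS:
        return random.randrange(N_CHANNELS)
    with torch.no_grad():
        return int(q(torch.as_tensor(state, dtype=torch.float32)).argmax())

def train_step(batch):
    """One DQN update; batch = (s, a, r, s2, done) tensors, where the
    reward might be e.g. +1 for each un-jammed transmission."""
    s, a, r, s2, done = batch
    q_sa = q(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + GAMMA * (1 - done) * q_target(s2).max(1).values
    loss = nn.functional.smooth_l1_loss(q_sa, target)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
```

Because the Q-network reacts to the sensed state rather than to a recognized jamming pattern, this structure is consistent with the abstract's claim that no separate jamming-strategy detection step is needed.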
Stochastic Reinforcement Learning
In reinforcement learning episodes, the rewards and punishments are often non-deterministic, and there are invariably stochastic elements governing the underlying situation. Such stochastic elements are often numerous, cannot be known in advance, and tend to obscure the underlying patterns of rewards and punishments. Indeed, if stochastic elements were absent, the same outcome would occur every time, and the learning problems involved could be greatly simplified. In addition, in most practical situations the cost of an observation to receive either a reward or a punishment can be significant, and one would wish to arrive at the correct learning conclusion at minimum cost. In this paper, we present a stochastic approach to reinforcement learning that explicitly models both the variability present in the learning environment and the cost of observation. Criteria and rules for learning success are quantitatively analyzed, and probabilities of exceeding the observation cost bounds are also obtained.
Comment: AIKE 201
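As a small illustration of the trade-off the abstract describes (not the paper's actual criteria), the sketch below samples a stochastic reward, charging a fixed cost per observation, until a Hoeffding bound says the empirical mean is accurate enough, then reports the total observation cost incurred. The bound, constants, and reward model are all assumptions for the example.

```python
# Stopping a noisy reward estimate once a confidence bound is met,
# while tracking the cumulative observation cost.
import math
import random

COST_PER_OBS = 1.0
DELTA = 0.05     # allowed failure probability (assumed)
EPSILON = 0.1    # target accuracy of the mean-reward estimate (assumed)

def noisy_reward():
    """Stand-in environment: Bernoulli reward with an unknown mean."""
    return 1.0 if random.random() < 0.7 else 0.0

def estimate_mean_reward():
    total, n, spent = 0.0, 0, 0.0
    while True:
        total += noisy_reward()
        n += 1
        spent += COST_PER_OBS
        # Hoeffding: after n samples in [0, 1], the true mean lies within
        # `radius` of the empirical mean with probability >= 1 - DELTA.
        radius = math.sqrt(math.log(2 / DELTA) / (2 * n))
        if radius <= EPSILON:
            return total / n, spent

mean, cost = estimate_mean_reward()
print(f"estimated mean reward {mean:.3f} at observation cost {cost:.1f}")
```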