88,972 research outputs found
Multi-Agent Double Deep Q-Learning for Beamforming in mmWave MIMO Networks
Beamforming is one of the key techniques in millimeter wave (mmWave)
multi-input multi-output (MIMO) communications. Designing appropriate
beamforming not only improves the quality and strength of the received signal,
but can also help reduce interference, consequently enhancing the data
rate. In this paper, we propose a distributed multi-agent double deep
Q-learning algorithm for beamforming in mmWave MIMO networks, where multiple
base stations (BSs) can automatically and dynamically adjust their beams to
serve multiple highly mobile user equipments (UEs). In the analysis, the
largest-received-power association criterion is adopted for UEs, and a realistic
channel model is taken into account. Simulation results demonstrate that the
proposed learning-based algorithm can achieve comparable performance with
respect to exhaustive search while operating at much lower complexity.
Comment: To be published in IEEE International Symposium on Personal, Indoor
and Mobile Radio Communications (PIMRC) 202
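The double deep Q-learning update underlying such algorithms decouples beam selection from beam evaluation: the online network picks the best next action (here, a beam index) and the target network scores it, which reduces the overestimation bias of vanilla Q-learning. The sketch below illustrates only that target computation; the beam count, Q-values, reward, and discount factor are hypothetical placeholders, not values from the paper.

```python
# Minimal sketch of the double Q-learning target, assuming toy Q-values.
# In double deep Q-learning, the ONLINE network selects the next action,
# while the TARGET network evaluates it.

def double_q_target(q_online_next, q_target_next, reward, gamma=0.9):
    """Double Q-learning target: argmax over the online net's Q-values,
    evaluated by the target net's Q-values."""
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best_action]

# Toy example: 4 candidate beams for one base station.
q_online_next = [0.2, 0.8, 0.5, 0.1]   # online net's Q-values for next state
q_target_next = [0.3, 0.6, 0.9, 0.2]   # target net's Q-values for next state
target = double_q_target(q_online_next, q_target_next, reward=1.0)
```

Note how the online network picks beam 1 (its own maximum), but the value used in the target is the target network's estimate for that beam, not the target network's maximum.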
Multi Agent DeepRL based Joint Power and Subchannel Allocation in IAB networks
Integrated Access and Backhauling (IAB) is a viable approach for meeting the
unprecedented need for higher data rates of future generations, acting as a
cost-effective alternative to dense fiber-wired links. The design of such
networks with constraints usually results in an optimization problem of
non-convex and combinatorial nature. Under those situations, it is challenging
to obtain an optimal strategy for the joint Subchannel Allocation and Power
Allocation (SAPA) problem. In this paper, we develop a multi-agent Deep
Reinforcement Learning (DeepRL) based framework for joint optimization of power
and subchannel allocation in an IAB network to maximize the downlink data rate.
SAPA using DDQN (Double Deep Q-Network) can handle computationally
expensive problems with huge action spaces associated with multiple users and
nodes. Unlike conventional methods such as game theory, fractional
programming, and convex optimization, which in practice demand increasingly
accurate network information, the multi-agent DeepRL approach requires less
information about the network environment. Simulation results show the proposed scheme's
promising performance when compared with baseline (Deep Q-Learning Network and
Random) schemes.
Comment: 7 pages, 6 figures, Accepted at the European Conference on
Communication Systems (ECCS) 202
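A rough illustration of why the multi-agent formulation matters for SAPA: a centralized learner faces a joint action space that grows exponentially in the number of nodes, while each agent in the multi-agent setup only chooses among its own subchannel-power pairs. The counts below (3 subchannels, 4 power levels, 5 agents) are hypothetical, not the paper's configuration.

```python
# Back-of-the-envelope action-space comparison for joint subchannel and
# power allocation; all counts are illustrative assumptions.

n_subchannels, n_power_levels, n_agents = 3, 4, 5

# Each agent picks one (subchannel, power level) pair.
per_agent_actions = n_subchannels * n_power_levels      # 12 actions per agent

# A single centralized learner would pick one joint action for all agents.
centralized_actions = per_agent_actions ** n_agents     # 12**5 = 248832
```

The per-agent DDQN thus keeps each Q-network's output layer small even as the network scales, which is the computational advantage the abstract points to.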
Double Deep Q-Learning in Opponent Modeling
Multi-agent systems in which secondary agents with conflicting agendas
adapt their strategies require opponent modeling. In this study, we model the main
agent's and secondary agents' tactics using Double Deep Q-Networks (DDQN) with
a prioritized experience replay mechanism. Then, under the opponent modeling
setup, a Mixture-of-Experts architecture is used to identify various opponent
strategy patterns. Finally, we analyze our models in two environments with
several agents. The findings indicate that the Mixture-of-Experts model, which
is based on opponent modeling, performs better than DDQN.
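The prioritized experience replay mechanism mentioned above samples stored transitions with probability proportional to a power of their priority (typically the absolute TD error), so surprising transitions are replayed more often. The sketch below shows only that sampling rule; the priorities, the exponent, and the injected random numbers are toy values, not the study's settings.

```python
import random

# Minimal sketch of proportional prioritized sampling, assuming toy
# priorities. Transition i is drawn with probability
#   p_i**alpha / sum_j p_j**alpha.

def sample_index(priorities, alpha=0.6, rng=random.random):
    """Pick a buffer index with probability proportional to p_i**alpha."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    r = rng() * total                 # a point in [0, total)
    acc = 0.0
    for i, s in enumerate(scaled):    # walk the cumulative distribution
        acc += s
        if r <= acc:
            return i
    return len(priorities) - 1

priorities = [0.1, 2.0, 0.5]          # |TD error| of three stored transitions
idx = sample_index(priorities, rng=lambda: 0.5)   # deterministic for the demo
```

With these toy priorities the middle transition dominates the cumulative mass, so a midpoint draw selects it.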
Independent Learning Approaches: Overcoming Multi-Agent Learning Pathologies In Team-Games
Deep Neural Networks enable Reinforcement Learning (RL) agents to learn
behaviour policies directly from high-dimensional observations. As a result,
the field of Deep Reinforcement Learning (DRL) has seen a great number of
successes. Recently, the sub-field of Multi-Agent DRL (MADRL) has received an
increased amount of attention. However, considerations are required when using
RL in Multi-Agent Systems. For instance, Independent Learners (ILs) lack the
convergence guarantees of many single-agent RL approaches, even in domains
that do not require a MADRL approach. Furthermore, ILs must often overcome a
number of learning pathologies to converge upon an optimal joint-policy.
Numerous IL approaches have been proposed to facilitate cooperation, including
hysteretic Q-learning (Matignon et al., 2007) and leniency (Panait et al.,
2006). Recently, LMRL2, a variation of leniency, proved robust towards a
number of pathologies in low-dimensional domains, including miscoordination,
relative overgeneralization, stochasticity, the alter-exploration problem and
the moving target problem (Wei and Luke, 2016). In contrast, the majority of
work on ILs in MADRL focuses on an amplified moving target problem, caused by
neural networks being trained with potentially obsolete samples drawn from
experience replay memories. In this thesis we combine advances from research
on ILs with DRL algorithms. However, first we evaluate the robustness of
tabular approaches along each of the above pathology dimensions. Upon
identifying a number of weaknesses that prevent LMRL2 from consistently
converging upon optimal joint-policies, we propose a new version of leniency,
Distributed-Lenient Q-learning (DLQ). We find DLQ delivers state-of-the-art
performance in strategic-form and Markov games from the Multi-Agent
Reinforcement Learning literature. We subsequently scale leniency to MADRL,
introducing Lenient (Double) Deep Q-Network (LDDQN).
We empirically evaluate LDDQN with extensions of the Cooperative Multi-Agent
Object Transportation Problem (Busoniu et al., 2010), finding that LDDQN
outperforms hysteretic deep Q-learners in domains with multiple dropzones
yielding stochastic rewards. Finally, to evaluate deep ILs along each
pathology dimension we introduce a new MADRL environment: the Apprentice
Firemen Game (AFG). We find lenient and hysteretic approaches fail to
consistently learn near-optimal joint-policies in the AFG. To address these
pathologies we introduce Negative Update Intervals-DDQN (NUI-DDQN), a MADRL
algorithm which discards episodes yielding cumulative rewards outside the
range of expanding intervals. NUI-DDQN consistently gravitates towards
optimal joint-policies in deterministic and stochastic reward settings of the
AFG, overcoming the outlined pathologies.
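The hysteretic Q-learning cited above (Matignon et al., 2007) captures the "optimistic independent learner" idea with a single asymmetry: positive TD errors are applied with a larger learning rate than negative ones, so an agent is slow to unlearn good actions that failed only because a teammate explored. The sketch below shows that one-line update on a scalar Q-value; the rates and TD errors are illustrative assumptions, not the thesis's hyperparameters.

```python
# Minimal sketch of the hysteretic Q-update, assuming toy learning rates.
# alpha is used when the TD error is positive, beta (< alpha) otherwise,
# making the learner optimistic towards teammates' exploration noise.

def hysteretic_update(q, td_error, alpha=0.5, beta=0.05):
    """Apply the TD error with rate alpha if positive, beta otherwise."""
    rate = alpha if td_error > 0 else beta
    return q + rate * td_error

q_up = hysteretic_update(1.0, td_error=+2.0)    # moves fast towards good news
q_down = hysteretic_update(1.0, td_error=-2.0)  # moves slowly on bad news
```

Leniency generalizes this by ignoring negative updates probabilistically, with the probability annealed per state-action pair, which is what LMRL2 and LDDQN refine.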
Competitive MA-DRL for Transmit Power Pool Design in Semi-Grant-Free NOMA Systems
In this paper, we exploit the capability of multi-agent deep reinforcement
learning (MA-DRL) technique to generate a transmit power pool (PP) for Internet
of things (IoT) networks with semi-grant-free non-orthogonal multiple access
(SGF-NOMA). The PP is mapped with each resource block (RB) to achieve
distributed transmit power control (DPC). We first formulate the resource
(sub-channel and transmit power) selection problem as a stochastic Markov game,
and then solve it using two competitive MA-DRL algorithms, namely double deep Q
network (DDQN) and Dueling DDQN. Each grant-free (GF) user, acting as an
agent, tries to find the optimal transmit power level and RB to form the
desired PP. With the aid of
the dueling architecture, learning is enhanced by estimating the value of a
state without having to evaluate the effect of every action in that state.
Therefore, DDQN is suited to communication scenarios with a small
state-action space, while Dueling DDQN is suited to larger ones. Our results
show that the proposed MA-Dueling DDQN based SGF-NOMA with DPC outperforms the
SGF-NOMA system with the fixed-power-control mechanism and networks with pure
GF protocols, achieving 17.5% and 22.2% gains in system throughput,
respectively. Moreover, to decrease the training time, we eliminate invalid
actions (high transmit power levels) to reduce the action space. We show that
our proposed algorithm is computationally scalable to massive IoT networks.
Finally, to control the interference and guarantee the quality-of-service
requirements of grant-based users, we find the optimal number of GF users for
each sub-channel.
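The dueling architecture referred to above splits the Q-network into a state-value stream V(s) and an advantage stream A(s, a), then recombines them with the mean advantage subtracted so the decomposition is identifiable. The sketch below shows only that aggregation step on toy stream outputs; the values are not from the paper.

```python
# Minimal sketch of the dueling aggregation, assuming toy stream outputs:
#   Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')
# Subtracting the mean advantage pins down V and A uniquely.

def dueling_q(value, advantages):
    """Combine a scalar state value with per-action advantages."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

# Toy example: one state, three candidate (power level, RB) actions.
q = dueling_q(value=2.0, advantages=[1.0, -1.0, 0.0])
```

Because V(s) is learned once per state rather than once per action, states can be ranked as "valuable" even before every action in them has been tried, which is the benefit the abstract describes for large state-action spaces.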
UAV Path Planning for Wireless Data Harvesting: A Deep Reinforcement Learning Approach
Autonomous deployment of unmanned aerial vehicles (UAVs) supporting
next-generation communication networks requires efficient trajectory planning
methods. We propose a new end-to-end reinforcement learning (RL) approach to
UAV-enabled data collection from Internet of Things (IoT) devices in an urban
environment. An autonomous drone is tasked with gathering data from distributed
sensor nodes subject to limited flying time and obstacle avoidance. While
previous approaches, learning and non-learning based, must perform expensive
recomputations or relearn a behavior when important scenario parameters such as
the number of sensors, sensor positions, or maximum flying time, change, we
train a double deep Q-network (DDQN) with combined experience replay to learn a
UAV control policy that generalizes over changing scenario parameters. By
exploiting a multi-layer map of the environment fed through convolutional
network layers to the agent, we show that our proposed network architecture
enables the agent to make movement decisions for a variety of scenario
parameters that balance the data collection goal with flight time efficiency
and safety constraints. Considerable advantages in learning efficiency from
using a map centered on the UAV's position over a non-centered map are also
illustrated.
Comment: Code available under
https://github.com/hbayerlein/uav_data_harvesting, IEEE Global Communications
Conference (GLOBECOM) 202
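Combined experience replay, as used to train the DDQN above, is a small twist on uniform replay: every minibatch is drawn uniformly from the buffer, but the most recent transition is always appended, so fresh experience is never starved out by a large memory. The sketch below illustrates that sampling rule with placeholder transitions; batch size and buffer contents are illustrative assumptions.

```python
import random

# Minimal sketch of combined experience replay, assuming toy transitions:
# sample batch_size - 1 transitions uniformly, then always append the
# newest one so it is trained on immediately.

def sample_combined(buffer, batch_size, rng=random):
    """Uniform sample of batch_size - 1 old transitions plus the newest."""
    batch = rng.sample(buffer[:-1], batch_size - 1) if batch_size > 1 else []
    batch.append(buffer[-1])    # the latest transition is always included
    return batch

# Placeholder (state, action, reward, next_state) tuples.
buffer = [("s%d" % i, "a", 0.0, "s%d" % (i + 1)) for i in range(10)]
batch = sample_combined(buffer, batch_size=4)
```

This keeps the agent's updates anchored to current behaviour without the bookkeeping cost of prioritized replay.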
Reinforcement Learning using Augmented Neural Networks
Neural networks allow Q-learning reinforcement learning agents such as deep
Q-networks (DQN) to approximate complex mappings from state spaces to value
functions. However, compared with other function approximators such as tile
coding or its generalisation, radial basis functions (RBF), neural networks
introduce instability as a side effect of their globalised updates. This
instability persists even in neural networks without any hidden layers. In this paper, we
show that simple modifications to the structure of the neural network can
improve stability of DQN learning when a multi-layer perceptron is used for
function approximation.
Comment: 7 pages; two columns; 4 figures
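The locality contrast the abstract draws can be seen directly from the feature functions: a Gaussian RBF responds only near its centre, so a Q-update at one state barely disturbs the value estimate at distant states, whereas a dense neural-network layer updates weights shared by all states. The sketch below computes RBF activations for a scalar state; the centres and width are illustrative choices, not the paper's configuration.

```python
import math

# Minimal sketch of Gaussian RBF features for a 1-D state, assuming
# illustrative centres and width. Each feature decays with distance from
# its centre, so updates stay local.

def rbf_features(state, centres, width=0.5):
    """Gaussian RBF activations of a scalar state."""
    return [math.exp(-((state - c) ** 2) / (2 * width ** 2)) for c in centres]

centres = [0.0, 1.0, 2.0]
phi_near = rbf_features(0.0, centres)   # only the centre at 0.0 fires strongly
```

A linear Q-approximator over these features inherits that locality, which is the stability property the paper's augmented-network modifications aim to recover.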