
    Multi-Agent Double Deep Q-Learning for Beamforming in mmWave MIMO Networks

    Beamforming is one of the key techniques in millimeter wave (mmWave) multiple-input multiple-output (MIMO) communications. Appropriate beamforming design not only improves the quality and strength of the received signal, but also helps reduce interference, consequently enhancing the data rate. In this paper, we propose a distributed multi-agent double deep Q-learning algorithm for beamforming in mmWave MIMO networks, where multiple base stations (BSs) can automatically and dynamically adjust their beams to serve multiple highly mobile user equipments (UEs). In the analysis, the largest-received-power association criterion is considered for UEs, and a realistic channel model is taken into account. Simulation results demonstrate that the proposed learning-based algorithm achieves performance comparable to exhaustive search while operating at much lower complexity.
    Comment: To be published in IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC) 202
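
    The core of this approach is the double deep Q-learning update: the online network selects the next beam, while a periodically synchronized target network evaluates that choice. Below is a minimal sketch of the update; the tabular stand-ins for the two networks, the state and codebook sizes, and the reward value are illustrative assumptions rather than the paper's setup.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    n_states, n_beams, gamma = 16, 8, 0.9            # assumed sizes
    q_online = rng.normal(size=(n_states, n_beams))  # online network stand-in
    q_target = q_online.copy()                       # periodically synced copy

    def ddqn_target(reward, next_state):
        # Double DQN: the online net selects the beam, the target net
        # evaluates it, reducing the overestimation bias of vanilla DQN.
        best_beam = int(np.argmax(q_online[next_state]))
        return reward + gamma * q_target[next_state, best_beam]

    s, a, r, s_next, lr = 3, 2, 1.7, 5, 0.1          # one fabricated transition
    q_online[s, a] += lr * (ddqn_target(r, s_next) - q_online[s, a])
    ```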

    Multi Agent DeepRL based Joint Power and Subchannel Allocation in IAB networks

    Integrated Access and Backhaul (IAB) is a viable approach for meeting the unprecedented demand for higher data rates in future generations of networks, acting as a cost-effective alternative to dense fiber-wired links. The design of such networks under constraints usually results in an optimization problem that is non-convex and combinatorial in nature. Under those conditions, it is challenging to obtain an optimal strategy for the joint Subchannel Allocation and Power Allocation (SAPA) problem. In this paper, we develop a multi-agent Deep Reinforcement Learning (DeepRL) based framework for joint optimization of power and subchannel allocation in an IAB network to maximize the downlink data rate. SAPA using Double Deep Q-Networks (DDQN) can handle computationally expensive problems with the huge action spaces associated with multiple users and nodes. Unlike conventional methods such as game theory, fractional programming, and convex optimization, which in practice demand increasingly accurate network information, the multi-agent DeepRL approach requires less network information about the environment. Simulation results show the proposed scheme's promising performance when compared with baseline schemes (Deep Q-Network and random allocation).
    Comment: 7 pages, 6 figures, accepted at the European Conference on Communication Systems (ECCS) 202
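
    To make the joint action space concrete, the sketch below shows one common way a per-node agent can treat every (subchannel, power level) pair as a single flat discrete action and choose one epsilon-greedily; the dimensions, epsilon value, and flat-index encoding are illustrative assumptions, not details taken from the paper.

    ```python
    import numpy as np

    n_subchannels, n_power_levels = 4, 5             # assumed sizes
    n_actions = n_subchannels * n_power_levels       # flat joint action space
    rng = np.random.default_rng(1)

    def decode(action):
        # Flat action index -> (subchannel, power level) pair.
        return divmod(int(action), n_power_levels)

    def act(q_values, epsilon=0.1):
        # Standard epsilon-greedy selection over the joint action space.
        if rng.random() < epsilon:
            return int(rng.integers(n_actions))
        return int(np.argmax(q_values))

    q = rng.normal(size=n_actions)                   # one agent's Q estimates
    subchannel, power_level = decode(act(q))
    ```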

    Double Deep Q-Learning in Opponent Modeling

    Multi-agent systems in which secondary agents with conflicting agendas also adapt their strategies require opponent modeling. In this study, we model the main agent's and secondary agents' strategies using Double Deep Q-Networks (DDQN) with a prioritized experience replay mechanism. Then, under the opponent modeling setup, a Mixture-of-Experts architecture is used to identify the opponents' strategy patterns. Finally, we analyze our models in two environments with several agents. The findings indicate that the Mixture-of-Experts model, which is based on opponent modeling, performs better than DDQN.
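
    The prioritized experience replay mechanism mentioned above can be sketched as a proportional sampler: transitions with larger temporal-difference (TD) errors are replayed more often, and importance-sampling weights correct the induced bias. The alpha and beta values and buffer contents below are illustrative assumptions, not the authors' hyperparameters.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)

    class PrioritizedReplay:
        def __init__(self, capacity, alpha=0.6):
            self.capacity, self.alpha = capacity, alpha
            self.data, self.priorities = [], []

        def add(self, transition, td_error):
            self.data.append(transition)
            self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)
            if len(self.data) > self.capacity:           # drop the oldest entry
                self.data.pop(0)
                self.priorities.pop(0)

        def sample(self, batch_size, beta=0.4):
            p = np.asarray(self.priorities)
            p = p / p.sum()                              # proportional sampling
            idx = rng.choice(len(self.data), size=batch_size, p=p)
            w = (len(self.data) * p[idx]) ** (-beta)     # importance weights
            return [self.data[i] for i in idx], w / w.max()

    buf = PrioritizedReplay(capacity=100)
    for _ in range(20):
        buf.add(("s", "a", 1.0, "s_next"), td_error=rng.normal())
    batch, weights = buf.sample(4)
    ```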

    Independent Learning Approaches: Overcoming Multi-Agent Learning Pathologies In Team-Games

    Deep Neural Networks enable Reinforcement Learning (RL) agents to learn behaviour policies directly from high-dimensional observations. As a result, the field of Deep Reinforcement Learning (DRL) has seen a great number of successes. Recently, the sub-field of Multi-Agent DRL (MADRL) has received increased attention. However, there are considerations that must be taken into account when using RL in Multi-Agent Systems. For instance, Independent Learners (ILs) lack the convergence guarantees of many single-agent RL approaches, even in domains that do not require a MADRL approach. Furthermore, ILs must often overcome a number of learning pathologies to converge upon an optimal joint-policy. Numerous IL approaches have been proposed to facilitate cooperation, including hysteretic Q-learning (Matignon et al., 2007) and leniency (Panait et al., 2006). Recently LMRL2, a variation of leniency, proved robust towards a number of pathologies in low-dimensional domains, including miscoordination, relative overgeneralization, stochasticity, the alter-exploration problem and the moving target problem (Wei and Luke, 2016). In contrast, the majority of work on ILs in MADRL focuses on an amplified moving target problem, caused by neural networks being trained with potentially obsolete samples drawn from experience replay memories. In this thesis we combine advances from research on ILs with DRL algorithms. First, however, we evaluate the robustness of tabular approaches along each of the above pathology dimensions. Upon identifying a number of weaknesses that prevent LMRL2 from consistently converging upon optimal joint-policies, we propose a new version of leniency, Distributed-Lenient Q-learning (DLQ). We find that DLQ delivers state-of-the-art performance in strategic-form and Markov games from the Multi-Agent Reinforcement Learning literature. We subsequently scale leniency to MADRL, introducing the Lenient (Double) Deep Q-Network (LDDQN). We empirically evaluate LDDQN with extensions of the Cooperative Multi-Agent Object Transportation Problem (Buşoniu et al., 2010), finding that LDDQN outperforms hysteretic deep Q-learners in domains with multiple dropzones yielding stochastic rewards. Finally, to evaluate deep ILs along each pathology dimension, we introduce a new MADRL environment: the Apprentice Firemen Game (AFG). We find that lenient and hysteretic approaches fail to consistently learn near-optimal joint-policies in the AFG. To address these pathologies we introduce Negative Update Intervals-DDQN (NUI-DDQN), a MADRL algorithm which discards episodes yielding cumulative rewards outside the range of expanding intervals. NUI-DDQN consistently gravitates towards optimal joint-policies in deterministic and stochastic reward settings of the AFG, overcoming the outlined pathologies.
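
    For reference, hysteretic Q-learning (Matignon et al., 2007), which the thesis builds on, applies a larger learning rate to positive TD errors than to negative ones, making each independent learner optimistic about teammates' exploratory mistakes. A minimal tabular sketch follows; the learning rates, discount, and table sizes are illustrative assumptions.

    ```python
    import numpy as np

    n_states, n_actions, gamma = 8, 4, 0.95   # assumed sizes
    alpha, beta = 0.1, 0.01                   # beta < alpha yields the optimism
    Q = np.zeros((n_states, n_actions))

    def hysteretic_update(s, a, r, s_next):
        # Positive TD errors are learned quickly (alpha); negative ones,
        # possibly caused by a teammate exploring, are learned slowly (beta).
        delta = r + gamma * Q[s_next].max() - Q[s, a]
        Q[s, a] += (alpha if delta >= 0 else beta) * delta

    hysteretic_update(0, 1, 1.0, 2)           # one fabricated transition
    ```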

    Competitive MA-DRL for Transmit Power Pool Design in Semi-Grant-Free NOMA Systems

    In this paper, we exploit the capability of multi-agent deep reinforcement learning (MA-DRL) to generate a transmit power pool (PP) for Internet of Things (IoT) networks with semi-grant-free non-orthogonal multiple access (SGF-NOMA). The PP is mapped to each resource block (RB) to achieve distributed transmit power control (DPC). We first formulate the resource (sub-channel and transmit power) selection problem as a stochastic Markov game, and then solve it using two competitive MA-DRL algorithms, namely double deep Q-network (DDQN) and Dueling DDQN. Each grant-free (GF) user, acting as an agent, tries to find the optimal transmit power level and RB to form the desired PP. With the aid of the dueling process, learning can be enhanced by evaluating valuable states without having to consider the effect of each action in each state. Therefore, DDQN is designed for communication scenarios with a small action-state space, while Dueling DDQN is for large ones. Our results show that the proposed MA-Dueling-DDQN-based SGF-NOMA with DPC outperforms the SGF-NOMA system with a fixed-power-control mechanism and networks with pure GF protocols, with 17.5% and 22.2% gains in system throughput, respectively. Moreover, to decrease the training time, we eliminate invalid actions (high transmit power levels) to reduce the action space. We show that our proposed algorithm is computationally scalable to massive IoT networks. Finally, to control the interference and guarantee the quality-of-service requirements of grant-based users, we find the optimal number of GF users for each sub-channel.
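
    The dueling advantage described above comes from recomposing Q-values from a state-value stream V(s) and an advantage stream A(s, a), so a valuable state can be scored without rating every action. The sketch below shows only that aggregation step; the stream outputs are made-up numbers, and the networks producing them are omitted.

    ```python
    import numpy as np

    def dueling_q(value, advantages):
        # Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a'); subtracting the mean
        # advantage keeps the two streams identifiable.
        advantages = np.asarray(advantages, dtype=float)
        return value + advantages - advantages.mean()

    # Fabricated stream outputs for one state and four power levels.
    q_levels = dueling_q(value=2.0, advantages=[0.5, -0.1, 0.3, -0.7])
    best_power_level = int(np.argmax(q_levels))   # index into the power pool
    ```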

    UAV Path Planning for Wireless Data Harvesting: A Deep Reinforcement Learning Approach

    Autonomous deployment of unmanned aerial vehicles (UAVs) supporting next-generation communication networks requires efficient trajectory planning methods. We propose a new end-to-end reinforcement learning (RL) approach to UAV-enabled data collection from Internet of Things (IoT) devices in an urban environment. An autonomous drone is tasked with gathering data from distributed sensor nodes subject to limited flying time and obstacle avoidance. While previous approaches, learning-based and non-learning-based, must perform expensive recomputations or relearn a behavior when important scenario parameters such as the number of sensors, sensor positions, or maximum flying time change, we train a double deep Q-network (DDQN) with combined experience replay to learn a UAV control policy that generalizes over changing scenario parameters. By exploiting a multi-layer map of the environment, fed through convolutional network layers to the agent, we show that our proposed network architecture enables the agent to make movement decisions for a variety of scenario parameters that balance the data collection goal with flight-time efficiency and safety constraints. Considerable advantages in learning efficiency from using a map centered on the UAV's position over a non-centered map are also illustrated.
    Comment: Code available under https://github.com/hbayerlein/uav_data_harvesting, IEEE Global Communications Conference (GLOBECOM) 202
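
    Combined experience replay differs from plain uniform replay in that every training batch is guaranteed to contain the most recent transition alongside uniformly sampled older ones. The sketch below is one common formulation of that sampler; the buffer capacity and transition contents are illustrative assumptions.

    ```python
    import random

    class CombinedReplay:
        def __init__(self, capacity):
            self.capacity, self.buffer = capacity, []

        def add(self, transition):
            self.buffer.append(transition)
            if len(self.buffer) > self.capacity:   # drop the oldest entry
                self.buffer.pop(0)

        def sample(self, batch_size):
            # The latest transition is always included; the rest is uniform.
            older = random.sample(self.buffer[:-1],
                                  min(batch_size - 1, len(self.buffer) - 1))
            return older + [self.buffer[-1]]

    buf = CombinedReplay(capacity=1000)
    for step in range(10):
        buf.add((step, "action", 0.0, step + 1))   # fabricated transitions
    batch = buf.sample(4)
    ```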

    Reinforcement Learning using Augmented Neural Networks

    Neural networks allow Q-learning reinforcement learning agents such as deep Q-networks (DQN) to approximate complex mappings from state spaces to value functions. However, this also brings drawbacks compared to other function approximators, such as tile coding or its generalisation, radial basis functions (RBF): neural networks introduce instability as a side effect of their globalised updates. This instability does not vanish even in neural networks without hidden layers. In this paper, we show that simple modifications to the structure of the neural network can improve the stability of DQN learning when a multi-layer perceptron is used for function approximation.
    Comment: 7 pages; two columns; 4 figures
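
    The contrast drawn here can be made concrete with a Gaussian RBF feature layer feeding a linear Q-head: each feature is nonzero only near its center, so an update at one state barely disturbs value estimates at distant states, unlike the globalised updates of a plain MLP. The centers, width, and sizes below are illustrative assumptions.

    ```python
    import numpy as np

    rng = np.random.default_rng(3)
    centers = np.linspace(-1.0, 1.0, 10).reshape(-1, 1)  # RBF centers
    sigma = 0.2                                          # shared RBF width
    W = rng.normal(scale=0.1, size=(10, 3))              # linear Q-head, 3 actions

    def rbf_features(state):
        # phi_i(s) = exp(-||s - c_i||^2 / (2 sigma^2)): localized activations.
        d2 = ((np.atleast_1d(state) - centers) ** 2).sum(axis=1)
        return np.exp(-d2 / (2 * sigma ** 2))

    def q_values(state):
        return rbf_features(state) @ W                   # Q-estimates per action

    q = q_values(0.3)                                    # one illustrative state
    ```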