Search CORE

3,088 research outputs found

Learning to Schedule Communication in Multi-agent Reinforcement Learning

Author: Hostallero David
Kang Wan Ju
Kim Daewoo
Lee Taeyoung
Moon Sangwoo
Son Kyunghwan
Yi Yung
Publication venue
Publication date: 05/02/2019
Field of study

Many real-world reinforcement learning tasks require multiple agents to make sequential decisions under the agents' interaction, where well-coordinated actions among the agents are crucial to achieve the target goal better at these tasks. One way to accelerate the coordination effect is to enable multiple agents to communicate with each other in a distributed manner and behave as a group. In this paper, we study a practical scenario when (i) the communication bandwidth is limited and (ii) the agents share the communication medium so that only a restricted number of agents are able to simultaneously use the medium, as in the state-of-the-art wireless networking standards. This calls for a certain form of communication scheduling. In that regard, we propose a multi-agent deep reinforcement learning framework, called SchedNet, in which agents learn how to schedule themselves, how to encode the messages, and how to select actions based on received messages. SchedNet is capable of deciding which agents should be entitled to broadcasting their (encoded) messages, by learning the importance of each agent's partially observed information. We evaluate SchedNet against multiple baselines under two different applications, namely, cooperative communication and navigation, and predator-prey. Our experiments show a non-negligible performance gap between SchedNet and other mechanisms such as the ones without communication and with vanilla scheduling methods, e.g., round robin, ranging from 32% to 43%.Comment: Accepted in ICLR 201

arXiv.org e-Print Archive

A Multi-Agent Deep Reinforcement Learning based Spectrum Allocation Framework for D2D Communications

Author: Guo Caili
Li Zheng
Xuan Yidi
Publication venue
Publication date: 17/12/2019
Field of study

Device-to-device (D2D) communication has been recognized as a promising technique to improve spectrum efficiency. However, D2D transmission as an underlay causes severe interference, which imposes a technical challenge to spectrum allocation. Existing centralized schemes require global information, which can cause serious signaling overhead. While existing distributed solution requires frequent information exchange between users and cannot achieve global optimization. In this paper, a distributed spectrum allocation framework based on multi-agent deep reinforcement learning is proposed, named Neighbor-Agent Actor Critic (NAAC). NAAC uses neighbor users' historical information for centralized training but is executed distributedly without that information, which not only has no signal interaction during execution, but also utilizes cooperation between users to further optimize system performance. The simulation results show that the proposed framework can effectively reduce the outage probability of cellular links, improve the sum rate of D2D links and have good convergence.Comment: Accepted to Globecom 201

arXiv.org e-Print Archive

A Survey and Critique of Multiagent Deep Reinforcement Learning

Author: Hernandez-Leal Pablo
Kartal Bilal
Taylor Matthew E.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 30/08/2019
Field of study

Deep reinforcement learning (RL) has achieved outstanding results in recent years. This has led to a dramatic increase in the number of applications and methods. Recent works have explored learning beyond single-agent scenarios and have considered multiagent learning (MAL) scenarios. Initial results report successes in complex multiagent domains, although there are several challenges to be addressed. The primary goal of this article is to provide a clear overview of current multiagent deep reinforcement learning (MDRL) literature. Additionally, we complement the overview with a broader analysis: (i) we revisit previous key components, originally presented in MAL and RL, and highlight how they have been adapted to multiagent deep reinforcement learning settings. (ii) We provide general guidelines to new practitioners in the area: describing lessons learned from MDRL works, pointing to recent benchmarks, and outlining open avenues of research. (iii) We take a more critical tone raising practical challenges of MDRL (e.g., implementation and computational demands). We expect this article will help unify and motivate future research to take advantage of the abundant literature that exists (e.g., RL and MAL) in a joint effort to promote fruitful research in the multiagent community.Comment: Under review since Oct 2018. Earlier versions of this work had the title: "Is multiagent deep reinforcement learning the answer or the question? A brief survey

arXiv.org e-Print Archive

Applications of Deep Reinforcement Learning in Communications and Networking: A Survey

Author: Gong Shimin
Hoang Dinh Thai
Kim Dong In
Liang Ying-Chang
Luong Nguyen Cong
Niyato Dusit
Wang Ping
Publication venue
Publication date: 17/10/2018
Field of study

This paper presents a comprehensive literature review on applications of deep reinforcement learning in communications and networking. Modern networks, e.g., Internet of Things (IoT) and Unmanned Aerial Vehicle (UAV) networks, become more decentralized and autonomous. In such networks, network entities need to make decisions locally to maximize the network performance under uncertainty of network environment. Reinforcement learning has been efficiently used to enable the network entities to obtain the optimal policy including, e.g., decisions or actions, given their states when the state and action spaces are small. However, in complex and large-scale networks, the state and action spaces are usually large, and the reinforcement learning may not be able to find the optimal policy in reasonable time. Therefore, deep reinforcement learning, a combination of reinforcement learning with deep learning, has been developed to overcome the shortcomings. In this survey, we first give a tutorial of deep reinforcement learning from fundamental concepts to advanced models. Then, we review deep reinforcement learning approaches proposed to address emerging issues in communications and networking. The issues include dynamic network access, data rate control, wireless caching, data offloading, network security, and connectivity preservation which are all important to next generation networks such as 5G and beyond. Furthermore, we present applications of deep reinforcement learning for traffic routing, resource sharing, and data collection. Finally, we highlight important challenges, open issues, and future research directions of applying deep reinforcement learning.Comment: 37 pages, 13 figures, 6 tables, 174 reference paper

arXiv.org e-Print Archive

Application of Machine Learning in Wireless Networks: Key Techniques and Open Issues

Author: Huang Yuzhe
Mao Shiwen
Peng Mugen
Sun Yaohua
Zhou Yangcheng
Publication venue
Publication date: 28/02/2019
Field of study

As a key technique for enabling artificial intelligence, machine learning (ML) is capable of solving complex problems without explicit programming. Motivated by its successful applications to many practical tasks like image recognition, both industry and the research community have advocated the applications of ML in wireless communication. This paper comprehensively surveys the recent advances of the applications of ML in wireless communication, which are classified as: resource management in the MAC layer, networking and mobility management in the network layer, and localization in the application layer. The applications in resource management further include power control, spectrum management, backhaul management, cache management, beamformer design and computation resource management, while ML based networking focuses on the applications in clustering, base station switching control, user association and routing. Moreover, literatures in each aspect is organized according to the adopted ML techniques. In addition, several conditions for applying ML to wireless communication are identified to help readers decide whether to use ML and which kind of ML techniques to use, and traditional approaches are also summarized together with their performance comparison with ML based approaches, based on which the motivations of surveyed literatures to adopt ML are clarified. Given the extensiveness of the research area, challenges and unresolved issues are presented to facilitate future studies, where ML based network slicing, infrastructure update to support ML based paradigms, open data sets and platforms for researchers, theoretical guidance for ML implementation and so on are discussed.Comment: 34 pages,8 figure

arXiv.org e-Print Archive

Multi-Agent Deep Reinforcement Learning for Large-scale Traffic Signal Control

Author: Chu Tianshu
Codecà Lara
Li Zhaojian
Wang Jie
Publication venue
Publication date: 11/03/2019
Field of study

Reinforcement learning (RL) is a promising data-driven approach for adaptive traffic signal control (ATSC) in complex urban traffic networks, and deep neural networks further enhance its learning power. However, centralized RL is infeasible for large-scale ATSC due to the extremely high dimension of the joint action space. Multi-agent RL (MARL) overcomes the scalability issue by distributing the global control to each local RL agent, but it introduces new challenges: now the environment becomes partially observable from the viewpoint of each local agent due to limited communication among agents. Most existing studies in MARL focus on designing efficient communication and coordination among traditional Q-learning agents. This paper presents, for the first time, a fully scalable and decentralized MARL algorithm for the state-of-the-art deep RL agent: advantage actor critic (A2C), within the context of ATSC. In particular, two methods are proposed to stabilize the learning procedure, by improving the observability and reducing the learning difficulty of each local agent. The proposed multi-agent A2C is compared against independent A2C and independent Q-learning algorithms, in both a large synthetic traffic grid and a large real-world traffic network of Monaco city, under simulated peak-hour traffic dynamics. Results demonstrate its optimality, robustness, and sample efficiency over other state-of-the-art decentralized MARL algorithms

arXiv.org e-Print Archive

A Review of Reinforcement Learning for Autonomous Building Energy Management

Author: Grijalva Santiago
Mason Karl
Publication venue
Publication date: 15/03/2019
Field of study

The area of building energy management has received a significant amount of interest in recent years. This area is concerned with combining advancements in sensor technologies, communications and advanced control algorithms to optimize energy utilization. Reinforcement learning is one of the most prominent machine learning algorithms used for control problems and has had many successful applications in the area of building energy management. This research gives a comprehensive review of the literature relating to the application of reinforcement learning to developing autonomous building energy management systems. The main direction for future research and challenges in reinforcement learning are also outlined.Comment: 17 pages, 3 figure

arXiv.org e-Print Archive

Multi-Agent Actor-Critic with Generative Cooperative Policy Network

Author: Park Jinkyoo
Ryu Heechang
Shin Hayong
Publication venue
Publication date: 22/10/2018
Field of study

We propose an efficient multi-agent reinforcement learning approach to derive equilibrium strategies for multi-agents who are participating in a Markov game. Mainly, we are focused on obtaining decentralized policies for agents to maximize the performance of a collaborative task by all the agents, which is similar to solving a decentralized Markov decision process. We propose to use two different policy networks: (1) decentralized greedy policy network used to generate greedy action during training and execution period and (2) generative cooperative policy network (GCPN) used to generate action samples to make other agents improve their objectives during training period. We show that the samples generated by GCPN enable other agents to explore the policy space more effectively and favorably to reach a better policy in terms of achieving the collaborative tasks.Comment: 10 pages, total 9 figures including all sub-figure

arXiv.org e-Print Archive

Vehicular Edge Computing via Deep Reinforcement Learning

Author: Ma Zhanyu
Qi Qi
Publication venue
Publication date: 10/02/2020
Field of study

The smart vehicles construct Vehicle of Internet which can execute various intelligent services. Although the computation capability of the vehicle is limited, multi-type of edge computing nodes provide heterogeneous resources for vehicular services.When offloading the complicated service to the vehicular edge computing node, the decision should consider numerous factors.The offloading decision work mostly formulate the decision to a resource scheduling problem with single or multiple objective function and some constraints, and explore customized heuristics algorithms. However, offloading multiple data dependency tasks in a service is a difficult decision, as an optimal solution must understand the resource requirement, the access network, the user mobility, and importantly the data dependency. Inspired by recent advances in machine learning, we propose a knowledge driven (KD) service offloading decision framework for Vehicle of Internet, which provides the optimal policy directly from the environment. We formulate the offloading decision of multi-task in a service as a long-term planning problem, and explores the recent deep reinforcement learning to obtain the optimal solution. It considers the future data dependency of the following tasks when making decision for a current task from the learned offloading knowledge. Moreover, the framework supports the pre-training at the powerful edge computing node and continually online learning when the vehicular service is executed, so that it can adapt the environment changes and learns policy that are sensible in hindsight. The simulation results show that KD service offloading decision converges quickly, adapts to different conditions, and outperforms the greedy offloading decision algorithm.Comment: Preliminary report of ongoing wor

arXiv.org e-Print Archive

Experience Augmentation: Boosting and Accelerating Off-Policy Multi-Agent Reinforcement Learning

Author: Chen Yining
Fan Shen
Song Guanghua
Yang Bowei
Ye Zhenhui
Publication venue
Publication date: 19/05/2020
Field of study

Exploration of the high-dimensional state action space is one of the biggest challenges in Reinforcement Learning (RL), especially in multi-agent domain. We present a novel technique called Experience Augmentation, which enables a time-efficient and boosted learning based on a fast, fair and thorough exploration to the environment. It can be combined with arbitrary off-policy MARL algorithms and is applicable to either homogeneous or heterogeneous environments. We demonstrate our approach by combining it with MADDPG and verifing the performance in two homogeneous and one heterogeneous environments. In the best performing scenario, the MADDPG with experience augmentation reaches to the convergence reward of vanilla MADDPG with 1/4 realistic time, and its convergence beats the original model by a significant margin. Our ablation studies show that experience augmentation is a crucial ingredient which accelerates the training process and boosts the convergence.Comment: 10 pages, 4 figures, 4 table

arXiv.org e-Print Archive