Discrete Message via Online Clustering Labels in Decentralized POMDP
Communication is crucial for solving cooperative Multi-Agent Reinforcement
Learning tasks in Partially-Observable Markov Decision Processes. Existing
works often rely on black-box methods to encode local information/features into
messages shared with other agents. However, such black-box approaches are
unable to provide any quantitative guarantees on the expected return and often
lead to the generation of continuous messages with high communication overhead
and poor interpretability. In this paper, we establish an upper bound on the
return gap between an ideal policy with full observability and an optimal
partially-observable policy with discrete communication. This result enables us
to recast multi-agent communication into a novel online clustering problem over
the local observations at each agent, with messages as cluster labels and the
upper bound on the return gap as clustering loss. By minimizing the upper
bound, we propose a surprisingly simple design of message generation functions
in multi-agent communication and integrate it with reinforcement learning using
a Regularized Information Maximization loss function. Evaluations show that the
proposed discrete communication significantly outperforms state-of-the-art
multi-agent communication baselines and can achieve nearly optimal returns with
few-bit messages that are naturally interpretable.
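The idea of treating messages as cluster labels over local observations can be illustrated with a minimal sketch. This is not the paper's method: the abstract uses an upper bound on the return gap as the clustering loss, whereas the sketch below substitutes plain Euclidean online k-means as a stand-in. The class `OnlineClusterMessenger`, the helper `bits_per_message`, and all parameters are hypothetical names introduced only for illustration.

```python
import numpy as np


class OnlineClusterMessenger:
    """Hypothetical sketch: maps a local observation to a discrete message.

    Assumption: Euclidean online k-means stands in for the paper's
    return-gap-based clustering loss, which the abstract does not specify.
    """

    def __init__(self, n_messages, obs_dim, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # One centroid per possible message (cluster label).
        self.centroids = rng.normal(size=(n_messages, obs_dim))
        self.lr = lr

    def message(self, obs):
        # The message is the index of the nearest centroid.
        dists = np.linalg.norm(self.centroids - np.asarray(obs), axis=1)
        return int(np.argmin(dists))

    def update(self, obs):
        # Online step: move the winning centroid toward the observation.
        obs = np.asarray(obs, dtype=float)
        k = self.message(obs)
        self.centroids[k] += self.lr * (obs - self.centroids[k])
        return k


def bits_per_message(n_messages):
    # Number of bits needed to encode one of n_messages labels.
    return int(np.ceil(np.log2(n_messages)))


messenger = OnlineClusterMessenger(n_messages=4, obs_dim=2)
label = messenger.update([0.5, -1.0])  # an integer in {0, 1, 2, 3}
```

With four possible labels, each message fits in two bits, which is the sense in which such messages are "few-bit"; the label itself names the cluster the observation fell into, which gives a simple reading of the message.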
System Optimisation for Multi-access Edge Computing Based on Deep Reinforcement Learning
Multi-access edge computing (MEC) is an emerging and important distributed computing paradigm that aims to extend cloud services to the network edge to reduce network traffic and service latency. Proper system optimisation and maintenance are crucial to maintaining high quality of service (QoS) for end-users. However, with the increasing complexity of MEC architectures and mobile applications, effectively optimising MEC systems is non-trivial. Traditional optimisation methods are generally based on simplified mathematical models and fixed heuristics, which rely heavily on expert knowledge. As a consequence, when facing dynamic MEC scenarios, considerable human effort and expertise are required to redesign the model and tune the heuristics, which is time-consuming.
This thesis aims to develop deep reinforcement learning (DRL) methods to handle system optimisation problems in MEC. Instead of developing fixed heuristic algorithms for the problems, this thesis aims to design DRL-based methods that enable systems to learn optimal solutions on their own. This research demonstrates the effectiveness of DRL-based methods on two crucial system optimisation problems: task offloading and service migration. Specifically, this thesis first investigates the dependent task offloading problem, which considers the inner dependencies of tasks. This research builds a DRL-based method combined with a sequence-to-sequence (seq2seq) neural network to address the problem. Experimental results demonstrate that our method outperforms the existing heuristic algorithms and achieves near-optimal performance. To further enhance the learning efficiency of the DRL-based task offloading method for unseen learning tasks, this thesis then integrates meta reinforcement learning to handle the task offloading problem. Our method can adapt quickly to new environments with a small number of gradient updates and samples. Finally, this thesis explores a DRL-based solution for the service migration problem in MEC considering user mobility. This research models the service migration problem as a Partially Observable Markov Decision Process (POMDP) and proposes a tailored actor-critic algorithm combined with Long Short-Term Memory (LSTM) to solve the POMDP. Results from extensive experiments based on real-world mobility traces demonstrate that our method consistently outperforms both the heuristic and state-of-the-art learning-driven algorithms across various MEC scenarios.
Centralised rehearsal of decentralised cooperation: Multi-agent reinforcement learning for the scalable coordination of residential energy flexibility
This paper investigates how deep multi-agent reinforcement learning can
enable the scalable and privacy-preserving coordination of residential energy
flexibility. The coordination of distributed resources such as electric
vehicles and heating will be critical to the successful integration of large
shares of renewable energy in our electricity grid and, thus, to help mitigate
climate change. The pre-learning of individual reinforcement learning policies
can enable distributed control with no sharing of personal data required during
execution. However, previous approaches for multi-agent reinforcement
learning-based distributed energy resources coordination impose an ever greater
training computational burden as the size of the system increases. We therefore
adopt a deep multi-agent actor-critic method which uses a centralised but
factored critic to rehearse coordination ahead of execution. Results show that
coordination is achieved at scale, with minimal information and communication
infrastructure requirements, no interference with daily activities, and privacy
protection. Significant savings are obtained for energy users, the distribution
network and greenhouse gas emissions. Moreover, training times are nearly 40
times shorter than with a previous state-of-the-art reinforcement learning
approach without the factored critic for 30 homes.
TrustFed: A Reliable Federated Learning Framework with Malicious-Attack Resistance
As a key technology in 6G research, federated learning (FL) enables
collaborative learning among multiple clients while ensuring individual data
privacy. However, malicious attackers among the participating clients can
intentionally tamper with the training data or the trained model, compromising
the accuracy and trustworthiness of the system. To address this issue, in this
paper, we propose a hierarchical audit-based FL (HiAudit-FL) framework, with
the aim to enhance the reliability and security of the learning process. The
hierarchical audit process includes two stages, namely model-audit and
parameter-audit. In the model-audit stage, a low-overhead audit method is
employed to identify suspicious clients. Subsequently, in the parameter-audit
stage, a resource-consuming method is used to detect all malicious clients with
higher accuracy among the suspicious ones. Specifically, we execute the model
audit method among a subset of clients for multiple rounds, which is modeled as a
partially observable Markov decision process (POMDP) with the aim of enhancing the
robustness and accountability of the decision-making in complex and uncertain
environments. Meanwhile, we formulate the problem of identifying malicious
attackers through a multi-round audit as an active sequential hypothesis
testing problem and leverage a diffusion model-based AI-Enabled audit selection
strategy (ASS) to decide which clients should be audited in each round. To
accomplish efficient and effective audit selection, we design a DRL-ASS
algorithm by incorporating the ASS in a deep reinforcement learning (DRL)
framework. Our simulation results demonstrate that HiAudit-FL can accurately
identify and handle potentially malicious users with small system overhead.
Comment: 13 pages, 9 figures