In this paper, we investigate an unmanned aerial vehicle (UAV)-assisted
air-to-ground communication system, where multiple UAVs form a UAV-enabled
virtual antenna array (UVAA) to communicate with remote base stations by
utilizing collaborative beamforming. To improve the work efficiency of the
UVAA, we formulate a UAV-enabled collaborative beamforming multi-objective
optimization problem (UCBMOP) to simultaneously maximize the transmission rate
of the UVAA and minimize the energy consumption of all UAVs by optimizing the
positions and excitation current weights of all UAVs. This problem is
challenging because the two optimization objectives conflict with each other
and are non-concave with respect to the optimization variables. Moreover, the system is
dynamic and the cooperation among UAVs is complex, so traditional methods
need considerable time to compute a solution for a single task. In
addition, as the task changes, the previously obtained solution becomes
obsolete. To handle these issues, we leverage multi-agent deep
reinforcement learning (MADRL) to address the UCBMOP. Specifically, we use the
heterogeneous-agent trust region policy optimization (HATRPO) as the basic
framework, and then propose an improved HATRPO algorithm, namely HATRPO-UCB,
where three techniques are introduced to enhance performance. Simulation
results demonstrate that the proposed algorithm learns a better strategy
than the compared methods. Extensive experiments further demonstrate
the effectiveness of the proposed techniques.

Comment: This paper has been submitted to IEEE Transactions on Mobile
Computing