UAV Path Planning for Wireless Data Harvesting: A Deep Reinforcement Learning Approach
Autonomous deployment of unmanned aerial vehicles (UAVs) supporting
next-generation communication networks requires efficient trajectory planning
methods. We propose a new end-to-end reinforcement learning (RL) approach to
UAV-enabled data collection from Internet of Things (IoT) devices in an urban
environment. An autonomous drone is tasked with gathering data from distributed
sensor nodes subject to limited flying time and obstacle avoidance. While
previous approaches, learning and non-learning based, must perform expensive
recomputations or relearn a behavior when important scenario parameters such as
the number of sensors, sensor positions, or maximum flying time change, we
train a double deep Q-network (DDQN) with combined experience replay to learn a
UAV control policy that generalizes over changing scenario parameters. By
exploiting a multi-layer map of the environment fed through convolutional
network layers to the agent, we show that our proposed network architecture
enables the agent to make movement decisions for a variety of scenario
parameters that balance the data collection goal with flight time efficiency
and safety constraints. We also illustrate considerable advantages in learning
efficiency from using a map centered on the UAV's position over a non-centered
map.
Comment: Code available under https://github.com/hbayerlein/uav_data_harvesting, IEEE Global Communications Conference (GLOBECOM) 202
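The map-centering idea described above can be sketched in a few lines of NumPy. This is a minimal illustration only, not the authors' implementation; the (channels, height, width) layout and zero padding for out-of-map cells are assumptions:

```python
import numpy as np

def center_map(layers: np.ndarray, uav_pos: tuple[int, int]) -> np.ndarray:
    """Shift a (C, H, W) multi-layer map so that uav_pos lands at the
    spatial center, padding with zeros for out-of-map cells."""
    c, h, w = layers.shape
    centered = np.zeros((c, 2 * h - 1, 2 * w - 1), dtype=layers.dtype)
    r, col = uav_pos
    # Place the map so that cell (r, col) sits at the center (h-1, w-1).
    top, left = h - 1 - r, w - 1 - col
    centered[:, top:top + h, left:left + w] = layers
    return centered

# Example: a single-layer 3x3 map with the UAV in the top-left corner.
m = np.arange(9).reshape(1, 3, 3)
out = center_map(m, (0, 0))
print(out.shape)     # (1, 5, 5)
print(out[0, 2, 2])  # 0 -> the UAV's cell value now sits at the center
```

Feeding such an egocentric view to the convolutional layers means the agent's own position is always at a fixed location in the input, which is one plausible reason for the reported learning-efficiency gain.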
Learning to Recharge: UAV Coverage Path Planning through Deep Reinforcement Learning
Coverage path planning (CPP) is a critical problem in robotics, where the
goal is to find an efficient path that covers every point in an area of
interest. This work addresses the power-constrained CPP problem with recharge
for battery-limited unmanned aerial vehicles (UAVs). In this problem, a notable
challenge emerges from integrating recharge journeys into the overall coverage
strategy, highlighting the intricate task of making strategic, long-term
decisions. We propose a novel proximal policy optimization (PPO)-based deep
reinforcement learning (DRL) approach with map-based observations, utilizing
action masking and discount factor scheduling to optimize coverage trajectories
over the entire mission horizon. We further provide the agent with a position
history to handle emergent state loops caused by the recharge capability. Our
approach outperforms a baseline heuristic and generalizes to different target
zones and maps, though generalization to unseen maps is limited. We offer valuable
insights into DRL algorithm design for long-horizon problems and provide a
publicly available software framework for the CPP problem.
Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.
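Action masking, as mentioned above, is commonly implemented by pushing the logits of invalid actions to negative infinity before the softmax, so those actions can never be sampled. A minimal NumPy sketch of this standard trick (illustrative only; the no-fly-zone example is a hypothetical assumption, not taken from the paper):

```python
import numpy as np

def masked_policy(logits: np.ndarray, valid: np.ndarray) -> np.ndarray:
    """Zero out the probability of invalid actions by setting their
    logits to -inf before the softmax."""
    masked = np.where(valid, logits, -np.inf)
    z = masked - masked.max()  # subtract max for numerical stability
    p = np.exp(z)              # exp(-inf) = 0, so masked actions vanish
    return p / p.sum()

logits = np.array([2.0, 1.0, 0.5, 3.0])
valid = np.array([True, False, True, True])  # e.g. action 1 enters a no-fly zone
probs = masked_policy(logits, valid)
print(probs[1])     # 0.0 -- the masked action can never be sampled
print(probs.sum())  # 1.0
```

Because the mask is applied before sampling, the policy gradient never has to learn to avoid clearly infeasible moves, which shortens exploration in long-horizon problems like power-constrained CPP.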
Model-aided Federated Reinforcement Learning for Multi-UAV Trajectory Planning in IoT Networks
Deploying teams of cooperative unmanned aerial vehicles (UAVs) to harvest
data from distributed Internet of Things (IoT) devices requires efficient
trajectory planning and coordination algorithms. Multi-agent reinforcement
learning (MARL) has emerged as an effective solution, but often requires
extensive and costly real-world training data. In this paper, we propose a
novel model-aided federated MARL algorithm to coordinate multiple UAVs on a
data harvesting mission with limited knowledge about the environment,
significantly reducing the real-world training data demand. The proposed
algorithm alternates between learning an environment model from real-world
measurements and federated QMIX training in the simulated environment.
Specifically, collected measurements from the real-world environment are used
to learn the radio channel and estimate unknown IoT device locations to create
a simulated environment. Each UAV agent trains a local QMIX model in its
simulated environment and continuously consolidates it through federated
learning with other agents, accelerating the learning process and further
improving training sample efficiency. Simulation results demonstrate that our
proposed model-aided FedQMIX algorithm substantially reduces the need for
real-world training experiences while attaining similar data collection
performance as standard MARL algorithms.
Comment: 7 pages, 2 figures
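The federated consolidation step can be sketched as a FedAvg-style element-wise average of the agents' local parameters. This is a generic illustration under an assumed dict-of-arrays parameter representation, not the authors' FedQMIX code:

```python
import numpy as np

def federated_average(local_params: list[dict]) -> dict:
    """FedAvg-style consolidation: element-wise average of per-agent
    parameter dicts to form the shared global model."""
    keys = local_params[0].keys()
    return {k: np.mean([p[k] for p in local_params], axis=0) for k in keys}

# Two agents, each holding one local weight matrix.
agent_a = {"w": np.array([[1.0, 2.0]])}
agent_b = {"w": np.array([[3.0, 6.0]])}
global_params = federated_average([agent_a, agent_b])
print(global_params["w"])  # [[2. 4.]]
```

In the algorithm described above, each agent would periodically replace its local QMIX parameters with this average and continue training in its own simulated environment, which is what accelerates learning across agents.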
Learning to rest: A Q-learning approach to flying base station trajectory design with landing spots
Méthodes d'apprentissage automatique pour l'utilisation des drones dans les réseaux sans-fil (Machine learning methods for the use of UAVs in wireless networks)
Autonomous unmanned aerial vehicles (UAVs), spurred by rapid innovation in drone hardware and regulatory frameworks over the last decade, are envisioned for a multitude of applications in service of the society of the future. From the perspective of next-generation wireless networks, UAVs are anticipated not only in the role of passive cellular-connected users, but also as active enablers of connectivity as part of UAV-aided networks. The defining advantage of UAVs in all potential application scenarios is their mobility. To take full advantage of their capabilities, flexible and efficient path planning methods are necessary. This thesis focuses on exploring machine learning (ML), specifically reinforcement learning (RL), as a promising class of solutions to UAV mobility management challenges. Deep RL is one of the few frameworks that allows us to tackle the complex task of UAV control and deployment in communication scenarios directly, given that these are generally NP-hard, non-convex optimization problems. Furthermore, deep RL makes it straightforward to balance the multiple objectives of UAV-aided networks, is very flexible with respect to the availability of prior or model information, and offers computationally efficient inference. This thesis also explores the challenges of severely limited flying time, cooperation between multiple UAVs, and reducing the training data demand of deep RL methods, as well as the connection between drone-assisted networks and robotics, two generally disjoint research communities.
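Balancing multiple objectives in deep RL typically reduces to scalarizing competing terms into a single reward. A hypothetical sketch of such a shaped reward (the weight values and the specific objective terms here are illustrative assumptions, not taken from the thesis):

```python
def shaped_reward(data_collected: float, energy_used: float,
                  collision: bool,
                  w_data: float = 1.0, w_energy: float = 0.1,
                  collision_penalty: float = 10.0) -> float:
    """Scalarize competing objectives (data throughput, flight-time /
    energy cost, safety) into one RL reward via a weighted sum."""
    r = w_data * data_collected - w_energy * energy_used
    if collision:
        r -= collision_penalty  # safety violations dominate the signal
    return r

print(shaped_reward(5.0, 2.0, False))  # 4.8
print(shaped_reward(5.0, 2.0, True))   # -5.2
```

Tuning the weights trades one objective against another without changing the learning algorithm itself, which is one concrete sense in which deep RL balances the objectives of UAV-aided networks "in a straightforward way".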