Reinforcement learning-based autonomous robot navigation and tracking

Abstract

Autonomous navigation requires determining a collision-free path for a mobile robot using only partial observations of the environment. This capability is needed for a wide range of applications, such as search and rescue operations, surveillance, environmental monitoring, and domestic service robots. In many scenarios, an accurate global map is not available beforehand, posing significant challenges for a robot planning its path. This type of navigation is often referred to as mapless navigation, and it applies not only to Unmanned Ground Vehicles (UGVs) but also to other platforms, such as Unmanned Aerial Vehicles (UAVs). This research aims to develop Reinforcement Learning (RL)-based methods for autonomous navigation of mobile robots, as well as effective tracking strategies for a UAV following a moving target. Mapless navigation usually assumes accurate localisation, which is unrealistic. In the real world, localisation methods such as simultaneous localisation and mapping (SLAM) are needed; however, their performance can deteriorate depending on the environment and observation quality. To avoid deteriorated localisation, this work introduces an RL-based navigation algorithm that enables mobile robots to navigate in unknown environments while incorporating localisation performance into policy training. Specifically, a localisation-related penalty is introduced in the reward space, ensuring localisation safety is taken into consideration during navigation. Different metrics are formulated to detect when localisation performance starts to deteriorate, so that the robot is penalised accordingly. As such, the navigation policy not only optimises its paths towards the goal in terms of travel distance and collision avoidance, but also avoids venturing into areas that pose challenges for localisation algorithms. The localisation-safe algorithm is further extended to UAV navigation, which uses image-based observations.
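The shape of such a localisation-aware reward can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function name, the specific localisation-error metric `loc_error`, the threshold, and all weights are hypothetical placeholders for the penalty terms described above.

```python
def navigation_reward(dist_to_goal_prev, dist_to_goal, collided,
                      loc_error, loc_threshold=0.5,
                      w_progress=1.0, w_collision=10.0, w_loc=2.0):
    """Hypothetical reward combining goal progress, collision avoidance,
    and a penalty for regions where localisation deteriorates."""
    # Reward progress towards the goal (positive when the robot gets closer).
    r = w_progress * (dist_to_goal_prev - dist_to_goal)
    # Penalise collisions with obstacles.
    if collided:
        r -= w_collision
    # Penalise the robot once a localisation-quality metric exceeds
    # a threshold, discouraging entry into localisation-unsafe areas.
    if loc_error > loc_threshold:
        r -= w_loc * (loc_error - loc_threshold)
    return r
```

With these example weights, a step that closes 0.5 m towards the goal under good localisation earns a positive reward, while the same step taken in a localisation-degraded area is partially or fully penalised.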
Instead of deploying an end-to-end control pipeline, this work establishes a hierarchical control framework that leverages both the perceptual capabilities of neural networks and the stability and safety guarantees of conventional controllers. The high-level controller in this hierarchical framework is a neural network policy with semantic image inputs, trained using RL algorithms with localisation-related rewards. The efficacy of the trained policy is demonstrated in real-world experiments for localisation-safe navigation and, notably, it remains effective without retraining, thanks to the hierarchical control scheme and semantic inputs. Finally, a tracking policy is introduced to enable a UAV to track a moving target. This study designs a reward space enabling a vision-based UAV, which uses depth images for perception, to follow a target within a safe and visible range. The objective is to keep the moving target at the centre of the drone camera's image, unoccluded by other objects, while avoiding collisions with obstacles. It is observed that training such a policy from scratch may lead to local minima. To address this, a state-based teacher policy is trained to perform the tracking task, with environmental perception relying on direct access to state information, including the position coordinates of obstacles, instead of depth images. An RL algorithm is then constructed to train the vision-based policy, incorporating behavioural guidance from the state-based teacher policy. This approach yields promising tracking performance.
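One common way to realise such behavioural guidance is to augment the student's RL objective with a term that pulls its actions towards the teacher's. The sketch below is only an assumed instantiation of that idea; the function names, the mean-squared-error guidance term, and the weighting are illustrative and not taken from the thesis.

```python
def guided_loss(student_action, teacher_action, rl_loss, bc_weight=0.5):
    """Hypothetical training objective for the vision-based student policy:
    the usual RL loss plus a behavioural-cloning term that penalises
    deviation from the state-based teacher's action."""
    # Mean squared error between student and teacher action vectors.
    bc = sum((s - t) ** 2
             for s, t in zip(student_action, teacher_action)) / len(student_action)
    # Weighted combination: guidance nudges the student out of local
    # minima early on; bc_weight can be annealed towards zero so the
    # student eventually optimises the RL objective alone.
    return rl_loss + bc_weight * bc
```

Annealing `bc_weight` over training lets the depth-image policy first imitate the teacher's coarse behaviour and then refine it from reward alone.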
