We present a new learning-based method for identifying safe and navigable
regions in off-road terrains and unstructured environments from RGB images. Our
approach groups terrain classes by navigability level and classifies these
groups using coarse-grained semantic segmentation. We propose a
bottleneck transformer-based deep neural network architecture that uses a novel
group-wise attention mechanism to distinguish between navigability levels of
different terrains. Our group-wise attention heads enable the network to
explicitly focus on the different groups, which improves accuracy. In addition,
we propose a dynamic weighted cross entropy loss function to handle the
long-tailed nature of the dataset. We show through extensive evaluations on the
RUGD and RELLIS-3D datasets that our learning algorithm improves the accuracy
of visual perception in off-road terrains for navigation. We compare our
approach with prior work on these datasets and achieve an improvement over the
state-of-the-art mIoU by 6.74-39.1% on RUGD and 3.82-10.64% on RELLIS-3D.
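
For readers unfamiliar with the reported metric, the following is a minimal sketch of how mean intersection-over-union (mIoU) is commonly computed for a segmentation result. It is not the authors' evaluation code: the function name `mean_iou`, the flat label lists, and the choice to skip classes that appear in neither prediction nor ground truth are assumptions made for illustration.

```python
def mean_iou(pred, target, num_classes):
    """Average per-class IoU over flattened per-pixel label sequences.

    Assumption: classes absent from both `pred` and `target` are skipped
    rather than counted as zero.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(intersection / union)

    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three hypothetical navigability groups (0, 1, 2).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
```

In this toy example the per-class IoUs are 1/3, 2/3, and 1/2, so `score` is 0.5. With coarse-grained navigability groups, `num_classes` would be the number of groups rather than the number of fine-grained terrain labels.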