Biomedical image classification requires capturing bio-informatic content based on specific feature distributions. In most such applications, the main challenges are the limited availability of samples for diseased cases and the imbalanced nature of the datasets. This article presents a novel multi-head self-attention framework for the vision transformer (ViT) that makes it capable of capturing the image features specific to classification and analysis. The proposed method uses residual connections to accumulate the best attention output in each multi-head attention block. The proposed framework has been evaluated on two small datasets: (i) a blood cell classification dataset and (ii) a brain tumor detection dataset of brain MRI images. The results show a significant improvement over the traditional ViT and other convolution-based state-of-the-art classification models.