Research and development of OCR systems for low-resource languages is
relatively new. Low-resource languages have little data available for
training machine translation or other systems. Even though a vast amount of
text has been digitized and made available on the internet, the text is still
in PDF and image formats, which are not readily accessible. This paper
discusses text recognition for two scripts: Bengali and
Nepali, which have about 300 million and 40 million speakers,
respectively. In this study, using encoder-decoder transformers, a model was
developed, and its efficacy was assessed using a collection of optical text
images, both handwritten and printed. The results indicate that the proposed
technique is comparable to current approaches and achieves high precision in
recognizing text in Bengali and Nepali. This study can pave the way for the
advanced and accessible study of linguistics in South East Asia.

Comment: Accepted and presented at ICAECC 2023, Bengaluru, India