CVOCR: Context Vision OCR

Abstract

Optical Character Recognition (OCR) technologies are crucial for automated information extraction across many domains. However, the intricate layouts and diverse text properties found on product packaging can complicate accurate data retrieval and categorization. This paper introduces Context Vision OCR (CVOCR), a versatile framework designed to address these challenges using advanced image processing and text analysis techniques. While CVOCR is applicable to any OCR-related application, this paper focuses on pharmaceutical items as a case study because of the stringent accuracy requirements and the complexity of medicine packaging. CVOCR integrates the Fast Super-Resolution Convolutional Neural Network (FSRCNN) for enhanced image clarity, LayoutLMv2 for spatial layout understanding, Tesseract OCR for robust character recognition, and GPT-Neo for advanced contextual analysis. The strategic integration of these components forms a cohesive system that significantly improves text detection and interpretation accuracy. We demonstrate the efficacy of the CVOCR system through testing on various pharmaceutical products, where it consistently outperforms Tesseract OCR.
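
As a rough illustration of how the four components named above might be chained, the following minimal Python sketch composes them as sequential stages. The stage order, the `PipelineState` structure, and every function name are assumptions made for illustration only; each stage is a placeholder that stands in for the real model (FSRCNN, LayoutLMv2, Tesseract OCR, GPT-Neo) and does not call any of those libraries.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class PipelineState:
    """Data handed from one stage to the next (hypothetical structure)."""
    image: Any
    regions: List[Dict[str, Any]] = field(default_factory=list)
    text: str = ""
    fields: Dict[str, str] = field(default_factory=dict)
    trace: List[str] = field(default_factory=list)


Stage = Callable[[PipelineState], PipelineState]


def super_resolve(state: PipelineState) -> PipelineState:
    # Placeholder for FSRCNN upscaling: the input passes through unchanged.
    state.trace.append("super_resolve")
    return state


def analyze_layout(state: PipelineState) -> PipelineState:
    # Placeholder for LayoutLMv2: treat the whole input as a single region.
    state.regions = [{"label": "full_page", "image": state.image}]
    state.trace.append("analyze_layout")
    return state


def recognize_text(state: PipelineState) -> PipelineState:
    # Placeholder for Tesseract OCR: the demo "image" is already a string,
    # so recognition is simulated by converting each region to text.
    state.text = "\n".join(str(region["image"]) for region in state.regions)
    state.trace.append("recognize_text")
    return state


def interpret_context(state: PipelineState) -> PipelineState:
    # Placeholder for GPT-Neo: a trivial "key: value" split per line.
    for line in state.text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            state.fields[key.strip().lower()] = value.strip()
    state.trace.append("interpret_context")
    return state


# Assumed stage order, following the order in which the abstract lists them.
CVOCR_STAGES: List[Stage] = [
    super_resolve,
    analyze_layout,
    recognize_text,
    interpret_context,
]


def run_pipeline(image: Any, stages: List[Stage]) -> PipelineState:
    """Apply each stage in turn, threading the shared state through."""
    state = PipelineState(image=image)
    for stage in stages:
        state = stage(state)
    return state


if __name__ == "__main__":
    result = run_pipeline("Name: Paracetamol\nDose: 500 mg", CVOCR_STAGES)
    print(result.fields)
```

Keeping each stage behind the same `PipelineState -> PipelineState` signature means a placeholder can be swapped for a real model wrapper without touching the other stages, which is one plausible way to organize such an integration.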
