
    Automatic Document Image Binarization using Bayesian Optimization

    Document image binarization is often a challenging task due to various forms of degradation. Although several binarization techniques exist in the literature, the binarized image is typically sensitive to the control-parameter settings of the employed technique. This paper presents an automatic document image binarization algorithm that segments text from heavily degraded document images. The proposed technique uses a two-band-pass filtering approach for background noise removal and Bayesian optimization for automatic hyperparameter selection. The effectiveness of the proposed binarization technique is empirically demonstrated on the Document Image Binarization Competition (DIBCO) and the Handwritten Document Image Binarization Competition (H-DIBCO) datasets.
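The abstract gives no implementation details, so as a minimal illustrative sketch the idea of tuning a binarization hyperparameter against a quality objective can be shown with plain random search over a global threshold (a lightweight stand-in for the paper's Bayesian optimizer), scoring each candidate by F-measure against a reference binarization. All function names and the toy data below are hypothetical:

```python
import random

def binarize(pixels, threshold):
    """Classify each grayscale pixel as text (1) if darker than the threshold."""
    return [1 if p < threshold else 0 for p in pixels]

def f_measure(pred, truth):
    """Standard F-measure between a predicted and a reference binarization."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def tune_threshold(pixels, truth, n_trials=50, seed=0):
    """Random search over the threshold; a Gaussian-process-based Bayesian
    optimizer would instead propose candidates from a surrogate model."""
    rng = random.Random(seed)
    best_t, best_f = 128, -1.0
    for _ in range(n_trials):
        t = rng.randint(1, 254)
        f = f_measure(binarize(pixels, t), truth)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f
```

On cleanly separated toy pixels (dark text around 30-60, light background around 180-220), the search recovers a threshold between the two clusters.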

    Handwritten Vedic Sanskrit Text Recognition Using Deep Learning and Convolutional Neural Networks

    Recognizing Vedic Sanskrit text is essential for accessing the classical Indo-Aryan language predominantly used in the Vedas. Awareness of the Vedas is currently limited, making this a demanding and challenging area of pattern recognition. Deep learning methods are indispensable for accelerating progress in optical character recognition (OCR). This article presents a novel approach to Vedic Sanskrit text recognition, incorporating deep convolutional architectures together with their respective interpretations. We introduce three modified 4-fold CNN architectures and the AlexNet model. Our system uses a handwritten dataset of 140 distinct Vedic Sanskrit words, with approximately 500 images per word, totaling around 70,000 images. The dataset is partitioned into training and testing sets in an 80:20 ratio, and the trained model is applied to the deep convolutional network with varied numbers of neurons in the hidden layers. Our proposed method demonstrates robust support for accurate Vedic Sanskrit word classification. The recognition rate achieved in our research is 97.42%, with an average recognition time of 0.3640 milliseconds, surpassing existing CNN-based approaches.
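The 80:20 train/test partition described above is a standard preprocessing step; a minimal sketch (the function name and the use of plain lists of sample identifiers are assumptions, not the paper's code) could look like this:

```python
import random

def split_dataset(samples, train_fraction=0.8, seed=42):
    """Shuffle the samples and partition them into train/test subsets.

    With train_fraction=0.8 this yields the 80:20 ratio used in the paper,
    e.g. 56,000 training and 14,000 test images for a 70,000-image dataset.
    """
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Shuffling before the cut matters here: with ~500 images per word stored contiguously, an unshuffled split would leave some word classes entirely out of the training set.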

    CCDWT-GAN: Generative Adversarial Networks Based on Color Channel Using Discrete Wavelet Transform for Document Image Binarization

    Efficiently extracting textual information from color degraded document images is an important research topic. Long-term imperfect preservation of ancient documents has led to various types of degradation, such as page staining, paper yellowing, and ink bleeding; these degradations severely impair image processing for information extraction. In this paper, we present CCDWT-GAN, a generative adversarial network (GAN) that applies the discrete wavelet transform (DWT) to images split into their RGB (red, green, blue) channels. The proposed method comprises three stages: image preprocessing, image enhancement, and image binarization. We conduct comparative experiments in the image preprocessing stage to determine the optimal combination of DWT and normalization. Additionally, we perform an ablation study on the results of the image enhancement and image binarization stages to validate their positive effect on model performance. We compare the performance of the proposed method with other state-of-the-art (SOTA) methods on the DIBCO and H-DIBCO ((Handwritten) Document Image Binarization Competition) datasets. The experimental results demonstrate that CCDWT-GAN achieves a top-two performance on multiple benchmark datasets and outperforms other SOTA methods.
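The core preprocessing operation named in the abstract, a DWT applied per color channel, can be sketched as a one-level 2D Haar transform (the averaging variant, without the 1/sqrt(2) normalization; the exact wavelet and normalization used by CCDWT-GAN are not specified here):

```python
def haar_dwt2(channel):
    """One-level 2D Haar DWT of a single color channel (list of rows).

    Returns the four subbands (LL, LH, HL, HH). Pipelines like the one
    described typically keep the low-frequency LL band, where most of the
    text structure lives. Assumes even height and width.
    """
    def pass_1d(rows):
        # Average (low band) and difference (high band) of adjacent pixels.
        low, high = [], []
        for row in rows:
            low.append([(row[i] + row[i + 1]) / 2 for i in range(0, len(row), 2)])
            high.append([(row[i] - row[i + 1]) / 2 for i in range(0, len(row), 2)])
        return low, high

    def transpose(m):
        return [list(c) for c in zip(*m)]

    low, high = pass_1d(channel)        # horizontal pass
    ll, lh = pass_1d(transpose(low))    # vertical pass on the low band
    hl, hh = pass_1d(transpose(high))   # vertical pass on the high band
    return transpose(ll), transpose(lh), transpose(hl), transpose(hh)
```

A flat region produces a constant LL band and zero detail bands, which is why stains and gradual discoloration concentrate in LL while sharp ink edges show up in the detail bands.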

    Ground Truth Generation and Document Image Degradation

    The problem of generating synthetic data for training and evaluating document analysis systems has been widely addressed in recent years. With the increased interest in processing multilingual sources, however, there is a tremendous need to rapidly generate data in new languages and scripts without developing specialized systems. We have developed a system that uses the language support of the MS Windows operating system, combined with custom print drivers, to render TIFF images simultaneously with Windows Enhanced Metafile directives. The metafile information is parsed to generate zone, line, word, and character ground truth, including location, font information, and content, in any language supported by Windows. The resulting images can be physically or synthetically degraded by our degradation modules and used for training and evaluating Optical Character Recognition (OCR) systems. Our document image degradation methodology incorporates several often-encountered types of noise at the page and pixel levels. Examples of OCR evaluation and synthetically degraded document images are given to demonstrate the effectiveness of the approach.
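The pixel-level noise mentioned above can be illustrated with the simplest degradation model, salt-and-pepper flipping of binary pixels (the specific noise models the abstract's degradation modules implement are not detailed; this function and its parameters are illustrative assumptions):

```python
import random

def salt_and_pepper(image, flip_prob=0.02, seed=0):
    """Pixel-level degradation: flip each binary pixel with probability flip_prob.

    `image` is a list of rows of 0/1 pixels. Page-level effects such as blur,
    skew, or bleed-through would be handled by separate degradation modules.
    """
    rng = random.Random(seed)
    return [[1 - p if rng.random() < flip_prob else p for p in row]
            for row in image]
```

Because the ground truth is generated alongside the clean rendering, an OCR system can be trained and scored on the degraded images while the labels stay exact.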