Automatic Document Image Binarization using Bayesian Optimization
Document image binarization is often a challenging task due to various forms
of degradation. Although several binarization techniques exist in the
literature, the binarized image is typically sensitive to the control-parameter
settings of the employed technique. This paper presents an automatic document
image binarization algorithm that segments text from heavily degraded document
images. The proposed technique uses a two-band-pass filtering approach for
background noise removal, and Bayesian optimization for automatic
hyperparameter selection. The effectiveness of the proposed binarization
technique is demonstrated empirically on the Document Image Binarization
Competition (DIBCO) and the Handwritten Document Image Binarization
Competition (H-DIBCO) datasets.
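As a rough illustration of the search loop described above, the sketch below scores candidate parameter settings of a binarizer by F-measure against a synthetic ground truth. The global threshold and the uniform (random-search) sampling are simplifications of ours, not the paper's method: the paper uses band-pass filtering and a Bayesian optimizer, which would propose candidates from a surrogate model rather than uniformly.

```python
import numpy as np

def binarize(img, threshold):
    """Global-threshold binarization: 1 = text (dark ink), 0 = background."""
    return (img < threshold).astype(np.uint8)

def f_measure(pred, gt):
    """Standard F-measure of the predicted text mask against ground truth."""
    tp = np.sum((pred == 1) & (gt == 1))
    fp = np.sum((pred == 1) & (gt == 0))
    fn = np.sum((pred == 0) & (gt == 1))
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    return 2 * precision * recall / max(precision + recall, 1e-9)

# Synthetic "degraded document": dark text block on a bright, noisy page.
rng = np.random.default_rng(0)
gt = np.zeros((32, 32), dtype=np.uint8)
gt[8:24, 8:24] = 1                         # ground-truth text mask
img = np.where(gt == 1, 60.0, 200.0)       # ink vs. paper intensity
img += rng.normal(0.0, 20.0, img.shape)    # additive degradation

# Each candidate hyperparameter value is scored against ground truth;
# the best-scoring setting is retained.
candidates = rng.uniform(0.0, 255.0, size=50)
best_t, best_f = max(
    ((t, f_measure(binarize(img, t), gt)) for t in candidates),
    key=lambda pair: pair[1],
)
```

On this toy image the best threshold lands between the ink and paper intensities, and the resulting F-measure is close to 1.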
Handwritten Vedic Sanskrit Text Recognition Using Deep Learning and Convolutional Neural Networks
Recognizing Vedic Sanskrit text is essential for accessing the classical Indo-Aryan language predominantly used in the Vedas. Awareness of the Vedas is currently limited, making this a demanding and challenging area of pattern recognition. Deep learning methods are indispensable for accelerating progress in optical character recognition (OCR). This article presents a novel approach to Vedic Sanskrit text recognition based on deep convolutional architectures, together with their respective interpretations. We introduce three modified 4-fold CNN architectures and the AlexNet model. Our handwritten dataset contains 140 distinct Vedic Sanskrit words, with approximately 500 images per word, totaling around 70,000 images. The dataset is partitioned into training and testing sets in an 80:20 ratio: training uses 80% of the samples, and the trained deep convolutional networks, with varied numbers of neurons in their hidden layers, are evaluated on the held-out 20%. Our proposed method provides robust, accurate Vedic Sanskrit word classification, achieving a recognition rate of 97.42% with an average recognition time of 0.3640 milliseconds, surpassing existing CNN-based approaches.
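The dataset figures above imply the following split arithmetic (a minimal check; the per-word stratification of the 80:20 split is our assumption, not stated in the abstract):

```python
# Dataset sizes from the abstract: 140 word classes, ~500 images each,
# partitioned 80:20 into training and testing sets.
n_words = 140
images_per_word = 500
train_ratio = 0.8

train_per_word = int(images_per_word * train_ratio)  # 400 images per word
test_per_word = images_per_word - train_per_word     # 100 images per word
total = n_words * images_per_word

print(total, n_words * train_per_word, n_words * test_per_word)
# → 70000 56000 14000
```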
CCDWT-GAN: Generative Adversarial Networks Based on Color Channel Using Discrete Wavelet Transform for Document Image Binarization
Efficiently extracting textual information from color degraded document
images is an important research topic. Long-term imperfect preservation of
ancient documents has led to various types of degradation, such as page
staining, paper yellowing, and ink bleeding; these degradations severely
hamper image processing for information extraction. In this paper, we present
CCDWT-GAN, a generative adversarial network (GAN) that applies the discrete
wavelet transform (DWT) to images split into their RGB (red, green, blue)
channels. The proposed method comprises three stages: image preprocessing,
image enhancement, and image binarization. Comparative experiments in the
image preprocessing stage determine the optimal combination of DWT and
normalization. Additionally, an ablation study on the results of the image
enhancement and image binarization stages validates their positive effect on
model performance. We compare the performance of the proposed method with
other state-of-the-art (SOTA) methods on the DIBCO and H-DIBCO ((Handwritten)
Document Image Binarization Competition) datasets. The experimental results
demonstrate that CCDWT-GAN achieves top-two performance on multiple benchmark
datasets and outperforms other SOTA methods.
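A minimal sketch of the channel-split DWT step described above, assuming a single-level Haar wavelet and min-max normalization; the paper compares several DWT/normalization choices, and this is only one illustrative combination:

```python
import numpy as np

def haar_dwt2(channel):
    """Single-level 2D Haar DWT: returns the (LL, LH, HL, HH) subbands."""
    a = channel[0::2, 0::2]
    b = channel[0::2, 1::2]
    c = channel[1::2, 0::2]
    d = channel[1::2, 1::2]
    ll = (a + b + c + d) / 4.0   # low-frequency approximation
    lh = (a + b - c - d) / 4.0   # horizontal detail
    hl = (a - b + c - d) / 4.0   # vertical detail
    hh = (a - b - c + d) / 4.0   # diagonal detail
    return ll, lh, hl, hh

def normalize(subband):
    """Min-max normalization to [0, 1]; constant subbands map to 0."""
    lo, hi = subband.min(), subband.max()
    return (subband - lo) / (hi - lo) if hi > lo else np.zeros_like(subband)

# Stand-in RGB document image; each color channel is transformed separately.
rng = np.random.default_rng(0)
img = rng.uniform(0.0, 255.0, size=(64, 64, 3))
subbands = {
    ch: [normalize(s) for s in haar_dwt2(img[:, :, i])]
    for i, ch in enumerate("RGB")
}
```

Each 64×64 channel yields four 32×32 subbands; a flat (constant) region produces zero detail coefficients, which is why the wavelet domain separates ink strokes from uniform page background.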
Groundtruth Generation and Document Image Degradation
The problem of generating synthetic data for the training and evaluation of document analysis systems has been widely addressed in recent years. With increased interest in processing multilingual sources, however, there is a tremendous need to rapidly generate data in new languages and scripts without developing specialized systems. We have developed a system that uses the language support of the MS Windows operating system, combined with custom print drivers, to render TIFF images simultaneously with Windows Enhanced Metafile directives. The metafile information is parsed to generate zone, line, word, and character ground truth, including location, font information, and content, in any language supported by Windows. The resulting images can be physically or synthetically degraded by our degradation modules and used for training and evaluating Optical Character Recognition (OCR) systems. Our document image degradation methodology incorporates several often-encountered types of noise at the page and pixel levels. Examples of OCR evaluation and synthetically degraded document images are given to demonstrate the effectiveness of the approach.
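As a hedged sketch of the pixel-level degradation step mentioned above, the snippet below inverts a random fraction of pixels (salt-and-pepper style noise). The function name and flip probability are illustrative assumptions, and page-level effects (border noise, skew) used by the actual degradation modules are not modeled here.

```python
import numpy as np

def pixel_noise(img, flip_prob=0.02, seed=0):
    """Pixel-level degradation: invert a random fraction of pixels,
    producing salt-and-pepper style noise on a grayscale page image."""
    rng = np.random.default_rng(seed)
    out = img.copy()
    flips = rng.random(img.shape) < flip_prob   # Bernoulli mask per pixel
    out[flips] = 255 - out[flips]               # invert the selected pixels
    return out

page = np.full((100, 100), 255, dtype=np.uint8)  # blank white "page"
noisy = pixel_noise(page)
flipped = int(np.sum(noisy != page))
```

On a 100×100 page with `flip_prob=0.02`, roughly 2% of the 10,000 pixels (about 200) are corrupted, which is the kind of controlled, reproducible degradation useful for stress-testing OCR systems.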