Automatic Document Image Binarization using Bayesian Optimization
Document image binarization is often a challenging task due to various forms
of degradation. Although several binarization techniques exist in the
literature, the binarized image is typically sensitive to the control-parameter
settings of the employed technique. This paper presents an automatic document
image binarization algorithm that segments text from heavily degraded document
images. The proposed technique uses a two-band-pass filtering approach for
background noise removal, and Bayesian optimization for automatic
hyperparameter selection. The effectiveness of the proposed binarization
technique is demonstrated empirically on the Document Image Binarization
Competition (DIBCO) and the Handwritten Document Image Binarization
Competition (H-DIBCO) datasets.
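As a rough illustration of the search loop described above, the sketch below scores candidate parameter settings of a binarizer by F-measure against a synthetic ground truth. The global threshold and the uniform (random-search) sampling are simplifications of ours, not the paper's method: the paper uses band-pass filtering and a Bayesian optimizer, which would propose candidates from a surrogate model rather than uniformly.

```python
import numpy as np

def binarize(img, threshold):
    """Global-threshold binarization: 1 = text (dark ink), 0 = background."""
    return (img < threshold).astype(np.uint8)

def f_measure(pred, gt):
    """Standard F-measure of the predicted text mask against ground truth."""
    tp = np.sum((pred == 1) & (gt == 1))
    fp = np.sum((pred == 1) & (gt == 0))
    fn = np.sum((pred == 0) & (gt == 1))
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    return 2 * precision * recall / max(precision + recall, 1e-9)

# Synthetic "degraded document": dark text block on a bright, noisy page.
rng = np.random.default_rng(0)
gt = np.zeros((32, 32), dtype=np.uint8)
gt[8:24, 8:24] = 1                         # ground-truth text mask
img = np.where(gt == 1, 60.0, 200.0)       # ink vs. paper intensity
img += rng.normal(0.0, 20.0, img.shape)    # additive degradation

# Each candidate hyperparameter value is scored against ground truth;
# the best-scoring setting is retained.
candidates = rng.uniform(0.0, 255.0, size=50)
best_t, best_f = max(
    ((t, f_measure(binarize(img, t), gt)) for t in candidates),
    key=lambda pair: pair[1],
)
```

On this toy image the best threshold lands between the ink and paper intensities, and the resulting F-measure is close to 1.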
Handwritten Vedic Sanskrit Text Recognition Using Deep Learning and Convolutional Neural Networks
Recognizing Vedic Sanskrit text is essential for accessing the classical Indo-Aryan language predominantly used in the Vedas. Awareness of the Vedas is currently limited, making this a demanding and challenging area of pattern recognition. Deep learning methods are indispensable for accelerating progress in optical character recognition (OCR). This article presents a novel approach to Vedic Sanskrit text recognition based on deep convolutional architectures, together with their respective interpretations. We introduce three modified 4-fold CNN architectures and the AlexNet model. Our handwritten dataset contains 140 distinct Vedic Sanskrit words, with approximately 500 images per word, totaling around 70,000 images. The dataset is partitioned into training and testing sets in an 80:20 ratio: training uses 80% of the samples, and the trained deep convolutional networks, with varied numbers of neurons in their hidden layers, are evaluated on the held-out 20%. Our proposed method provides robust, accurate Vedic Sanskrit word classification, achieving a recognition rate of 97.42% with an average recognition time of 0.3640 milliseconds, surpassing existing CNN-based approaches.
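The dataset figures above imply the following split arithmetic (a minimal check; the per-word stratification of the 80:20 split is our assumption, not stated in the abstract):

```python
# Dataset sizes from the abstract: 140 word classes, ~500 images each,
# partitioned 80:20 into training and testing sets.
n_words = 140
images_per_word = 500
train_ratio = 0.8

train_per_word = int(images_per_word * train_ratio)  # 400 images per word
test_per_word = images_per_word - train_per_word     # 100 images per word
total = n_words * images_per_word

print(total, n_words * train_per_word, n_words * test_per_word)
# → 70000 56000 14000
```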
CCDWT-GAN: Generative Adversarial Networks Based on Color Channel Using Discrete Wavelet Transform for Document Image Binarization
Efficiently extracting textual information from color degraded document
images is an important research topic. Long-term imperfect preservation of
ancient documents has led to various types of degradation, such as page
staining, paper yellowing, and ink bleeding; these degradations severely
hamper image processing for information extraction. In this paper, we present
CCDWT-GAN, a generative adversarial network (GAN) that applies the discrete
wavelet transform (DWT) to images split into their RGB (red, green, blue)
channels. The proposed method comprises three stages: image preprocessing,
image enhancement, and image binarization. Comparative experiments in the
image preprocessing stage determine the optimal combination of DWT and
normalization. Additionally, an ablation study on the results of the image
enhancement and image binarization stages validates their positive effect on
model performance. We compare the performance of the proposed method with
other state-of-the-art (SOTA) methods on the DIBCO and H-DIBCO ((Handwritten)
Document Image Binarization Competition) datasets. The experimental results
demonstrate that CCDWT-GAN achieves top-two performance on multiple benchmark
datasets and outperforms other SOTA methods.
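A minimal sketch of the channel-split DWT step described above, assuming a single-level Haar wavelet and min-max normalization; the paper compares several DWT/normalization choices, and this is only one illustrative combination:

```python
import numpy as np

def haar_dwt2(channel):
    """Single-level 2D Haar DWT: returns the (LL, LH, HL, HH) subbands."""
    a = channel[0::2, 0::2]
    b = channel[0::2, 1::2]
    c = channel[1::2, 0::2]
    d = channel[1::2, 1::2]
    ll = (a + b + c + d) / 4.0   # low-frequency approximation
    lh = (a + b - c - d) / 4.0   # horizontal detail
    hl = (a - b + c - d) / 4.0   # vertical detail
    hh = (a - b - c + d) / 4.0   # diagonal detail
    return ll, lh, hl, hh

def normalize(subband):
    """Min-max normalization to [0, 1]; constant subbands map to 0."""
    lo, hi = subband.min(), subband.max()
    return (subband - lo) / (hi - lo) if hi > lo else np.zeros_like(subband)

# Stand-in RGB document image; each color channel is transformed separately.
rng = np.random.default_rng(0)
img = rng.uniform(0.0, 255.0, size=(64, 64, 3))
subbands = {
    ch: [normalize(s) for s in haar_dwt2(img[:, :, i])]
    for i, ch in enumerate("RGB")
}
```

Each 64×64 channel yields four 32×32 subbands; a flat (constant) region produces zero detail coefficients, which is why the wavelet domain separates ink strokes from uniform page background.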
Groundtruth Generation and Document Image Degradation
The problem of generating synthetic data for the training and evaluation of document analysis systems has been widely addressed in recent years. With increased interest in processing multilingual sources, however, there is a tremendous need to rapidly generate data in new languages and scripts without developing specialized systems. We have developed a system that uses the language support of the MS Windows operating system, combined with custom print drivers, to render TIFF images simultaneously with Windows Enhanced Metafile directives. The metafile information is parsed to generate zone, line, word, and character ground truth, including location, font information, and content, in any language supported by Windows. The resulting images can be physically or synthetically degraded by our degradation modules and used for training and evaluating Optical Character Recognition (OCR) systems. Our document image degradation methodology incorporates several often-encountered types of noise at the page and pixel levels. Examples of OCR evaluation and synthetically degraded document images are given to demonstrate the effectiveness of the approach.
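As a hedged sketch of the pixel-level degradation step mentioned above, the snippet below inverts a random fraction of pixels (salt-and-pepper style noise). The function name and flip probability are illustrative assumptions, and page-level effects (border noise, skew) used by the actual degradation modules are not modeled here.

```python
import numpy as np

def pixel_noise(img, flip_prob=0.02, seed=0):
    """Pixel-level degradation: invert a random fraction of pixels,
    producing salt-and-pepper style noise on a grayscale page image."""
    rng = np.random.default_rng(seed)
    out = img.copy()
    flips = rng.random(img.shape) < flip_prob   # Bernoulli mask per pixel
    out[flips] = 255 - out[flips]               # invert the selected pixels
    return out

page = np.full((100, 100), 255, dtype=np.uint8)  # blank white "page"
noisy = pixel_noise(page)
flipped = int(np.sum(noisy != page))
```

On a 100×100 page with `flip_prob=0.02`, roughly 2% of the 10,000 pixels (about 200) are corrupted, which is the kind of controlled, reproducible degradation useful for stress-testing OCR systems.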