
    Persian Heritage Image Binarization Competition (PHIBC 2012)

    The first competition on the binarization of historical Persian documents and manuscripts (PHIBC 2012) was organized in conjunction with the first Iranian conference on pattern recognition and image analysis (PRIA 2013). The main objective of PHIBC 2012 is to evaluate the performance of binarization methodologies when applied to Persian heritage images. This paper reports on the methodology and performance of the three submitted algorithms, based on the evaluation measures used. Comment: 4 pages, 2 figures, conferenc

    Historical Document Enhancement Using LUT Classification

    The fast evolution of scanning and computing technologies in recent years has led to the creation of large collections of scanned historical documents. These scanned documents almost always suffer from some form of degradation. Large degradations make documents hard to read and substantially deteriorate the performance of automated document processing systems. Enhancement of degraded document images is normally performed assuming global degradation models. When the degradation is large, global degradation models do not perform well. In contrast, we propose to learn local degradation models and use them to enhance degraded document images. Using a semi-automated enhancement system, we have labeled a subset of the Frieder diaries collection (The diaries of Rabbi Dr. Avraham Abba Frieder. http://ir.iit.edu/collections/). This labeled subset was then used to train classifiers based on lookup tables in conjunction with the approximate nearest neighbor algorithm. The resulting algorithm is highly efficient and effective. Experimental evaluation results are provided using the Frieder diaries collection. © Springer-Verlag 2009
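The idea of a lookup-table classifier trained on labeled pixels can be illustrated with a minimal sketch. The abstract does not describe the features or the table layout, so everything here is an assumption for illustration: the 3x3 binary window, the majority vote per window, and the function names `window`, `train_lut`, `classify`, and `enhance` are hypothetical. An exact Hamming-distance search stands in for the approximate nearest neighbor step.

```python
from collections import Counter, defaultdict


def window(image, row, col):
    """Return the 3x3 neighborhood around (row, col) as a tuple, zero-padded."""
    height, width = len(image), len(image[0])
    values = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            r, c = row + dr, col + dc
            inside = 0 <= r < height and 0 <= c < width
            values.append(image[r][c] if inside else 0)
    return tuple(values)


def train_lut(degraded, labeled):
    """Map each observed window to the majority label of its center pixel."""
    votes = defaultdict(Counter)
    for r in range(len(degraded)):
        for c in range(len(degraded[0])):
            votes[window(degraded, r, c)][labeled[r][c]] += 1
    return {pattern: counts.most_common(1)[0][0] for pattern, counts in votes.items()}


def classify(lut, pattern):
    """Look the window up; for unseen windows use the closest stored one."""
    if pattern in lut:
        return lut[pattern]
    # Exact Hamming-distance search, standing in for approximate nearest neighbor.
    nearest = min(lut, key=lambda known: sum(a != b for a, b in zip(known, pattern)))
    return lut[nearest]


def enhance(lut, degraded):
    """Relabel every pixel of a degraded image using the trained table."""
    return [
        [classify(lut, window(degraded, r, c)) for c in range(len(degraded[0]))]
        for r in range(len(degraded))
    ]


# Made-up training pair: an isolated speck (noise) above a horizontal stroke (text).
degraded = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
]
labeled = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
]
lut = train_lut(degraded, labeled)
cleaned = enhance(lut, degraded)
```

In this toy setup the table learns that a foreground pixel with no foreground neighbors is noise, while a pixel with horizontal neighbors is text. Because the table only sees a small window, the learned model is local, which is the contrast with global degradation models drawn in the abstract.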

    A Mask-Based Enhancement Method for Historical Documents

    This paper proposes a novel method for document enhancement. The method is based on the combination of two state-of-the-art filters through the construction of a mask. The mask is applied to a TV (Total Variation)-regularized image in which background noise has been reduced. The masked image is then filtered by NLmeans (Non-Local Means), which reduces the noise in the text areas located by the mask. The document images to be enhanced are real historical documents from several periods, with several defects in their background. These defects result from scanning, paper aging and bleed-through. We measure the improvement brought by this enhancement method through OCR accuracy.

    Restoration of deteriorated text sections in ancient document images using a tri-level semi-adaptive thresholding technique

    The proposed research aims to restore deteriorated text sections affected by stain markings, ink seepage and document ageing in photographs of ancient documents, all of which are challenges for document enhancement. A tri-level semi-adaptive thresholding technique is developed in this paper to overcome these issues. The primary focus is on removing deteriorations that obscure text sections. The proposed algorithm includes three levels of degradation removal as well as pre- and post-enhancement processes. Level-wise degradation removal uses a global thresholding approach, whereas pseudo-colouring uses local thresholding procedures. Experiments on palm leaf and DIBCO document photos reveal a decent performance in removing ink/oil stains whilst retaining obscured text sections. On the DIBCO and palm leaf datasets, our system also showed its efficacy in removing common deteriorations such as uneven illumination, show-through, discolouration and writing marks. The proposed technique is directly comparable to other thresholding-based benchmark techniques, producing an average F-measure and precision of 65.73 and 93% on DIBCO datasets and 55.24 and 94% on palm leaf datasets. Subjective analysis shows the robustness of the proposed model in removing stain degradations, with a qualitative score of 3 for 45% of samples, indicating degradation removal with fairly readable text.
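For readers unfamiliar with the reported measures, the following is a minimal sketch of how precision and F-measure are commonly computed for a binarized image against a ground-truth image. It uses the standard pixel-level definitions and is not taken from the paper; the function name `binarization_scores` and the toy images are made up for illustration.

```python
def binarization_scores(predicted, ground_truth):
    """Return (precision, recall, f_measure) for two binary images.

    Both images are lists of rows; 1 marks a text (foreground) pixel.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(predicted, ground_truth):
        for p, t in zip(pred_row, truth_row):
            if p == 1 and t == 1:
                tp += 1  # text pixel correctly kept
            elif p == 1 and t == 0:
                fp += 1  # background wrongly marked as text
            elif p == 0 and t == 1:
                fn += 1  # text pixel lost
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Toy 3x4 example (made-up data, not from the paper).
truth = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred = [
    [0, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
precision, recall, f_measure = binarization_scores(pred, truth)
```

Because the F-measure is the harmonic mean of precision and recall, a high precision paired with a noticeably lower F-measure generally points to lower recall, that is, to text pixels being lost.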

    Development of an Automated Technique for Reconstructing Jawi Characters in Historical Documents

    Old documents in Jawi script are still widely used as references. The quality of the hard copies of these scripts deteriorates as time passes. Manual reconstruction may take a long time if the documents are sufficiently thick. The accuracy of document image recognition algorithms depends heavily on the level of noise in the document. Therefore, the development of a historical Jawi character reconstruction algorithm is a significant contribution to the success of old Jawi manuscript maintenance and recognition systems. The Background Subtraction technique has proved to be the best algorithm when historical document images were evaluated. The proposed technique improves on that algorithm by incorporating autonomous decision making, which makes the binarization technique scale invariant. Prefiltering and post-processing further enhance the ability of the algorithm to remove noise from the documents. In the post-binarization step, characters with holes are separated from characters without holes so that different morphological operations can be applied to each group. This method enhances the connection between broken characters while still preserving the originality of the document. A noise model, based on several predefined criteria, has been developed to test the reliability of the proposed algorithm. The algorithms have been implemented using MATLAB version 6.5. The reliability of the proposed algorithms has been tested on simulated and real data. The Background Subtraction technique and the proposed method have been compared by manual inspection and mathematical evaluation. The results of the algorithms were mathematically evaluated using the Relative Foreground Area Error. The results show that the proposed method performs better. The framework managed to make historical Jawi characters more presentable. The system is not only applicable to historical Jawi characters; it can easily be adapted to other historical characters in different languages.
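The abstract does not spell out how the Relative Foreground Area Error is computed. The sketch below assumes the commonly used definition, in which the difference between the foreground areas of the result and the reference is normalized by the larger of the two areas. The function names and toy images are hypothetical.

```python
def foreground_area(image):
    """Count foreground pixels (value 1) in a binary image given as rows."""
    return sum(sum(1 for pixel in row if pixel == 1) for row in image)


def relative_foreground_area_error(result, reference):
    """Area mismatch normalized by the larger area; 0.0 means equal areas."""
    area_result = foreground_area(result)
    area_reference = foreground_area(reference)
    larger = max(area_result, area_reference)
    if larger == 0:
        return 0.0  # both images are empty, so there is no area error
    return abs(area_reference - area_result) / larger


# Made-up example: the reference has 4 foreground pixels, the result keeps 3.
reference = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]
result = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
]
error = relative_foreground_area_error(result, reference)
```

Under this definition the measure compares only foreground areas, not pixel positions, so two images with equal areas score 0.0 even if the foreground pixels are in different places. That may be one reason the abstract pairs it with manual inspection.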