
    Application of Threshold Techniques for Readability Improvement of Jawi Historical Manuscript Images

    Historical documents such as old books and manuscripts have high aesthetic value and are greatly appreciated. Unfortunately, some documents cannot be read because of quality problems such as faded paper, ink spread, uneven colour tone and torn paper, or other disruptions such as small spots. The study aims to produce a copy of each manuscript with clear wording, so that it can be read easily and displayed to visitors. Sixteen samples of Jawi historical manuscripts with different quality problems were obtained from the Royal Museum of Pahang, Malaysia. We applied three binarization techniques: Otsu's method, representing global thresholding, and the Sauvola and Niblack methods, which are local thresholding techniques. The binarized images were compared with the original manuscripts through visual inspection by the museum's curator, and unclear features were marked and analysed. Most of the examined images show that, with optimal parameters and an effective pre-processing technique, the local thresholding methods work better than the global one. Niblack's and Sauvola's techniques appear to be suitable approaches for these types of images, and most images binarized with these two methods show improved readability and character recognition. Although the differences between the resulting images were hard to distinguish by eye, Niblack's method performed better than Sauvola's when time cost and the overall rate of recognized symbols were compared. The post-processing step could be improved by adding edge detection, and further enhanced by an innovative image refinement technique and the formulation of an appropriate class method.
    Comment: 10 pages, 6 figures, 2 tables, Advance Computing: An International Journal (ACIJ)
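The three thresholding rules compared in this study can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the window size of 15, Niblack's k = -0.2, and Sauvola's k = 0.5 with R = 128 are commonly cited defaults rather than the study's tuned parameters; the helper names are hypothetical; and ink is assumed to be darker than the background.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Global Otsu threshold for an 8-bit grayscale image.

    Returns the level t maximising the between-class variance of the
    classes {gray <= t} and {gray > t}.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # weight of class {<= t}
    mu = np.cumsum(prob * np.arange(256))    # cumulative first moment
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0                # undefined when one class is empty
    return int(np.argmax(sigma_b))


def local_mean_std(gray: np.ndarray, window: int):
    """Mean and standard deviation over a window x window neighbourhood,
    computed with integral images; borders use edge replication."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    r = window // 2
    img = np.pad(gray.astype(np.float64), r, mode="edge")
    # Integral images with a leading row and column of zeros.
    s1 = np.pad(img.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    s2 = np.pad((img ** 2).cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    h, w = gray.shape

    def box(s):
        return (s[window:window + h, window:window + w]
                - s[:h, window:window + w]
                - s[window:window + h, :w]
                + s[:h, :w])

    n = window * window
    mean = box(s1) / n
    var = np.maximum(box(s2) / n - mean ** 2, 0.0)
    return mean, np.sqrt(var)


def niblack(gray: np.ndarray, window: int = 15, k: float = -0.2) -> np.ndarray:
    """Ink mask (True = ink) using Niblack's threshold T = m + k*s."""
    mean, std = local_mean_std(gray, window)
    return gray < mean + k * std


def sauvola(gray: np.ndarray, window: int = 15, k: float = 0.5,
            r: float = 128.0) -> np.ndarray:
    """Ink mask (True = ink) using Sauvola's threshold T = m*(1 + k*(s/R - 1))."""
    mean, std = local_mean_std(gray, window)
    return gray < mean * (1.0 + k * (std / r - 1.0))
```

Niblack's threshold sits just below the local mean, so in nearly flat background regions small fluctuations can be marked as ink. Sauvola's (s/R - 1) factor lowers the threshold where local contrast is low, which is one reason the two local methods can behave differently on faded pages.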

    Image Enhancement Background for High Damage Malay Manuscripts using Adaptive Threshold Binarization

    Handwritten Jawi manuscripts kept at the Malaysia National Library (MNL) have aged over decades. Despite the intensive preservation work conducted by MNL, these manuscripts are not in good condition and can neither be read easily nor viewed clearly. Even though many state-of-the-art image enhancement methods have been developed, none of them can handle manuscripts of extremely poor quality. The quality problems of old Malay manuscripts can be categorized into three types, namely uneven background, image effects, and image effects with expanding patches. The aim of this paper is to discuss methods that improve the quality of the manuscripts. The proposed approach combines several main methods: Local Adaptive Equalization, Image Intensity Values, Automatic Threshold PP, and Adaptive Threshold Filtering. The goal is an improved image that is easier to read. The proposed method (Adaptive Threshold Filtering Process / PAM) achieves a smaller bit error rate (TKB), 0.0316, than Otsu's Threshold Method (MNAO), the Binary Threshold Value Method (MNAP) and the Automatic Local Threshold Value Method (MNATA). On ink-bleed images, the proposed method achieves a precision above 95%, compared with 75.82%, 90.68% and 91.2% for MNAO, MNAP and MNATA respectively. However, the reported achievement rates on the ink-bleed images are 45.74%, 54.80%, 53.23% and 46.02% for PAM, MNAO, MNAP and MNATA respectively. In conclusion, the proposed method produces better character shapes than the other methods.
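The abstract does not define how TKB or precision were computed. A common pixel-level reading, assumed here, compares a binarized ink mask (True = ink) with a ground-truth mask: the bit error rate is the fraction of mislabelled pixels, and precision is the share of predicted ink pixels that are truly ink. The function names are illustrative.

```python
import numpy as np


def pixel_error_rate(pred, truth) -> float:
    """Fraction of pixels whose binary label differs from the ground truth."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    if pred.shape != truth.shape:
        raise ValueError("pred and truth must have the same shape")
    return float(np.mean(pred != truth))


def precision(pred, truth) -> float:
    """Share of pixels labelled as ink that are ink in the ground truth."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.count_nonzero(pred & truth)    # ink correctly detected
    fp = np.count_nonzero(pred & ~truth)   # background wrongly marked as ink
    return tp / (tp + fp) if tp + fp else 0.0
```

For example, if a method marks two pixels as ink and only one of them is ink in the ground truth, its precision is 0.5.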

    An intelligent framework for pre-processing ancient Thai manuscripts on palm leaves

    In Thailand's early history, before paper and printing technologies were available, palm leaves were used to record handwritten information. These ancient documents contain invaluable knowledge. Digitising the manuscripts preserves their content and makes it widely available to the interested community via electronic media; however, the digitised content is still difficult to access or retrieve. To extract relevant information from the document images efficiently, each processing step requires the reduction of irrelevant data such as noise or interference in the images. Pre-processing techniques serve to extract regions of interest, reduce noise and suppress the irrelevant background, so that the image can then be processed directly and efficiently for feature selection and extraction before character recognition. The main objective of this study is therefore to develop an efficient and intelligent image pre-processing system that extracts components from ancient manuscripts for information extraction and retrieval. The main contributions of this thesis are an intelligent approach to providing and enhancing the region of interest when pre-processing ancient Thai palm leaf manuscripts, and a detailed examination of pre-processing techniques for palm leaf manuscripts. Because noise reduction and binarisation form the first pre-processing step, removing noise and background from the document images, this step must produce good-quality output; otherwise the accuracy of the subsequent stages suffers. An intelligent background-elimination approach was proposed in which an SVM selects the appropriate binarisation technique. Since several binarisation techniques may be candidates, a second approach was also proposed to eliminate the background and generate an optimal binarised image: an ensemble architecture based on a majority vote scheme that uses local neighbouring information around the pixel of interest. To extract text from the binarised image, line segmentation was then applied using the partial projection method, as this method gives good results with slanted text and connected components. To improve the partial projection method, an Adaptive Partial Projection (APP) method was proposed. This technique automatically adapts the width of each character strip to separate connected components of consecutive lines through divide and conquer, and analyses the upper and lower vowels of the text line. Finally, character segmentation was performed using a hierarchical segmentation technique based on a contour-tracing algorithm. Touching components identified in the previous step were then separated by tracing the background skeletons, together with a combined segmentation method. The key datasets used in this study are images provided by the Project for Palm Leaf Preservation, Northeastern Thailand Division; benchmark datasets from the Document Image Binarisation Contest (DIBCO) series were used to compare the results of this work against other binarisation techniques. The experimental results show that the proposed methods provide superior performance, and they will be used to support subsequent processing of the Thai ancient palm leaf documents. The contributions of this study are also expected to benefit research on ancient manuscripts in other languages.
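The abstract describes the ensemble only as a majority vote that uses local neighbouring information. One simple way to realise that, sketched here as a hypothetical design rather than the thesis's architecture, is a pixel-wise majority across the binarised outputs in which ties (possible with an even number of methods) are settled by the vote count in the surrounding 3x3 window. The function names and the tie-breaking rule are assumptions.

```python
import numpy as np


def _box_sum(a: np.ndarray, radius: int) -> np.ndarray:
    """Sum over each (2*radius+1) x (2*radius+1) window, zero-padded at borders."""
    h, w = a.shape
    size = 2 * radius + 1
    padded = np.pad(a, radius)  # constant zero padding
    out = np.zeros((h, w), dtype=padded.dtype)
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + h, dx:dx + w]
    return out


def neighbourhood_majority_vote(masks, radius: int = 1) -> np.ndarray:
    """Fuse binary ink masks (True = ink) from several binarisation methods.

    A pixel is ink when more than half of the methods mark it as ink.
    Ties are broken by the share of ink votes, over all methods, inside
    the surrounding window (cells outside the image are not counted).
    """
    stack = np.stack([np.asarray(m, dtype=bool) for m in masks])
    n_methods = stack.shape[0]
    votes = stack.sum(axis=0).astype(np.int64)   # ink votes per pixel
    fused = 2 * votes > n_methods                # strict majority
    tie = 2 * votes == n_methods
    window_votes = _box_sum(votes, radius)
    window_cells = _box_sum(np.ones_like(votes), radius) * n_methods
    fused[tie] = (2 * window_votes > window_cells)[tie]
    return fused
```

Because the neighbourhood only breaks ties, an isolated noise pixel produced by one of two methods is dropped, while a gap one method leaves inside a thick stroke is filled. A one-pixel-wide stroke with such a gap stays broken, so weighting neighbours differently is an obvious variation.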

    Handwritten and printed text separation in historical documents

    Historical documents present many challenges for Optical Character Recognition (OCR) systems, especially poor-quality documents that contain handwritten annotations, stamps, signatures and historical fonts. Because most OCR systems recognize either machine-printed or handwritten text, the printed and handwritten parts have to be separated before the respective recognition system is applied. This thesis addresses the segmentation of handwritten and printed text in historical Latin documents. Data with handwritten and machine-printed components on the same page, or even overlapping each other, together with pixel-wise annotations, are scarce; to alleviate this, the data synthesis method proposed in [12] was applied to generate new datasets. The newly created images and their pixel-level labels were used to train the Fully Convolutional Network (FCN) introduced in [5]. The newly trained model showed better results in separating machine-printed and handwritten text in historical documents.

    Improved wolf algorithm on document images detection using optimum mean technique

    Detecting handwritten text in historical documents provides high-level features for the challenging problem of handwriting recognition. Such handwriting often contains noise, faint or incomplete strokes, strokes with gaps, and competing lines when embedded in a table or form, which makes it unsuitable for local line-following algorithms or their associated binarization schemes. This paper presents a method based on an optimum threshold value, named the Optimum Mean method. The Wolf method fails to detect thin text in non-uniform input images; the proposed method overcomes this problem by using the optimum mean to set a maximum threshold value. The proposed method obtained a higher F-measure (74.53) and PSNR (14.77) and a lower NRM (0.11) than the Wolf method. In conclusion, the proposed method effectively solves the Wolf method's problem and produces high-quality output images.
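F-measure, PSNR and NRM are standard pixel-level scores in the DIBCO binarization contests. The sketch below assumes their usual definitions: F-measure as the harmonic mean of precision and recall (in percent), PSNR = 10·log10(C²/MSE) with C = 1 for binary images, and NRM as the mean of the false-negative and false-positive rates. It scores a binarized ink mask (True = ink) against a ground-truth mask and does not implement the Wolf or Optimum Mean thresholds themselves; the function name is illustrative.

```python
import math

import numpy as np


def binarization_scores(pred, truth) -> dict:
    """F-measure (%), PSNR (dB) and NRM of a binary ink mask vs. ground truth."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = int(np.count_nonzero(pred & truth))
    fp = int(np.count_nonzero(pred & ~truth))
    fn = int(np.count_nonzero(~pred & truth))
    tn = int(np.count_nonzero(~pred & ~truth))

    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f_measure = 100 * 2 * prec * rec / (prec + rec) if prec + rec else 0.0

    # With labels in {0, 1}, the MSE is simply the fraction of wrong pixels.
    mse = (fp + fn) / pred.size
    psnr = math.inf if mse == 0 else 10 * math.log10(1.0 / mse)

    # Negative rate metric: lower is better.
    nr_fn = fn / (fn + tp) if fn + tp else 0.0
    nr_fp = fp / (fp + tn) if fp + tn else 0.0
    nrm = (nr_fn + nr_fp) / 2
    return {"f_measure": f_measure, "psnr": psnr, "nrm": nrm}
```

Higher F-measure and PSNR and lower NRM indicate a closer match to the ground truth, which is the direction of the comparison reported above.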