3 research outputs found

    Research and Development of Feature Extraction from Myanmar Palm Leaf Manuscripts for the Myanmar Character Recognition System

    This paper proposes a handwriting OCR system for Myanmar palm-leaf manuscripts. Each text area in the manuscript is segmented, and the segmented character images must then be recognized and converted to Myanmar handwritten characters, which record precious historical information. The paper covers two essential steps: preprocessing and feature extraction. Preprocessing automatically extracts the palm-leaf manuscript region of interest from camera images and supplies enhanced images to the subsequent stages of Myanmar character recognition. A one-dimensional segmentation approach crops the leaf area from the high-resolution image. Line count analysis is also performed to extract a region that contains enough lines. Line segmentation is then carried out using an Object Frequency Histogram along the horizontal lines, which finds the optimal cut points between lines. The same technique, applied vertically, isolates each character or smallest group of characters. In total, 18 features are extracted to recognize the Myanmar palm-leaf manuscript characters. Although the experimental results are good, some difficulties related to connected components still need to be addressed.
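    The histogram-based line and character segmentation described in this abstract can be illustrated with a minimal sketch. It assumes the image is already binarised (1 = ink, 0 = background) and simply cuts wherever a row contains no ink; the paper's Object Frequency Histogram and its search for optimal cut points are not specified in the abstract, so the function names, the `min_ink` threshold, and the toy `page` below are hypothetical.

```python
def horizontal_profile(image):
    """Count ink pixels (value 1) in each row of a binary image."""
    return [sum(row) for row in image]


def segment_lines(image, min_ink=1):
    """Return (start_row, end_row) pairs, end exclusive, one per text line.

    A row counts as text when it holds at least `min_ink` ink pixels
    (an assumed threshold); consecutive text rows form one line.
    """
    profile = horizontal_profile(image)
    lines, start = [], None
    for y, count in enumerate(profile):
        if count >= min_ink and start is None:
            start = y                      # a text line begins
        elif count < min_ink and start is not None:
            lines.append((start, y))       # a gap closes the line
            start = None
    if start is not None:                  # line touching the bottom edge
        lines.append((start, len(profile)))
    return lines


def segment_columns(image, min_ink=1):
    """Apply the same idea vertically by transposing the image."""
    transposed = [list(col) for col in zip(*image)]
    return segment_lines(transposed, min_ink)


# Toy binarised page: two text lines separated by an empty row.
page = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]

print(segment_lines(page))          # prints [(1, 3), (4, 5)]
print(segment_columns(page[4:5]))   # prints [(1, 2), (3, 4)]
```

    Touching or overlapping lines leave no empty row to cut at, which is one reason connected components remain difficult for this family of methods.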

    AKSALont: Aplikasi transliterasi aksara Lontar Bali dengan model LSTM (AKSALont: a transliteration application for Balinese lontar script using an LSTM model)

    This study aims to develop an application that automatically transliterates Balinese palm-leaf manuscripts into the Latin/Roman alphabet. The input to the system is a digital image of the original text from ancient Balinese palm-leaf manuscripts, not Balinese script printed with a computer font. The transliteration engine uses an LSTM model, so transliteration is performed without a glyph segmentation step. In addition, the interaction of the AKSALont application was designed and implemented on a web-based platform using cross-platform interoperability. The experimental results show that the engine transliterates Balinese characters in images of Balinese palm-leaf manuscripts with a CER of 19.78% on 10,475 test samples. As a web-based online platform, AKSALont has opened wider public access to the content of Balinese lontar collections.
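    The abstract reports a character error rate (CER) without defining it. A common definition, assumed here rather than taken from the paper, is the Levenshtein edit distance between the predicted and reference transliterations divided by the reference length. The sketch below follows that definition; the function names and sample strings are illustrative only.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            current.append(min(previous[j] + 1,      # deletion
                               current[j - 1] + 1,   # insertion
                               substitution))
        previous = current
    return previous[-1]


def cer(reference, hypothesis):
    """Character error rate relative to the reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# One wrong character out of six in the reference.
print(round(cer("lontar", "lonter") * 100, 2))   # prints 16.67
```

    Under this definition, a CER of 19.78% corresponds to roughly one character edit for every five reference characters.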

    An intelligent framework for pre-processing ancient Thai manuscripts on palm leaves

    In Thailand’s early history, prior to the availability of paper and printing technologies, palm leaves were used to record information written by hand. These ancient documents contain invaluable knowledge. By digitising the manuscripts, the content can be preserved and made widely available to the interested community via electronic media. However, the content is difficult to access or retrieve. In order to extract relevant information from the document images efficiently, each step of the process requires reduction of irrelevant data such as noise or interference on the images. The pre-processing techniques serve the purpose of extracting regions of interest, reducing noise from the image and degrading the irrelevant background. The image can then be directly and efficiently processed for feature selection and extraction prior to the subsequent phase of character recognition. It is therefore the main objective of this study to develop an efficient and intelligent image preprocessing system that could be used to extract components from ancient manuscripts for information extraction and retrieval purposes. The main contributions of this thesis are the provision and enhancement of the region of interest by using an intelligent approach for the pre-processing of ancient Thai manuscripts on palm leaves and a detailed examination of the preprocessing techniques for palm leaf manuscripts. As noise reduction and binarisation are involved in the first step of pre-processing to eliminate noise and background from image documents, it is necessary for this step to provide a good quality output; otherwise, the accuracy of the subsequent stages will be affected. In this work, an intelligent approach to eliminate background was proposed and carried out by a selection of appropriate binarisation techniques using SVM. 
    As multiple binarisation techniques could be chosen, a second approach to background elimination was proposed in order to generate an optimal binarised image. The proposal is an ensemble architecture based on a majority vote scheme that uses local neighbouring information around a pixel of interest. To extract text from the binarised image, line segmentation was then applied based on the partial projection method, as this method gives good results with slanted text and connected components. To improve the quality of the partial projection method, an Adaptive Partial Projection (APP) method was proposed. This technique adjusts the size of a character strip automatically, adapting the width of the strip to separate connected components of consecutive lines through divide and conquer and analysing the upper and lower vowels of the text line. Finally, character segmentation was proposed using a hierarchical segmentation technique based on a contour-tracing algorithm. Touching components identified in the previous step were then separated by tracing the background skeletons and by a combined segmentation method. The key datasets used in this study are images provided by the Project for Palm Leaf Preservation, Northeastern Thailand Division; benchmark datasets from the Document Image Binarisation Contest (DIBCO) series are used to compare the results of this work against other binarisation techniques. The experimental results show that the proposed methods provide superior performance, and they will be used to support subsequent processing of ancient Thai palm leaf documents. It is expected that the contributions of this study will also benefit research on ancient manuscripts in other languages.
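    The majority-vote ensemble mentioned in this abstract can be sketched as follows. This is an assumption-laden illustration, not the thesis's architecture: each input is taken to be a same-sized 0/1 image produced by a different binarisation technique, each pixel takes the label most techniques agree on, and neighbouring votes are consulted only to break ties. How the thesis actually uses local neighbouring information is not stated in the abstract; `majority_vote` and `radius` are hypothetical names.

```python
def majority_vote(binarised, radius=1):
    """Fuse several same-sized 0/1 images into one by per-pixel voting.

    Ties (possible with an even number of inputs) are settled by the
    votes in the surrounding (2 * radius + 1) square window; this
    tie-break rule is an assumption made for illustration.
    """
    n = len(binarised)
    height, width = len(binarised[0]), len(binarised[0][0])
    votes = [[sum(img[y][x] for img in binarised) for x in range(width)]
             for y in range(height)]
    fused = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if 2 * votes[y][x] != n:               # clear majority
                fused[y][x] = 1 if 2 * votes[y][x] > n else 0
                continue
            total = cells = 0                      # tie: ask the neighbours
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    total += votes[ny][nx]
                    cells += 1
            fused[y][x] = 1 if 2 * total > n * cells else 0
    return fused


# Three hypothetical binarisation outputs for the same 2x2 patch.
a = [[1, 0], [1, 1]]
b = [[1, 0], [0, 1]]
c = [[0, 0], [1, 1]]
print(majority_vote([a, b, c]))   # prints [[1, 0], [1, 1]]
```

    With an odd number of inputs the tie-break branch is never reached, so the result reduces to a plain per-pixel majority.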