    An intelligent framework for pre-processing ancient Thai manuscripts on palm leaves

    In Thailand's early history, before paper and printing technologies were available, palm leaves were used to record handwritten information. These ancient documents contain invaluable knowledge. Digitising the manuscripts preserves their content and makes it widely available to the interested community via electronic media. However, the content remains difficult to access or retrieve. To extract relevant information from the document images efficiently, each step of the process must reduce irrelevant data such as noise or interference in the images. Pre-processing techniques serve to extract regions of interest, reduce noise in the image and suppress the irrelevant background. The image can then be processed directly and efficiently for feature selection and extraction before the subsequent character recognition phase. The main objective of this study is therefore to develop an efficient and intelligent image pre-processing system that extracts components from ancient manuscripts for information extraction and retrieval. The main contributions of this thesis are an intelligent approach to providing and enhancing the region of interest when pre-processing ancient Thai palm leaf manuscripts, and a detailed examination of pre-processing techniques for palm leaf manuscripts. Because noise reduction and binarisation form the first step of pre-processing, removing noise and background from the document images, this step must produce good-quality output; otherwise, the accuracy of the subsequent stages suffers. In this work, an intelligent background-elimination approach was proposed that uses a support vector machine (SVM) to select appropriate binarisation techniques.
    Because several binarisation techniques could be chosen, this study also proposed a second approach to background elimination that generates an optimal binarised image: an ensemble architecture based on a majority-vote scheme that uses local neighbourhood information around each pixel of interest. To extract text from the binarised image, line segmentation was then applied using the partial projection method, which handles slanted text and connected components well. To improve on the partial projection method, an Adaptive Partial Projection (APP) method was proposed. APP automatically adjusts the size of each character strip, adapting the strip width through divide and conquer to separate connected components that span consecutive lines, and analyses the upper and lower vowels of the text line. Finally, character segmentation was performed with a hierarchical segmentation technique based on a contour-tracing algorithm. Touching components identified in the previous step were then separated by tracing the background skeletons and by a combined segmentation method. The key datasets in this study are images provided by the Project for Palm Leaf Preservation, Northeastern Thailand Division; benchmark datasets from the Document Image Binarisation Contest (DIBCO) series are used to compare the results of this work against other binarisation techniques. The experimental results show that the proposed methods provide superior performance, and they will be used to support subsequent processing of ancient Thai palm leaf documents. The contributions of this study are also expected to benefit research on ancient manuscripts in other languages.
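    The abstract does not specify how the votes are weighted or how strip widths are chosen, so the following is only a minimal sketch under stated assumptions, not the thesis's implementation. It assumes binarised images are lists of 0/1 rows with 1 as ink, that each candidate binarisation casts one vote at the pixel of interest, and that neighbourhood ink density is used only to break tied votes (one plausible reading of "local neighbouring information"). The second function shows the plain partial projection idea, a separate horizontal projection profile per vertical strip with a fixed strip width; APP's adaptive width and vowel analysis are not modelled. The names `majority_vote_binarise`, `partial_projection_bands`, `radius` and `strip_width` are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: a binarised page as rows of 0/1, where 1 = ink.
Image = List[List[int]]


def majority_vote_binarise(candidates: List[Image], radius: int = 1) -> Image:
    """Fuse binarised versions of the same page by pixel-wise majority vote.

    Each candidate casts one vote at the pixel of interest. When the votes
    are tied (possible only with an even number of candidates), the ink
    fraction over the (2*radius+1)^2 neighbourhood in all candidates breaks
    the tie; a pixel becomes ink only if strictly more than half is ink.
    """
    n = len(candidates)
    h, w = len(candidates[0]), len(candidates[0][0])
    fused = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            votes = sum(img[y][x] for img in candidates)
            if 2 * votes != n:
                fused[y][x] = 1 if 2 * votes > n else 0
                continue
            # Tie: fall back on local neighbouring information.
            ink = total = 0
            for img in candidates:
                for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                    for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                        ink += img[yy][xx]
                        total += 1
            fused[y][x] = 1 if 2 * ink > total else 0
    return fused


def partial_projection_bands(img: Image, strip_width: int) -> List[List[Tuple[int, int]]]:
    """Split the page into vertical strips and find text bands in each strip.

    Each strip gets its own horizontal projection profile (ink pixels per
    row); runs of rows containing ink become (first_row, last_row) bands.
    Because strips are projected separately, band positions may differ from
    strip to strip, which is what lets the method follow slanted lines.
    """
    h, w = len(img), len(img[0])
    strips = []
    for x0 in range(0, w, strip_width):
        profile = [sum(row[x0:x0 + strip_width]) for row in img]
        bands: List[Tuple[int, int]] = []
        start: Optional[int] = None
        for y, count in enumerate(profile):
            if count and start is None:
                start = y
            elif not count and start is not None:
                bands.append((start, y - 1))
                start = None
        if start is not None:
            bands.append((start, h - 1))
        strips.append(bands)
    return strips
```

    With an odd number of candidate binarisations the neighbourhood rule never fires; it only decides pixels on which the candidates split evenly.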

    A New Design of Multiple Classifier System and its Application to Classification of Time Series Data

    To solve challenging pattern classification problems, machine learning researchers have studied Multiple Classifier Systems (MCSs) extensively. The literature motivates combining classifiers from statistical, computational and representational perspectives. Although a combination of classifiers does not always outperform the best individual classifier in the ensemble, empirical studies have demonstrated its superiority in various applications. A number of viable methods for designing MCSs have been developed, including bagging, AdaBoost, rotation forest and random subspace, and they have been applied successfully to various tasks. Most current research focuses on the behavior patterns of the base classifiers in the ensemble. However, a discussion from the learning point of view may provide insights into the robust design of MCSs. To this end, this thesis develops the Generalized Exhaustive Search and Aggregation (GESA) method. GESA achieves robust performance by dynamically adjusting the trade-off between fitting the training data adequately and preventing overfitting. Besides its learning algorithm, GESA differs from traditional designs in its architecture and in the level at which decisions are made: it generates a collection of ensembles and dynamically selects the most appropriate ensemble for decision-making at the local level. Although GESA improves considerably on traditional approaches, it is not very data-adaptive. A data-adaptive MCS design requires that the system adaptively select representations and classifiers to generate effective decisions for aggregation. Another weakness of GESA is its high computational cost, which prevents it from scaling to large ensembles. Generalized Adaptive Ensemble Generation and Aggregation (GAEGA) extends GESA to overcome these two difficulties.
    GAEGA uses a greedy algorithm to adaptively select the most effective representations and classifiers while excluding noisy ones as far as possible. As a result, GAEGA generates fewer ensembles and significantly reduces the computational cost. Bootstrapped Adaptive Ensemble Generation and Aggregation (BAEGA) is another extension of GESA, similar to GAEGA in ensemble generation and decision aggregation. BAEGA adopts a different data manipulation strategy to improve the diversity of the generated ensembles and to use the information in the data more effectively. As a specific application, this thesis addresses the classification of time series data. This type of data contains dynamic information, which makes it more complex than other data types. Multiple Input Representation-Adaptive Ensemble Generation and Aggregation (MIR-AEGA) is derived from GAEGA for the classification of time series data and includes several novel representation methods that proved effective for such data. All the proposed methods, GESA, GAEGA, MIR-AEGA and BAEGA, are tested on simulated data and on benchmark data sets from popular data repositories. The experimental results confirm that the newly developed methods are effective and efficient.
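    The abstract does not state GAEGA's selection criterion, so the following is a generic illustration only: greedy forward selection of ensemble members, a common ensemble-selection heuristic, not GAEGA itself. It assumes each candidate classifier is represented by its predicted labels on a held-out validation set, that members are combined by majority vote, and that the candidate giving the highest ensemble accuracy is added at each step until every remaining addition would lower accuracy. The names `greedy_select`, `majority_vote` and `accuracy` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def majority_vote(predictions: List[Sequence[int]]) -> List[int]:
    """Per-sample majority label; ties go to the label seen first."""
    # Counter.most_common orders equal counts by first occurrence (Python 3.7+).
    return [Counter(column).most_common(1)[0][0] for column in zip(*predictions)]


def accuracy(predicted: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


def greedy_select(candidates: Dict[str, Sequence[int]], truth: Sequence[int]) -> List[str]:
    """Greedily add the classifier that maximises majority-vote accuracy.

    `candidates` maps a (hypothetical) classifier name to its predicted
    labels on a validation set. Selection stops when every remaining
    candidate would lower the ensemble's validation accuracy.
    """
    selected: List[str] = []
    best_acc = 0.0
    remaining = dict(candidates)
    while remaining:
        current = [candidates[name] for name in selected]
        # Score each remaining candidate by the accuracy of the enlarged ensemble.
        scores = {
            name: accuracy(majority_vote(current + [preds]), truth)
            for name, preds in remaining.items()
        }
        name, acc = max(scores.items(), key=lambda item: item[1])
        if acc < best_acc:
            break  # every addition would hurt: treat the rest as noise
        selected.append(name)
        best_acc = acc
        del remaining[name]
    return selected
```

    Accepting additions that leave accuracy unchanged lets the search move past two-member ensembles, where a majority vote can only tie; a stricter variant would stop at the first non-improving step. For example, given three classifiers that each err on a different validation sample plus one that is wrong on every sample, this search keeps the three and drops the fourth.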

    Anales del XIII Congreso Argentino de Ciencias de la Computación (CACIC)

    Contents: Computer architectures; Embedded systems; Service-oriented architectures (SOA); Communication networks; Heterogeneous networks; Advanced networks; Wireless networks; Mobile networks; Active networks; Network and service administration and monitoring; Quality of Service (QoS, SLAs); Computer security and authentication, privacy; Digital signature and digital certificate infrastructure; Vulnerability analysis and detection; Operating systems; P2P systems; Middleware; Grid infrastructure; Integration services (Web Services or .Net). Red de Universidades con Carreras en Informática (RedUNCI).
