106 research outputs found

    Recognition-based Approach of Numeral Extraction in Handwritten Chemistry Documents using Contextual Knowledge

    This paper presents a complete procedure that uses contextual and syntactic information to identify and recognize amount fields in the table regions of chemistry documents. The proposed method is composed of two main modules. First, a structural analysis based on connected component (CC) dimensions and positions identifies some special symbols and clusters the other CCs into three groups: fragments of characters, isolated characters, and connected characters. Specific processing is then performed on each group of CCs. Fragments of characters are merged with the nearest character or string using rules based on geometric relationships. Isolated characters are sent to a recognition module to identify the numeral components. For connected characters, the final decision on the nature of the string (numeric or non-numeric) is made from a global score computed over the full string using the height regularity property and the recognition probabilities of its segmented fragments. Finally, a simple syntactic verification at table-row level is conducted in order to correct residual errors. The experimental tests are carried out on real-world chemistry documents provided by our industrial partner eNovalys. The results obtained show the effectiveness of the proposed system in extracting amount fields.
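
    The global numeric/non-numeric decision described above can be illustrated with a small sketch: a score that combines height regularity with per-fragment digit confidences. This is a minimal illustration only; the function name, the weighting `alpha` and the decision threshold are assumptions, not the authors' published parameters.

```python
# Hedged sketch: a global "numeric string" score combining height regularity
# with per-fragment digit recognition probabilities. The weighting and
# threshold are illustrative assumptions.
from statistics import mean, pstdev

def numeric_string_score(fragment_heights, digit_probs, alpha=0.5):
    """Return a score in [0, 1]; higher means more likely a numeric string.

    fragment_heights: bounding-box heights of the segmented fragments
    digit_probs: recogniser confidence that each fragment is a digit
    alpha: assumed weight between height regularity and recognition evidence
    """
    h_mean = mean(fragment_heights)
    # Height regularity: 1 when all fragments share the same height,
    # decreasing as the relative spread of heights grows.
    regularity = max(0.0, 1.0 - pstdev(fragment_heights) / h_mean)
    recognition = mean(digit_probs)
    return alpha * regularity + (1.0 - alpha) * recognition

# Example: four fragments of similar height, three confidently read as digits.
score = numeric_string_score([22, 21, 23, 22], [0.94, 0.88, 0.91, 0.42])
print("numeric" if score > 0.6 else "non-numeric", round(score, 3))
```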

    Automatic interpretation of clock drawings for computerised assessment of dementia

    The clock drawing test (CDT) is a standard neurological test for the detection of cognitive impairment. A computerised version of the test has the potential to improve test accessibility and accuracy. CDT sketch interpretation is one of the first stages in the analysis of the computerised test. It produces a set of recognised digits and symbols together with their positions on the clock face, which are subsequently used in the test scoring. This is a challenging problem because the average CDT taker has a high likelihood of cognitive impairment, and writing is one of the first functional activities to be affected. Current interpretation systems perform less well on this kind of data due to its unintelligibility. In this thesis, a novel automatic interpretation system for CDT sketches is proposed and developed. The proposed interpretation system and all the related algorithms developed in this thesis are evaluated using a CDT data set collected for this study. These data consist of two sets: the first comprises 65 drawings made by healthy people, and the second 100 drawings reproduced from drawings of dementia patients. This thesis has four main contributions. The first is a conceptual model of the proposed CDT sketch interpretation system based on integrating prior knowledge of the expected CDT sketch structure and human reasoning into the drawing interpretation system. The second is a novel CDT sketch segmentation algorithm based on supervised machine learning and a new set of temporal and spatial features automatically extracted from the CDT data. The evaluation of the proposed method shows that it outperforms the current state-of-the-art method for CDT drawing segmentation. The third contribution is a new handwritten digit recognition algorithm based on a set of static and dynamic features extracted from handwritten data. The algorithm combines two classifiers, a fuzzy k-nearest neighbours classifier and a Convolutional Neural Network (CNN), to take advantage of both static and dynamic data representations. The proposed digit recognition algorithm is shown to outperform each classifier individually in terms of recognition accuracy. The final contribution of this study is the Situational Bayesian Network (SBN), a new hierarchical probabilistic model for addressing the problem of fusing diverse data sources, such as CDT sketches created by healthy volunteers and by dementia patients, in a probabilistic Bayesian network. The evaluation of the proposed SBN-based CDT sketch interpretation system on the CDT data shows highly promising results, with 100% recognition accuracy for healthy CDT drawings and 97.15% for dementia data. To conclude, the proposed automatic CDT sketch interpretation system shows high accuracy in recognising different sketch objects and thus paves the way for further research in dementia and its computer-assisted clinical diagnosis.
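
    The classifier combination mentioned above can be sketched as a late fusion of the posterior distributions produced by the two digit classifiers. This is a minimal illustration, assuming a weighted product fusion rule with illustrative weights; the thesis's actual combination scheme is not specified here.

```python
# Hedged sketch: late fusion of two digit classifiers (e.g. a fuzzy k-NN over
# dynamic stroke features and a CNN over the static image). The fusion rule
# and the weights are assumptions for illustration only.
import numpy as np

def fuse_posteriors(p_knn, p_cnn, w_knn=0.4, w_cnn=0.6):
    """Weighted product-style fusion of two posterior vectors over the ten
    digit classes; returns the normalised fused distribution."""
    p_knn = np.asarray(p_knn, dtype=float)
    p_cnn = np.asarray(p_cnn, dtype=float)
    fused = (p_knn ** w_knn) * (p_cnn ** w_cnn)
    return fused / fused.sum()

# Example: the k-NN hesitates between 3 and 5, the CNN clearly favours 3.
p_knn = [0.02, 0.02, 0.02, 0.40, 0.02, 0.38, 0.02, 0.04, 0.04, 0.04]
p_cnn = [0.01, 0.01, 0.02, 0.75, 0.02, 0.10, 0.02, 0.03, 0.02, 0.02]
print(int(np.argmax(fuse_posteriors(p_knn, p_cnn))))  # -> 3
```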

    Advances in Image Processing, Analysis and Recognition Technology

    For many decades, researchers have been trying to make computers’ analysis of images as effective as human vision. For this purpose, many algorithms and systems have been created. The whole process covers various stages, including image processing, representation and recognition. The results of this work can be applied to many computer-assisted areas of everyday life. They improve particular activities and provide handy tools that are sometimes intended only for entertainment but quite often significantly increase our safety. The range of practical applications of image processing algorithms is particularly wide. Moreover, the rapid growth of computing power and efficiency has allowed the development of more sophisticated and effective algorithms and tools. Although significant progress has been made so far, many issues remain, resulting in the need for the development of novel approaches.

    A Comprehensive Literature Review on Convolutional Neural Networks

    The fields of computer vision and image processing have dealt with problems of visual recognition since their earliest days. Convolutional Neural Networks (CNNs) are deep feed-forward architectures in machine learning, inspired by research on visual analysis in the visual cortex of mammals such as cats. This work gives a detailed analysis of CNNs for computer vision tasks, natural language processing, problems in the fundamental sciences and engineering, and other miscellaneous tasks. It also covers the general CNN structure together with its mathematical intuition and operation, and offers a brief critical commentary on the advantages and disadvantages that lead researchers to search for alternatives to CNNs. The paper also serves as an appreciation of the brain-child of past researchers, a fecund architecture for handling multidimensional data, and discusses approaches to further improve its performance.
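
    As a concrete point of reference for the general CNN structure surveyed in the review, the following is a minimal sketch of a small feed-forward convolutional classifier, written in PyTorch for illustration; the framework choice, layer sizes and input resolution are assumptions, not anything specified by the paper.

```python
# Hedged sketch: a minimal CNN with two convolution + pooling stages followed
# by a fully connected classifier, the canonical structure discussed above.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 28x28 -> 14x14
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 14x14 -> 7x7
        )
        self.classifier = nn.Linear(32 * 7 * 7, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

# Example: a batch of four 28x28 grayscale images.
logits = SmallCNN()(torch.randn(4, 1, 28, 28))
print(logits.shape)  # torch.Size([4, 10])
```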

    An intelligent framework for pre-processing ancient Thai manuscripts on palm leaves

    In Thailand’s early history, prior to the availability of paper and printing technologies, palm leaves were used to record information written by hand. These ancient documents contain invaluable knowledge. By digitising the manuscripts, the content can be preserved and made widely available to the interested community via electronic media. However, the content is difficult to access or retrieve. In order to extract relevant information from the document images efficiently, each step of the process requires the reduction of irrelevant data such as noise or interference on the images. Pre-processing techniques serve the purpose of extracting regions of interest, reducing noise in the image and suppressing the irrelevant background, so that the image can then be processed directly and efficiently for feature selection and extraction prior to the subsequent phase of character recognition. The main objective of this study is therefore to develop an efficient and intelligent image pre-processing system that can be used to extract components from ancient manuscripts for information extraction and retrieval purposes. The main contributions of this thesis are the provision and enhancement of the region of interest through an intelligent approach to the pre-processing of ancient Thai manuscripts on palm leaves, and a detailed examination of pre-processing techniques for palm leaf manuscripts. As noise reduction and binarisation form the first step of pre-processing, eliminating noise and background from the document images, this step must produce good-quality output; otherwise, the accuracy of the subsequent stages will be affected. In this work, an intelligent approach to background elimination was proposed, in which an appropriate binarisation technique is selected using an SVM. As there can be several candidate binarisation techniques, a second approach was proposed to generate an optimal binarised image: an ensemble architecture based on a majority-vote scheme that uses local neighbouring information around each pixel of interest. To extract text from the binarised image, line segmentation was then applied based on the partial projection method, as this method performs well on slanted text and connected components. To improve the quality of the partial projection method, an Adaptive Partial Projection (APP) method was proposed. This technique automatically adjusts the width of each character strip to separate the connected components of consecutive lines, using a divide-and-conquer strategy and an analysis of the upper and lower vowels of the text line. Finally, character segmentation was performed using a hierarchical segmentation technique based on a contour-tracing algorithm; touching components identified in the previous step were then separated by tracing the background skeletons and applying a combined segmentation method. The key datasets used in this study are images provided by the Project for Palm Leaf Preservation, Northeastern Thailand Division; benchmark datasets from the Document Image Binarisation Contest (DIBCO) series are also used to compare the results of this work against other binarisation techniques. The experimental results show that the proposed methods provide superior performance and will be used to support subsequent processing of ancient Thai palm leaf documents. It is expected that the contributions from this study will also benefit research work on ancient manuscripts in other languages.
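
    The majority-vote ensemble binarisation mentioned in the abstract can be illustrated with a small sketch: pixel-wise voting over the outputs of several candidate binarisation methods. The function and the tie-breaking rule below are illustrative assumptions rather than the thesis's exact architecture, which additionally exploits local neighbouring information around each pixel.

```python
# Hedged sketch: pixel-wise majority voting over several binarisation outputs.
# The choice of input methods and the tie-breaking rule are assumptions.
import numpy as np

def majority_vote_binarisation(binary_maps):
    """binary_maps: list of HxW arrays with values {0, 1} (1 = foreground/ink).
    A pixel is kept as foreground when more than half of the methods agree."""
    stack = np.stack([np.asarray(b, dtype=np.uint8) for b in binary_maps])
    votes = stack.sum(axis=0)
    return (votes * 2 > stack.shape[0]).astype(np.uint8)

# Example: three candidate binarisations of a tiny 2x3 patch.
a = np.array([[1, 0, 1], [0, 1, 1]])
b = np.array([[1, 0, 0], [0, 1, 1]])
c = np.array([[1, 1, 0], [0, 0, 1]])
print(majority_vote_binarisation([a, b, c]))
# [[1 0 0]
#  [0 1 1]]
```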

    Forensic computing strategies for ethical academic writing.

    Thesis (M.Com.)-University of KwaZulu-Natal, Westville, 2009. This study resulted in the creation of a conceptual framework for ethical academic writing that can be applied to cases of authorship identification. The framework is the culmination of research into various other forensic frameworks and aspects related to cyber forensics, in order to ensure maximum effectiveness of the newly developed methodology. The research shows how synergies between forensic linguistics and electronic forensics (computer forensics) create the conceptual space for a new, interdisciplinary cyber forensic linguistics, along with forensic auditing procedures and tools for authorship identification. The research also shows that an individual’s unique word-usage patterns can be used to determine document authorship, and that in other instances authorship can be attributed with a significant degree of probability using the identified process. The importance of this fact cannot be overstated, because accusations of plagiarism have to be based on facts that will withstand cross-examination in a court of law. Forensic auditing procedures are therefore required when attributing authorship in cases of suspected plagiarism, which is regarded as one of the most serious problems facing any academic institution. This study identifies and characterises various forms of plagiarism as well as the responses that can be implemented to prevent and deter them. A number of online and offline tools for the detection and prevention of plagiarism are identified, over and above the more commonly used popular tools that, in the author’s view, are overrated because they are based on mechanistic identification of word similarities in source and target texts rather than on proper grammatical and semantic principles. Linguistic analysis is a field that is not well understood and is often underestimated, yet it is a critical field of inquiry in determining specific cases of authorship. The research identifies the various methods of linguistic analysis that can be applied to help establish authorship identity, as well as how they can be applied within a forensic environment. Various software tools that can be used to identify and analyse plagiarised source documents are identified and briefly characterised. Concordance, function word analysis and other methods of corpus analysis are explained, along with some of their related software packages. Corpus analysis that in the past would have taken months to perform manually can now take a matter of hours with the correct programs, given the availability of computerised analysis tools. This research integrates the strengths of these tools within a structurally sound forensic auditing framework, the result of which is a conceptual framework that encompasses all the pertinent factors and ensures admissibility in a court of law by adhering to strict rules and features that are characteristic of the legal requirements for a forensic investigation.
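
    One simple form of the function word analysis mentioned above is to compare relative function-word frequencies between a known writing sample and a disputed text. The sketch below is a minimal illustration under that assumption; the word list, the distance measure and the absence of any threshold are illustrative choices, not the framework described in the thesis.

```python
# Hedged sketch: comparing relative function-word frequencies between texts,
# a basic building block of corpus-based authorship analysis. The word list
# and distance measure are illustrative assumptions.
from collections import Counter
import math
import re

FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "as"]

def function_word_profile(text):
    """Relative frequency of each function word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]

def profile_distance(text_a, text_b):
    """Euclidean distance between profiles; smaller suggests similar usage."""
    pa, pb = function_word_profile(text_a), function_word_profile(text_b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(pa, pb)))

# Example: compare a disputed passage against a known writing sample.
known = "The analysis of the data is presented in the appendix as a table."
disputed = "It is the case that the results of the study appear in a figure."
print(round(profile_distance(known, disputed), 4))
```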

    Theory and Applications for Advanced Text Mining

    Due to the growth of computer and web technologies, we can easily collect and store large amounts of text data, and it is reasonable to believe that these data contain useful knowledge. Text mining techniques have been studied intensively since the late 1990s in order to extract such knowledge from the data. Although many important techniques have been developed, the text mining research field continues to expand to meet the needs arising from various application fields. This book is composed of 9 chapters introducing advanced text mining techniques, ranging from relation extraction to text mining for under-resourced languages. I believe that this book will bring new knowledge to the text mining field and help many readers open up new research directions.

    Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan languages

    The Proceedings of the Seventh International Conference Formal Approaches to South Slavic and Balkan Languages contain 17 papers that were presented at the conference organised in Dubrovnik, Croatia, 4-6 October 2010.