59 research outputs found

    An efficient convolutional neural network based classifier to predict Tamil writer

    Identifying Tamil handwriting at different levels, such as character, word and paragraph, is more complicated than for Western-language scripts. None of the existing methods provides efficient Tamil handwriting writer identification (THWI), and offline Tamil handwriting identification at different levels still offers many motivating challenges to researchers. This paper employs a deep learning algorithm for handwriting image classification. Deep learning can generate new features from a limited training dataset. A Convolutional Neural Network (CNN), a deep, feed-forward artificial neural network, is applied to THWI. The dataset collection and classification phases of the CNN enable data access and automatic feature generation. Since the number of parameters is significantly reduced, the training time for THWI is proportionally reduced. The CNNs produced a much higher identification rate than a traditional ANN at different levels of handwriting

    Information Preserving Processing of Noisy Handwritten Document Images

    Many pre-processing techniques that normalize artifacts and clean noise induce anomalies due to discretization of the document image. Important information that could be used at later stages may be lost. A proposed composite-model framework takes into account pre-printed information, user-added data, and digitization characteristics. Its benefits are demonstrated by experiments with statistically significant results. Separating pre-printed ruling lines from user-added handwriting shows how ruling lines impact people's handwriting and how they can be exploited for identifying writers. Ruling line detection based on multi-line linear regression reduces the mean error of counting them from 0.10 to 0.03, 6.70 to 0.06, and 0.13 to 0.02, compared to an HMM-based approach on three standard test datasets, thereby reducing human correction time by 50%, 83%, and 72% on average. On 61 page images from 16 rule-form templates, the precision and recall of form cell recognition are increased by 2.7% and 3.7%, compared to a cross-matrix approach. Compensating for and exploiting ruling lines during feature extraction rather than pre-processing raises the writer identification accuracy from 61.2% to 67.7% on a 61-writer noisy Arabic dataset. Similarly, counteracting page-wise skew by subtracting it or transforming contours in a continuous coordinate system during feature extraction improves the writer identification accuracy. An implementation study of contour-hinge features reveals that utilizing the full probabilistic probability distribution function matrix improves the writer identification accuracy from 74.9% to 79.5%

    Multi-script handwritten character recognition: Using feature descriptors and machine learning

    Handwritten Digit Recognition and Classification Using Machine Learning

    In this paper, multiple learning techniques based on optical character recognition (OCR) for handwritten digit recognition are examined, and a new accuracy level for recognition of the MNIST dataset is reported. The proposed framework involves three primary parts: image pre-processing, feature extraction and classification. This study strives to raise the recognition accuracy in handwritten digit recognition to more than 99%. As will be seen, pre-processing and feature extraction play crucial roles in reaching the highest accuracy in this experiment

    Online Devanagari Handwritten Character Recognition

    This thesis proposes a neural network based framework to classify online Devanagari characters into one of 46 characters in the alphabet set. The uniqueness of this work is three-fold: (1) The feature extraction is just the Discrete Cosine Transform of the temporal sequence of the character points (utilizing the nature of online data input). We show that if it is used right, a simple feature set yielded by the DCT can be very reliable for accurate recognition of Devanagari handwriting, (2) The mode of character input is through a computer mouse - training the system with which will lead to jitter-robustness, and (3) We have built the online handwritten database of Devanagari characters from scratch, and there are some unique features in the way we have built up the database. Lastly, after comprehensive testing of the algorithm on 2760 characters, recognition rates of up to 97.2% are achieved
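    The feature extraction described in this abstract, a Discrete Cosine Transform of the temporal sequence of pen points, can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names `dct2` and `stroke_features`, the unnormalized DCT-II variant, and the choice of keeping the first `num_coeffs` coefficients per coordinate are all assumptions made for illustration.

```python
import math


def dct2(signal):
    """Unnormalized DCT-II of a 1-D sequence (assumed variant)."""
    n = len(signal)
    return [
        sum(signal[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        for k in range(n)
    ]


def stroke_features(points, num_coeffs=8):
    """Build a feature vector from a time-ordered list of (x, y) pen points.

    The x(t) and y(t) sequences are transformed separately and only the
    first num_coeffs low-frequency coefficients of each are kept
    (a hypothetical truncation choice, not taken from the thesis).
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return dct2(xs)[:num_coeffs] + dct2(ys)[:num_coeffs]
```

    Keeping only low-frequency coefficients gives a fixed-length vector that a neural network classifier can consume, and it discards high-frequency detail such as small jitter in the input.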

    Design of an Offline Handwriting Recognition System Tested on the Bangla and Korean Scripts

    This dissertation presents a flexible and robust offline handwriting recognition system which is tested on the Bangla and Korean scripts. Offline handwriting recognition is one of the most challenging and yet to be solved problems in machine learning. While a few popular scripts (like Latin) have received a lot of attention, many other widely used scripts (like Bangla) have seen very little progress. Features such as connectedness and vowels structured as diacritics make it a challenging script to recognize. A simple and robust design for offline recognition is presented which not only works reliably, but also can be used for almost any alphabetic writing system. The framework has been rigorously tested for Bangla and demonstrated how it can be transformed to apply to other scripts through experiments on the Korean script whose two-dimensional arrangement of characters makes it a challenge to recognize. The base of this design is a character spotting network which detects the location of different script elements (such as characters, diacritics) from an unsegmented word image. A transcript is formed from the detected classes based on their corresponding location information. This is the first reported lexicon-free offline recognition system for Bangla and achieves a Character Recognition Accuracy (CRA) of 94.8%. This is also one of the most flexible architectures ever presented. Recognition of Korean was achieved with a 91.2% CRA. Also, a powerful technique of autonomous tagging was developed which can drastically reduce the effort of preparing a dataset for any script. The combination of the character spotting method and the autonomous tagging brings the entire offline recognition problem very close to a singular solution. Additionally, a database named the Boise State Bangla Handwriting Dataset was developed. 
    This is one of the richest offline datasets currently available for Bangla, and it has been made publicly accessible to accelerate research progress. Many other tools were developed and experiments were conducted to more rigorously validate this framework by evaluating the method against external datasets (CMATERdb 1.1.1, Indic Word Dataset and REID2019: Early Indian Printed Documents). Offline handwriting recognition is an extremely promising technology and the outcome of this research moves the field significantly ahead

    Improving Search via Named Entity Recognition in Morphologically Rich Languages – A Case Study in Urdu

    University of Minnesota Ph.D. dissertation. February 2018. Major: Computer Science. Advisors: Vipin Kumar, Blake Howald. 1 computer file (PDF); xi, 236 pages. Search is not a solved problem even in the world of Google and Bing's state-of-the-art engines. Google and similar search engines are keyword based. Keyword-based searching suffers from the vocabulary mismatch problem: the terms in the document and in the user's information request don't overlap. For example, "cars" and "automobiles". This phenomenon is called synonymy. Similarly, the user's term may be polysemous: a user is inquiring about a river's bank, but documents about financial institutions are matched. Vocabulary mismatch is exacerbated when the search occurs in a Morphologically Rich Language (MRL). Concept search techniques like dimensionality reduction do not improve search in Morphologically Rich Languages. Names frequently occur in news text and determine the "what," "where," "when," and "who" in the news text. Named Entity Recognition (NER) attempts to recognize names automatically in text, but these techniques are far from mature in MRLs, especially in Arabic-script languages. Urdu is one of the focus MRLs of this dissertation, along with Arabic, Farsi, Hindi, and Russian, but it does not have the enabling technologies for NER and search. A corpus, a stop word generation algorithm, a light stemmer, a baseline, and an NER algorithm are created so that NER-aware search can be accomplished for Urdu. This dissertation demonstrates that NER-aware search on Arabic, Russian, Urdu, and English shows significant improvement over the baseline. Furthermore, this dissertation highlights the challenges of research in low-resource MRLs

    A study of the effects of ageing on the characteristics of handwriting and signatures

    The work presented in this thesis is focused on the understanding of factors that are unique to the elderly and their use of biometric systems. In particular, an investigation is carried out with a focus on the handwritten signature as the biometric modality of choice. This followed on from an in-depth analysis of various biometric modalities such as voice, fingerprint and face. This analysis aimed at investigating the inclusivity of and the policy guiding the use of biometrics by the elderly. Knowledge gained from extracted features of the handwritten signatures of the elderly shed more light on and exposed the uniqueness of some of these features in their ability to separate the elderly from the young. Consideration is also given to a comparative analysis of another handwriting task, that of copying text both in cursive and block capitals. It was discovered that there are features that are unique to each task. Insight into the human perceptual capability in inspecting signatures, in assessing complexity and in judging imitations was gained by analysing responses to practical scenarios that applied human perceptual judgement. Features extracted from a newly created database containing handwritten signatures donated by elderly subjects allowed the possibility of analysing the intra-class variations that exist within the elderly population