8,009 research outputs found

Neural Computing for Online Arabic Handwriting Character Recognition using Hard Stroke Features Mining

Author: Rehman Amjad
Publication date: 15/01/2021

Online Arabic cursive character recognition remains a major challenge because of complexities such as Arabic cursive script styles, writing speed, writer mood and so forth. Due to these unavoidable constraints, the accuracy of online Arabic character recognition is still low and leaves room for improvement. In this research, an enhanced method is proposed for detecting the desired critical points from the vertical and horizontal direction-length of handwriting stroke features for online Arabic script recognition. Each extracted stroke feature divides every isolated character into meaningful patterns known as tokens. A minimum feature set is extracted from these tokens for classification of characters using a multilayer perceptron with a back-propagation learning algorithm and a modified sigmoid-based activation function. In this work, two milestones are achieved: first, attaining a fixed number of tokens; second, minimizing the number of the most repetitive tokens. For experiments, handwritten Arabic characters are selected from the OHASD benchmark dataset to test and evaluate the proposed method. The proposed method achieves an average accuracy of 98.6%, comparable to state-of-the-art character recognition techniques.
Comment: 16 pages
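As a rough illustration of splitting a stroke into tokens at critical points, the sketch below cuts a pen trajectory wherever its horizontal or vertical movement reverses. The abstract does not specify the paper's detection rule, so the reversal criterion and the names `sign` and `split_at_direction_changes` are assumptions made for this example only.

```python
def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def split_at_direction_changes(points):
    """Split a stroke (list of (x, y) samples) into tokens.

    Assumed rule, not taken from the paper: a sample is a critical point
    when the x- or y-movement reverses there. Each critical point ends one
    token and starts the next.
    """
    if not points:
        return []
    tokens, current = [], [points[0]]
    prev_sx = prev_sy = 0  # last non-zero movement direction on each axis
    for p, q in zip(points, points[1:]):
        sx, sy = sign(q[0] - p[0]), sign(q[1] - p[1])
        reversed_x = sx != 0 and prev_sx != 0 and sx != prev_sx
        reversed_y = sy != 0 and prev_sy != 0 and sy != prev_sy
        if reversed_x or reversed_y:
            tokens.append(current)  # p is the critical point
            current = [p]
        current.append(q)
        if sx:
            prev_sx = sx
        if sy:
            prev_sy = sy
    tokens.append(current)
    return tokens


# A "V"-shaped stroke: the pen moves down, then up.
v_stroke = [(0, 2), (1, 1), (2, 0), (3, 1), (4, 2)]
v_tokens = split_at_direction_changes(v_stroke)
```

Here the single vertical reversal at `(2, 0)` yields two tokens that share that point.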

arXiv.org e-Print Archive

A Framework for On-Line Devanagari Handwritten Character Recognition

Author: Kopparapu Sunil Kumar
L Lajish V.
Publication date: 25/10/2014

The main challenges in on-line handwritten character recognition in Indian languages are the large size of the character set, the strong similarity between different characters in the script and the huge variation in writing style. In this paper we propose a framework for on-line handwritten script recognition taking cues from the speech signal processing literature. The framework is based on identifying strokes, which in turn leads to recognition of handwritten on-line characters, rather than on conventional character identification. Though the framework is described for Devanagari script, it is general and can be applied to any language. The proposed platform consists of pre-processing, feature extraction, recognition and post-processing, like conventional character recognition but applied to strokes. On-line Devanagari character recognition reduces to recognizing one of 69 primitives, and recognition of a character is performed by recognizing a sequence of such primitives. We further show the impact of noise removal on on-line raw data, which is usually noisy. The use of Fuzzy Directional Features to enhance the accuracy of stroke recognition is also described. The recognition results are compared with commonly used directional features in the literature using several classifiers.
Comment: 29 pages
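For readers unfamiliar with directional features, a minimal sketch of a plain (non-fuzzy) direction histogram for one stroke follows. It is a generic baseline, not the Fuzzy Directional Features of the paper; the function name `direction_histogram` and the choice of eight bins are assumptions made for illustration.

```python
import math


def direction_histogram(points, bins=8):
    """Normalized histogram of pen-movement directions along one stroke.

    points is a list of (x, y) samples. Each segment between consecutive
    samples votes for the direction bin nearest to its angle; bins are
    centred on 0, 45, 90, ... degrees when bins == 8.
    """
    hist = [0.0] * bins
    bin_width = 2 * math.pi / bins
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if dx == 0 and dy == 0:
            continue  # repeated sample, no direction to record
        angle = math.atan2(dy, dx) % (2 * math.pi)  # map to [0, 2*pi)
        index = int((angle + bin_width / 2) / bin_width) % bins
        hist[index] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# An "L"-like stroke: two steps along +x, then two steps along +y.
stroke = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
features = direction_histogram(stroke)
```

A fuzzy variant would typically spread each segment's vote over neighbouring bins instead of assigning it to a single one; that refinement is not shown here.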

arXiv.org e-Print Archive

A Medial Axis Based Thinning Strategy for Character Images

Author: Bag Soumen
Harit Gaurav
Publication date: 03/03/2011

Thinning of character images is a big challenge. Removal of strokes or deformities in thinning is a difficult problem. In this paper, we propose a medial axis based thinning strategy for skeletonization of printed and handwritten character images. In this method, we use shape characteristics of text to obtain a skeleton that is nearly the same as the true character shape. This approach helps to preserve the local features and true shape of the character images. The proposed algorithm produces a skeleton one pixel wide. As a by-product of our thinning approach, the skeleton also gets segmented into strokes in vector form. Hence further stroke segmentation is not required. Experiments are done on printed English and Bengali characters, and we obtain fewer spurious branches compared with other thinning methods, without any post-processing.
Comment: 6 pages, 5 figures. In proceedings of the second National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics (NCVPRIPG), pp. 67-72, Jaipur, India, 201

arXiv.org e-Print Archive

Online Handwritten Devanagari Stroke Recognition Using Extended Directional Features

Author: Kopparapu Sunil Kumar
VL Lajish
Publication date: 11/01/2015

This paper describes a new feature set, called the extended directional features (EDF), for use in the recognition of online handwritten strokes. We use EDF specifically to recognize strokes that form a basis for producing Devanagari script, which is the most widely used Indian language script. It should be noted that stroke recognition in handwritten script is equivalent to phoneme recognition in speech signals, which is generally very poor and of the order of 20% for singing voice. Experiments are conducted for the automatic recognition of isolated handwritten strokes. We first describe the proposed feature set, namely EDF, and then show how this feature can be effectively utilized for writer-independent script recognition through stroke recognition. Experimental results show that the extended directional feature set performs well, with about 65+% stroke-level recognition accuracy on a writer-independent data set.
Comment: 8th International Conference on Signal Processing and Communication Systems, 15 - 17 December 2014, Gold Coast, Australia

arXiv.org e-Print Archive

Indic Handwritten Script Identification using Offline-Online Multimodal Deep Network

Author: Bhunia Ankan Kumar
Bhunia Ayan Kumar
Mukherjee Subham
Pal Umapada
Roy Partha Pratim
Sain Aneeshan
Publication date: 15/10/2019

In this paper, we propose a novel approach to word-level Indic script identification using only character-level data in the training stage. The advantages of using character-level data for training have been outlined in section I. Our method uses a multimodal deep network which takes both the offline and online modalities of the data as input in order to explore the information from both modalities jointly for the script identification task. We take handwritten data in either modality as input, and the opposite modality is generated through inter-modality conversion. Thereafter, we feed this offline-online modality pair to our network. Hence, along with the advantage of utilizing information from both modalities, it can work as a single framework for both offline and online script identification simultaneously, which alleviates the need for designing two separate script identification modules for the individual modalities. One more major contribution is a novel conditional multimodal fusion scheme to combine the information from the offline and online modalities, which takes into account the real origin of the data being fed to our network and thus combines them adaptively. An exhaustive experiment has been done on a data set consisting of English and six Indic scripts. Our proposed framework outperforms, by a clear margin, different frameworks based on traditional classifiers with handcrafted features as well as deep learning based methods. Extensive experiments show that using only character-level training data can achieve state-of-the-art performance similar to that obtained with traditional training using word-level data in our framework.
Comment: Accepted in Information Fusion, Elsevier

arXiv.org e-Print Archive

Recurrent neural networks based Indic word-wise script identification using character-wise training

Author: Gill Aman
Tripathi Rohun
Tripati Riccha
Publication date: 27/12/2018

This paper presents a novel methodology for Indic handwritten script recognition using Recurrent Neural Networks and addresses the problem of script recognition in poor data scenarios, such as when only character-level online data is available. It is based on the hypothesis that curves of online character data comprise sufficient information for prediction at the word level. Online character data is used to train RNNs with a BLSTM architecture, which are then used to make predictions on online word-level data. These prediction results on the test set are on par with prediction results of models trained with online word data, while the training of the character-level model is much less data intensive and takes much less time. Performance for binary-script models and then 5 Indic script models is reported, along with a comparison with HMM models. The system is extended for offline data prediction. Raw offline data lacks the temporal information that is available in online data and required for prediction using models trained with online data. To overcome this, stroke recovery is implemented and the strokes are utilized for prediction using the online character-level models. The performance on character-level and word-level offline data is reported.
Comment: Version accepted at ICPRS 201

arXiv.org e-Print Archive

Handwritten Chinese Font Generation with Collaborative Stroke Refinement

Author: Chang Jie
Chen Siheng
Han Mei
Tian Qi
Wang Yanfeng
Wen Chuan
Zhang Ya
Publication date: 06/05/2019

Automatic character generation is an appealing solution for new typeface design, especially for Chinese typefaces, which include over 3700 most commonly used characters. This task has two main pain points: (i) handwritten characters are usually associated with thin strokes that carry little information and with complex structure, both of which are error-prone during deformation; (ii) thousands of characters with various shapes need to be synthesized based on a few manually designed characters. To solve those issues, we propose a novel convolutional-neural-network-based model with three main techniques: collaborative stroke refinement, using a collaborative training strategy to recover missing or broken strokes; online zoom-augmentation, taking advantage of the content-reuse phenomenon to reduce the size of the training set; and adaptive pre-deformation, standardizing and aligning the characters. The proposed model needs only 750 paired training samples; no pre-trained network, extra dataset resource or labels are needed. Experimental results show that the proposed method significantly outperforms the state-of-the-art methods under the practical restriction on handwritten font synthesis.
Comment: 8 pages (exclude references)

arXiv.org e-Print Archive

DeepWriting: Making Digital Ink Editable via Deep Generative Modeling

Author: Aksan Emre
Hilliges Otmar
Pece Fabrizio
Publication date: 25/01/2018

Digital ink promises to combine the flexibility and aesthetics of handwriting with the ability to process, search and edit digital text. Character recognition converts handwritten text into a digital representation, albeit at the cost of losing personalized appearance due to the technical difficulties of separating the interwoven components of content and style. In this paper, we propose a novel generative neural network architecture that is capable of disentangling style from content and thus making digital ink editable. Our model can synthesize arbitrary text while giving users control over the visual appearance (style). This allows, for example, style transfer without changing the content, editing of digital ink at the word level, and other application scenarios such as spell-checking and correction of handwritten text. We furthermore contribute a new dataset of handwritten text with fine-grained annotations at the character level and report results from an initial user evaluation.

arXiv.org e-Print Archive

Handwritten Character Recognition In Malayalam Scripts- A Review

Author: Chacko Anitha Mary M. O.
Dhanya P. M
Publication date: 10/02/2014

Handwritten character recognition (HCR) is one of the most challenging and ongoing areas of research in the field of pattern recognition. HCR research is mature for foreign languages like Chinese and Japanese, but the problem is much more complex for Indian languages. The problem becomes even more complicated for South Indian languages due to their large character sets and the presence of vowel modifiers and compound characters. This paper provides an overview of important contributions and advances in offline as well as online handwritten character recognition of Malayalam scripts.
Comment: 11 pages, 4 figures, 2 tables

arXiv.org e-Print Archive

Stroke extraction for offline handwritten mathematical expression recognition

Author: Chan Chungkwong
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 16/01/2020

Offline handwritten mathematical expression recognition is often considered much harder than its online counterpart due to the absence of temporal information. In order to take advantage of the more mature methods for online recognition and save resources, an oversegmentation approach is proposed to recover strokes from textual bitmap images automatically. The proposed algorithm first breaks down the skeleton of a binarized image into junctions and segments; the segments are then merged to form strokes; finally, stroke order is normalized using recursive projection and topological sort. Good offline accuracy was obtained in combination with ordinary online recognizers, which are not specially designed for extracted strokes. Given a ready-made state-of-the-art online handwritten mathematical expression recognizer, the proposed procedure correctly recognized 58.22%, 65.65%, and 65.22% of the offline formulas rendered from the datasets of the Competitions on Recognition of Online Handwritten Mathematical Expressions (CROHME) in 2014, 2016, and 2019 respectively. Furthermore, given a trainable online recognition system, retraining it with extracted strokes resulted in an offline recognizer with the same level of accuracy. In addition, the entire pipeline was fast enough to facilitate on-device recognition on mobile phones with limited resources. To conclude, stroke extraction provides an attractive way to build optical character recognition software.
Comment: 22 pages, 7 figures
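The topological-sort part of stroke-order normalization can be sketched as follows. This is only an illustration under stated assumptions: the precedence rule (a stroke lying entirely to the left of another is written first) and the function name `order_strokes` are hypothetical, the paper's recursive projection step is not reproduced, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def order_strokes(x_ranges):
    """Order stroke indices so that strokes entirely to the left come first.

    x_ranges is a list of (x_min, x_max) extents, one per stroke. The
    left-of rule is an assumed precedence constraint for illustration.
    """
    sorter = TopologicalSorter()
    for index in range(len(x_ranges)):
        sorter.add(index)  # register every stroke, even unconstrained ones
    for i, (_, right_i) in enumerate(x_ranges):
        for j, (left_j, _) in enumerate(x_ranges):
            if i != j and right_i < left_j:
                sorter.add(j, i)  # stroke i must precede stroke j
    return list(sorter.static_order())


# Three extracted strokes given in arbitrary order.
extents = [(5, 6), (0, 1), (2, 3)]
stroke_order = order_strokes(extents)
```

Strokes whose horizontal extents overlap are left unconstrained by this rule, so their relative order depends on the sorter; a fuller treatment would add vertical or containment constraints.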

arXiv.org e-Print Archive