2,857 research outputs found

    Feature Extraction Methods for Character Recognition


    High-Quality Wavelets Features Extraction for Handwritten Arabic Numerals Recognition

    Arabic handwritten digit recognition is the task of recognizing and classifying handwritten Arabic digits. It has been studied for many years, and a rich literature is available. Digits written by different people differ in size, thickness, style, position, and orientation, because writing styles can vary significantly from person to person, so many challenges have to be overcome to solve the problem. Automatic handwritten digit recognition has wide applications, such as the automatic processing of bank cheques, postal addresses, and tax forms. A typical handwritten digit recognition application consists of three main stages: feature extraction, feature selection, and classification. One of the most important problems is feature extraction. In this paper, a novel feature extraction approach for off-line handwritten digit recognition is presented. Wavelet-based analysis of the image data is carried out for feature extraction, and classification is then performed using various classifiers. To further reduce the size of the training data set, high-entropy subbands are selected. To increase the recognition rate, individual subbands providing high classification accuracies are selected from the over-complete tree. The extracted features are also normalized to standardize the range of the independent variables before they are passed to the classifier. Classification is carried out using k-NN and SVMs. The results show that the quality of the extracted features is high, as almost equally high classification accuracies are obtained with both classifiers, i.e. k-NN and SVMs.
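The stages named in this abstract (wavelet subbands, entropy-based subband selection, feature normalization, k-NN classification) can be illustrated with a small pure-Python sketch. It is not the paper's implementation: the one-level Haar wavelet, the energy-based Shannon entropy, min-max scaling, and all function names are assumptions chosen for brevity, since the abstract does not specify the wavelet family, the entropy measure, or the normalization used.

```python
import math
from collections import Counter

def haar2d(img):
    """One-level 2D Haar transform of an even-sized grayscale image (list of rows).
    Assumption: Haar is used only for illustration; the paper's wavelet is not stated."""
    bands = {"LL": [], "LH": [], "HL": [], "HH": []}
    for r in range(0, len(img), 2):
        rows = {name: [] for name in bands}
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            rows["LL"].append((a + b + d + e) / 2.0)  # local average
            rows["LH"].append((a + b - d - e) / 2.0)  # top row minus bottom row
            rows["HL"].append((a - b + d - e) / 2.0)  # left column minus right column
            rows["HH"].append((a - b - d + e) / 2.0)  # diagonal difference
        for name in bands:
            bands[name].append(rows[name])
    return bands

def subband_entropy(band):
    """Shannon entropy (bits) of the normalized coefficient energies.
    Assumption: one common definition, not necessarily the paper's."""
    energies = [v * v for row in band for v in row]
    total = sum(energies)
    if total == 0:
        return 0.0
    return -sum((e / total) * math.log2(e / total) for e in energies if e > 0)

def min_max_normalize(vectors):
    """Scale every feature to [0, 1] across the data set (constant features map to 0)."""
    lo = [min(col) for col in zip(*vectors)]
    hi = [max(col) for col in zip(*vectors)]
    return [[(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(vec, lo, hi)]
            for vec in vectors]

def knn_predict(train_x, train_y, query, k=1):
    """Majority vote among the k training vectors closest to the query (Euclidean)."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]
```

In such a sketch, one would compute `haar2d` for each digit image, keep the subbands with the highest `subband_entropy`, flatten them into feature vectors, normalize the vectors, and classify with `knn_predict`.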

    Reconocimiento de notación matemática escrita a mano fuera de línea (Off-line recognition of handwritten mathematical notation)

    The automatic recognition of mathematical expressions is one of the problems of pattern recognition, because mathematics represents a valuable source of information in many areas of research. Writing mathematical expressions by hand is a means of communication used to transmit information and knowledge, and it makes it easy to produce documents that contain mathematical notation. This process can become tedious when the notation has to be written in a typesetting language that a computer can process, such as LaTeX or MathML, among others. Mathematical expression recognition systems follow one of two different methods: off-line and on-line. This thesis studies the performance of an off-line system and describes the basic steps for achieving better recognition accuracy, which are divided into two main steps: recognition of the symbols of the mathematical equations and analysis of the structure in which they are arranged. The aim is to convert a handwritten mathematical expression into an equivalent expression in a word-processing system, such as TeX.

    Accuracy Affecting Factors for Optical Handwritten Character Recognition

    Optical character recognition (OCR) refers to a technique that converts images of typed, handwritten, or printed text into machine-encoded text, enabling automatic processing of paper records such as passports, invoices, medical forms, and receipts. Pattern recognition, artificial intelligence, and computer vision are all research fields that enable OCR. Using OCR on handwritten text could greatly benefit many emerging information systems by ensuring a smooth transition from paper format to the digital world. Nowadays, OCR has evolved into a multi-step process: segmentation, pre-processing, feature extraction, classification, post-processing, and application-specific optimization. This thesis proposes techniques to improve the overall accuracy of OCR systems by showing the effects of pre-processing, feature extraction, and morphological processing. It also compares the accuracies of different well-known and commonly used classifiers in the field. Using the proposed techniques, an accuracy of over 98% was achieved. In addition, a dataset of handwritten Japanese Hiragana characters with considerable variability was collected as part of this thesis.

    Advances in Character Recognition

    This book presents advances in character recognition. It consists of 12 chapters that cover a wide range of topics on different aspects of character recognition. Hopefully, this book will serve as a reference source for academic research, for professionals working in the character recognition field, and for all those interested in the subject.

    Unicode-driven Deep Learning Handwritten Telugu-to-English Character Recognition and Translation System

    Telugu is considered the fourth most used language in India, spoken especially in regions such as Andhra Pradesh, Telangana, and Karnataka. Internationally, too, Telugu is a growing spoken language. The language comprises dependent and independent vowels, consonants, and digits. Despite this, Telugu Handwritten Character Recognition (HCR) has not been widely developed. HCR is a neural-network technique for converting a document image into editable text that can be used in many other applications, which saves the time and effort of starting over from the beginning every time. In this work, a Unicode-based Handwritten Character Recognition (U-HCR) system is developed for translating handwritten Telugu characters into English. Using the Centre of Gravity (CG) in our model, we can easily divide a compound character into individual characters with the help of Unicode values. To train the model, we used both online and offline Telugu character datasets. To extract features from the scanned image, we used a convolutional neural network (CNN) along with machine learning classifiers such as Random Forest and Support Vector Machine. The Stochastic Gradient Descent (SGD), Root Mean Square Propagation (RMS-P), and Adaptive Moment Estimation (ADAM) optimizers are used in this work to enhance the performance of U-HCR and to reduce the value of the loss function when training the CNN. On both the online and offline datasets, the proposed model showed promising results, with accuracies of 90.28% for SGD, 96.97% for RMS-P, and 93.57% for ADAM.
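For readers unfamiliar with the three optimizers compared in this abstract, the following pure-Python sketch shows their standard textbook update rules on a one-parameter toy loss. It is not the authors' training code: the quadratic loss, the hyperparameter defaults, and the helper names (`loss`, `grad`, `minimize`) are hypothetical and chosen only to keep the example self-contained.

```python
import math

def sgd_step(w, g, lr=0.1):
    """Plain stochastic gradient descent: step against the gradient."""
    return w - lr * g

def rmsprop_step(w, g, state, lr=0.01, decay=0.9, eps=1e-8):
    """RMSProp: divide the step by a running root-mean-square of past gradients."""
    state["v"] = decay * state["v"] + (1 - decay) * g * g
    return w - lr * g / (math.sqrt(state["v"]) + eps)

def adam_step(w, g, state, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: bias-corrected running averages of the gradient and its square."""
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * g
    state["v"] = b2 * state["v"] + (1 - b2) * g * g
    m_hat = state["m"] / (1 - b1 ** state["t"])
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)

def loss(w):
    """Toy loss with its minimum at w = 3 (hypothetical stand-in for a CNN loss)."""
    return (w - 3.0) ** 2

def grad(w):
    """Analytic gradient of the toy loss."""
    return 2.0 * (w - 3.0)

def minimize(step, w=0.0, steps=200, state=None):
    """Apply one optimizer repeatedly; stateful optimizers receive their state dict."""
    for _ in range(steps):
        w = step(w, grad(w)) if state is None else step(w, grad(w), state)
    return w

w_sgd = minimize(sgd_step)
w_rms = minimize(rmsprop_step, state={"v": 0.0})
w_adam = minimize(adam_step, state={"t": 0, "m": 0.0, "v": 0.0})
```

All three runs move `w` from 0 toward the minimum at 3, but at different rates for the same number of steps. That is the kind of difference an optimizer comparison measures, although results on a toy problem say nothing about the accuracies reported above.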

    UTRNet: High-Resolution Urdu Text Recognition In Printed Documents

    In this paper, we propose a novel approach to address the challenges of printed Urdu text recognition using high-resolution, multi-scale semantic feature extraction. Our proposed UTRNet architecture, a hybrid CNN-RNN model, demonstrates state-of-the-art performance on benchmark datasets. To address the limitations of previous works, which struggle to generalize to the intricacies of the Urdu script and suffer from the lack of sufficient annotated real-world data, we have introduced UTRSet-Real, a large-scale annotated real-world dataset comprising over 11,000 lines, and UTRSet-Synth, a synthetic dataset with 20,000 lines closely resembling real-world data, and have made corrections to the ground truth of the existing IIITH dataset, making it a more reliable resource for future research. We also provide UrduDoc, a benchmark dataset for Urdu text line detection in scanned documents. Additionally, we have developed an online tool for end-to-end Urdu OCR from printed documents by integrating UTRNet with a text detection model. Our work not only addresses the current limitations of Urdu OCR but also paves the way for future research in this area and facilitates the continued advancement of Urdu OCR technology. The project page with source code, datasets, annotations, trained models, and online tool is available at abdur75648.github.io/UTRNet. Comment: Accepted at the 17th International Conference on Document Analysis and Recognition (ICDAR 2023)

    An IoT System for Converting Handwritten Text to Editable Format via Gesture Recognition

    The evolution of the traditional classroom has led to the electronic classroom, i.e. e-learning. The growth of the traditional classroom does not stop at e-learning or distance learning: the next step after the electronic classroom is the smart classroom. One of the most popular features of the electronic classroom is capturing video or photos of lecture content and extracting the handwriting for note-taking. Numerous techniques have been implemented to extract handwriting from video or photos of a lecture, but some of their deficiencies can still be resolved, which can turn the electronic classroom into a smart classroom. In this thesis, we present a real-time IoT system that converts handwritten text into an editable format by implementing hand gesture recognition (HGR) with a Raspberry Pi and a camera. HGR is built using an edge detection algorithm and is used in this system to reduce the computational complexity of previous systems, i.e. removing redundant images and the lecturer's body from the image, and recollecting text from previous images to fill the area from which the lecturer's body has been removed. The Raspberry Pi is used to retrieve and perceive HGR and to build a smart classroom based on IoT. Handwritten images are converted into an editable format using OpenCV and machine learning algorithms. Text conversion includes recognition of uppercase and lowercase letters, numbers, special characters, mathematical symbols, equations, graphs, and figures, together with recognition of words, lines, blocks, and paragraphs. With the help of the Raspberry Pi and IoT, the editable lecture notes are delivered to students through a desktop application, which lets students edit notes and images as needed.
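The abstract does not say which edge detection algorithm underlies the gesture recognition step. As an illustration only, the sketch below assumes a Sobel gradient-magnitude detector with a fixed threshold, written in pure Python on a grayscale image stored as a list of rows. The function name `sobel_edges` and the threshold value are hypothetical.

```python
import math

def sobel_edges(img, threshold):
    """Return a binary edge map: 1 where the Sobel gradient magnitude reaches
    the threshold, 0 elsewhere. Border pixels are left at 0.
    Assumption: Sobel stands in for the thesis's unspecified edge detector."""
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            # Horizontal gradient: right column minus left column of the 3x3 window.
            gx = (img[r - 1][c + 1] + 2 * img[r][c + 1] + img[r + 1][c + 1]) \
               - (img[r - 1][c - 1] + 2 * img[r][c - 1] + img[r + 1][c - 1])
            # Vertical gradient: bottom row minus top row of the 3x3 window.
            gy = (img[r + 1][c - 1] + 2 * img[r + 1][c] + img[r + 1][c + 1]) \
               - (img[r - 1][c - 1] + 2 * img[r - 1][c] + img[r - 1][c + 1])
            if math.hypot(gx, gy) >= threshold:
                edges[r][c] = 1
    return edges

# A 5x5 frame with a dark left part and a bright right part: a vertical edge.
frame = [[0, 0, 255, 255, 255] for _ in range(5)]
edge_map = sobel_edges(frame, threshold=100)
```

In a real system the same operation would more likely come from an image library such as OpenCV, which the abstract mentions. The pure-Python version is used here only so the example has no dependencies.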

    Character Recognition

    Character recognition is one of the pattern recognition technologies that are most widely used in practical applications. This book presents recent advances that are relevant to character recognition, from technical topics such as image processing, feature extraction or classification, to new applications including human-computer interfaces. The goal of this book is to provide a reference source for academic research and for professionals working in the character recognition field