86 research outputs found

    A Study of Techniques and Challenges in Text Recognition Systems

    Get PDF
    The core system for Natural Language Processing (NLP) and digitalization is Text Recognition. These systems are critical in bridging the gaps in digitization produced by non-editable documents, as well as contributing to finance, health care, machine translation, digital libraries, and a variety of other fields. In addition, as a result of the pandemic, the amount of digital information in the education sector has increased, necessitating the deployment of text recognition systems to deal with it. Text Recognition systems worked on three different categories of text: (a) Machine Printed, (b) Offline Handwritten, and (c) Online Handwritten Texts. The major goal of this research is to examine the process of typewritten text recognition systems. The availability of historical documents and other traditional materials in many types of texts is another major challenge for convergence. Despite the fact that this research examines a variety of languages, the Gurmukhi language receives the most focus. This paper shows an analysis of all prior text recognition algorithms for the Gurmukhi language. In addition, work on degraded texts in various languages is evaluated based on accuracy and F-measure

    Segmentation of touching characters in upper zone in printed Gurmukhi script

    Full text link
    A new technique for segmenting touching characters in upper zone of printed Gurmukhi script has been presented in this paper. The technique is based on the structural properties of the Gurmukhi script characters. Concavity and convexity of the characters has been studied and using top profile projections, the touching characters in upper zone have been segmented. Recognition rate of 91 % has been achieved for segmenting the touching characters in upper zone

    A Technique for Character Segmentation in Middle zone of Handwritten Hindi words using Hybrid Approach

    Get PDF
    India is a country where people talk in multilingual and write in multi-script. Devanagari is one of the most popular scripts in India, which is used to write Hindi, Sanskrit, Sindhi, Marathi and Nepali Languages. This research work is performed on Hindi language. A large number of precious and essential documents are available in handwritten form, which needs to be converted into editable form. The existence of Optical Character Recognition (OCR) makes this task easier to convert handwritten text in editable form. Character segmentation is an important phase of OCR, which segment the characters from handwritten words. This enhances the accuracy of OCR system. In this paper a hybrid approach is used to segment the characters that contain single and multiple touching characters within a word. The proposed system is tested on a dataset of various handwritten words written by different writers. The dataset of proposed system contains more than 300 handwritten words in Hindi language. Accuracy of the proposed hybrid system is evaluated to 96% which is better than that of existing techniques

    Deep Learning Based Models for Offline Gurmukhi Handwritten Character and Numeral Recognition

    Get PDF
    Over the last few years, several researchers have worked on handwritten character recognition and have proposed various techniques to improve the performance of Indic and non-Indic scripts recognition. Here, a Deep Convolutional Neural Network has been proposed that learns deep features for offline Gurmukhi handwritten character and numeral recognition (HCNR). The proposed network works efficiently for training as well as testing and exhibits a good recognition performance. Two primary datasets comprising of offline handwritten Gurmukhi characters and Gurmukhi numerals have been employed in the present work. The testing accuracies achieved using the proposed network is 98.5% for characters and 98.6% for numerals

    Segmentation of Hanacaraka Characters using Double Projection Profile and Hough Transform

    Get PDF
    In doing segmentation of Hanacaraka character, Javanese ancient character, one of Indonesian’s ethnic ancient character in Java island, the difficulties that occur is the inconsistency of the space between lines, the size of the character and the thickness. Inconsistencies between row spacing and letter size are caused by the letters of the pair, the last vowel and consonant letters in one phoneme. While the thickness is inconsistent due to the writing style of the Hanacaraka itself. Image Preprocessing needs to be done to get input without skew. To improve skewed text documents, we used Hough transforms to predict the edges of the text area. After that, to segment the line and then continue with segmentation of each character, horizontal projection profile is used and then proceed with vertical. The result of this segmentation method is good for printed documents. Segmentation process of handwriting documents has difficulty because each row in the document is uneven and very tight between the rows. Those matters cause them overlap. When the line segmented wrongly, the entire character on the line will be not segmented as well. This problem can be eliminate using connectivity test. Before this, it need to segment the line with the overlap area. The character part of below or above the main character can be eliminate because it is not connected to the main character
    • …
    corecore