34 research outputs found

    Offline printed Sindhi optical text recognition: survey

    Optical Character Recognition (OCR) applications are being used more intensively than before and show great promise for rapid data entry, but have had limited success when applied to the Sindhi language. This paper summarizes the general topic of optical character recognition and highlights the characteristics of the Sindhi script. It also presents a historical review of Sindhi text recognition systems. In addition, it outlines the capabilities of different OCR systems, then introduces a five-stage model for an offline printed Sindhi text recognition system and classifies research work according to this model.

    Handwritten Character Recognition of South Indian Scripts: A Review

    Handwritten character recognition has long been a frontier area of research in pattern recognition and image processing, and there is large demand for OCR on handwritten documents. Although extensive studies have been performed on foreign scripts such as Chinese, Japanese and Arabic, very little work can be traced on handwritten character recognition of Indian scripts, especially the South Indian scripts. This paper provides an overview of offline handwritten character recognition in South Indian scripts, namely Malayalam, Tamil, Kannada and Telugu. Comment: Paper presented at the "National Conference on Indian Language Computing", Kochi, February 19-20, 2011. 6 pages, 5 figures.

    A Technique for Character Segmentation in Middle zone of Handwritten Hindi words using Hybrid Approach

    India is a multilingual country whose people write in multiple scripts. Devanagari is one of the most popular scripts in India and is used to write Hindi, Sanskrit, Sindhi, Marathi and Nepali. This research work addresses Hindi. A large number of valuable and essential documents exist in handwritten form and need to be converted into editable form. Optical Character Recognition (OCR) makes this conversion easier. Character segmentation, which separates the individual characters of a handwritten word, is an important phase of OCR, and doing it well improves the accuracy of the OCR system. In this paper a hybrid approach is used to segment words that contain single and multiple touching characters. The proposed system is tested on a dataset of more than 300 handwritten Hindi words written by different writers. The proposed hybrid system is reported to achieve 96% accuracy, which the authors state is better than that of existing techniques.
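    The abstract above does not describe how the hybrid approach works, so the following is only a generic illustration of the segmentation step, not the authors' method. It is a minimal sketch of vertical projection profile segmentation, assuming a binarized word image (1 = ink) in which the Devanagari header line has already been removed. The function name `segment_columns` and the toy image are hypothetical.

```python
def segment_columns(image):
    """Split a binary word image (list of rows, 1 = ink) into column spans.

    Returns a list of (start, end) column ranges, with end exclusive.
    Assumes every row has the same length.
    """
    if not image:
        return []
    width = len(image[0])
    # Vertical projection: count of ink pixels in each column.
    profile = [sum(row[x] for row in image) for x in range(width)]

    spans = []
    start = None
    for x, ink in enumerate(profile):
        if ink > 0 and start is None:
            start = x                  # first inked column of a new span
        elif ink == 0 and start is not None:
            spans.append((start, x))   # an empty column closes the span
            start = None
    if start is not None:
        spans.append((start, width))   # span runs to the right edge
    return spans


# Toy 3x8 "word" with three ink groups separated by empty columns.
word = [
    [1, 1, 0, 0, 1, 0, 1, 1],
    [1, 0, 0, 0, 1, 0, 0, 1],
    [1, 1, 0, 0, 1, 0, 1, 1],
]
print(segment_columns(word))
```

    A plain projection profile like this only splits at completely empty columns, so it cannot separate touching characters; that case is the one the abstract says the hybrid approach targets.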

    Android Application to Help Reading English Words Using Mobile Vision and Text to Speech Facility at SD N Gayam 01 Sukoharjo

    Abstract: Reading is an activity that people carry out to obtain information or pleasure. For elementary school students, reading is a very important activity for gaining knowledge and building life skills for the future, especially for communicating in English. SDN Gayam 01 Sukoharjo is an elementary school in Sukoharjo regency with about 450 students. In everyday lessons, some students still find it difficult to follow activities such as reading English words or sentences. This study aims to create an Android application to help students and teachers at SDN Gayam 01 Sukoharjo who have difficulty reading English words, using Mobile Vision from Google Mobile and the TextToSpeech facility so that they can hear the pronunciation directly from an American speaker. Students and teachers are expected to be able to read English correctly. The programming language used is Java. The application has passed black-box testing and a user acceptance test with satisfactory results. Keywords: Android, Mobile Vision, TextToSpeech

    Issues & Challenges in Urdu OCR

    Optical character recognition is a technique used to convert printed and handwritten text into an editable text format. A lot of work has been done with this technology on identifying characters of different languages in a variety of scripts. Latin-script languages with isolated (non-cursive) characters, like English, are easy to recognize, and significant advances have been made in their recognition, whereas Arabic and related cursive-script languages like Urdu have more complicated, intermingled scripts and have received much less attention. This paper describes in detail the various scripts of the Urdu language and discusses the issues and challenges of Urdu OCR that arise from its cursive nature. These include cursiveness, more character dots, a large set of characters to recognize, more characters sharing a base shape, placement of dots, ambiguity between characters and ligatures that differ only slightly, context-sensitive shapes, ligatures, noise, skew and fonts. This paper provides a better understanding of the possible problems that arise in Urdu character recognition.

    UTRNet: High-Resolution Urdu Text Recognition In Printed Documents

    In this paper, we propose a novel approach to address the challenges of printed Urdu text recognition using high-resolution, multi-scale semantic feature extraction. Our proposed UTRNet architecture, a hybrid CNN-RNN model, demonstrates state-of-the-art performance on benchmark datasets. To address the limitations of previous works, which struggle to generalize to the intricacies of the Urdu script and suffer from a lack of sufficient annotated real-world data, we introduce UTRSet-Real, a large-scale annotated real-world dataset comprising over 11,000 lines, and UTRSet-Synth, a synthetic dataset with 20,000 lines closely resembling real-world data. We have also corrected the ground truth of the existing IIITH dataset, making it a more reliable resource for future research. We also provide UrduDoc, a benchmark dataset for Urdu text line detection in scanned documents. Additionally, we have developed an online tool for end-to-end Urdu OCR from printed documents by integrating UTRNet with a text detection model. Our work not only addresses the current limitations of Urdu OCR but also paves the way for future research in this area and facilitates the continued advancement of Urdu OCR technology. The project page with source code, datasets, annotations, trained models, and online tool is available at abdur75648.github.io/UTRNet. Comment: Accepted at The 17th International Conference on Document Analysis and Recognition (ICDAR 2023)