
30 research outputs found

Advances in Character Recognition

Publication venue: 'IntechOpen'
Publication date: 20/04/2021

This book presents advances in character recognition. It consists of 12 chapters that cover a wide range of topics on different aspects of character recognition. Hopefully, this book will serve as a reference source for academic research, for professionals working in the character recognition field, and for all those interested in the subject.

Directory of Open Access Books (DOAB)

Segmentation of Nastaliq script for OCR

Author: Haque Mahmood K.Haque Pathan
Haque Shamsul
Sattar Sohail A.
Publication date: 24/06/2009

In this paper we present a novel segmentation technique for implementing OCR (Optical Character Recognition) for printed Nastalique text, a calligraphic style of Urdu that uses the Arabic script. OCR systems have been developed and are in use for many of the world's major languages, but at present no OCR for Nastalique is available, and published research on Nastalique OCR, Urdu OCR, or indeed any area of Urdu computing is almost non-existent, because of the challenges that the Nastalique style poses for optical recognition. We used Matlab 7 for our experiments; the results, reported in this paper, are very encouraging.

UUM Repository

Recent Improvements in the BBN OCR System

Author: Issam Bazzi
John Makhoul
Kornai András
Prem Natarajan
Richard Schwartz
Zhidong Lu
Publication date: 01/01/1999

SZTAKI Publication Repository

Comparison of Template Matching Algorithm and Feature Extraction Algorithm in Sundanese Script Transliteration Application using Optical Character Recognition

Author: Atmadja Aldy Rialdy
Gerhana Yana Aditia
Padilah Muhamad Farid
Publication venue: 'Sunan Gunung Djati State Islamic University of Bandung'
Publication date: 16/07/2020

In West Java Province, people are not preserving their culture, especially regional literature such as the Sundanese script. In the digital era, earlier research combined Sundanese script with an application based on a Feature Extraction algorithm, but it offered no comparison with other algorithms and could not recognize Sundanese numbers. To extend that research, a Sundanese script application was built that implements OCR (Optical Character Recognition) using a Template Matching algorithm and a Feature Extraction algorithm modified with pre-processing stages that include luminosity and thresholding algorithms. The two algorithms were compared on accuracy and processing time for recognizing digital writing and handwriting. On digital writing, Template Matching achieved 87% word recognition accuracy with 236 ms processing time and 97.6% character recognition accuracy with 227 ms processing time, while Feature Extraction achieved 98% word recognition accuracy with 73.6 ms processing time and 100% character recognition accuracy with 66 ms processing time. On handwriting, Feature Extraction achieved 83% character recognition accuracy and 75% word recognition accuracy, while Template Matching achieved 70% character recognition accuracy and 66% word recognition accuracy.
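
As a rough illustration of the pipeline this abstract names (luminosity grayscale conversion, thresholding, then template matching), here is a minimal Python sketch. It is not the authors' implementation: the luminosity weights (0.21, 0.72, 0.07), the fixed threshold of 128, the pixel-agreement score, and all function names are assumptions chosen for illustration, and applying the pre-processing ahead of template matching is likewise assumed.

```python
# Hypothetical sketch: luminosity grayscale -> threshold -> template matching.
# Weights, threshold, and scoring rule are illustrative assumptions.

def luminosity(rgb):
    """Convert one (R, G, B) pixel to gray with the common luminosity weights."""
    r, g, b = rgb
    return 0.21 * r + 0.72 * g + 0.07 * b

def binarize(image, threshold=128):
    """Map an RGB image (list of rows of pixels) to 1 = ink, 0 = background."""
    return [[1 if luminosity(px) < threshold else 0 for px in row]
            for row in image]

def match_score(glyph, template):
    """Fraction of pixels on which a binarized glyph agrees with a template."""
    total = sum(len(row) for row in template)
    same = sum(g == t
               for glyph_row, template_row in zip(glyph, template)
               for g, t in zip(glyph_row, template_row))
    return same / total

def classify(glyph, templates):
    """Return the label of the template with the highest agreement score."""
    return max(templates, key=lambda label: match_score(glyph, templates[label]))

# Toy 3x3 example: a black vertical bar on a white background.
BLACK, WHITE = (0, 0, 0), (255, 255, 255)
image = [[WHITE, BLACK, WHITE] for _ in range(3)]
templates = {
    "bar": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
    "dot": [[0, 0, 0], [0, 1, 0], [0, 0, 0]],
}
label = classify(binarize(image), templates)
```

In this toy case `label` is `"bar"`, since the binarized image agrees with that template on every pixel. A real system would also need glyph segmentation and size normalization before matching, which the sketch leaves out.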

Jurnal Online Informatika

A Lexicon of Connected Components for Arabic Optical Text Recognition

Author: Elarian Yousef
Idris Fayez
Publication date: 12/01/2011

Arabic is a cursive script that lacks the ease of character segmentation. Hence, we suggest a unit that is discrete in nature, viz. the connected component, for Arabic text recognition. A lexicon listing valid Arabic connected components is necessary for any system that is to use such a unit. Here, we produce and analyze a comprehensive lexicon of connected components. A lexicon can be extracted from corpora or synthesized from morphemes. We follow both approaches and merge their results. In addition, generating a lexicon of connected components requires extra tokenization and point-normalization steps to keep the size of the lexicon tractable. We produce a lexicon of surface words, reduce it to a lexicon of connected components, and finally to a lexicon of point-normalized connected components. The lexicon of point-normalized connected components contains 684,743 entries, a decrease of 97.17% from the word lexicon.
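
To make the reduction from words to connected components concrete, the following Python sketch splits a word wherever a letter does not join to the letter after it, then collects the distinct components of a small word list. This is only an assumed approximation of the tokenization the abstract mentions: the set of non-joining letters, the function names, and the toy word list are illustrative, and diacritics and point normalization are not handled.

```python
# Hypothetical sketch: split Arabic words into connected components.
# Assumption: a component ends after any letter that does not connect
# to the following letter (alef forms, dal, thal, ra, zay, waw forms,
# ta marbuta, hamza).
NON_JOINING = set(
    "\u0621\u0622\u0623\u0624\u0625\u0627\u0629"  # hamza, alef forms, waw-hamza, ta marbuta
    "\u062F\u0630\u0631\u0632\u0648"              # dal, thal, ra, zay, waw
)

def connected_components(word):
    """Split one word into its connected components, in reading order."""
    components, current = [], ""
    for ch in word:
        current += ch
        if ch in NON_JOINING:      # the cursive stroke breaks after this letter
            components.append(current)
            current = ""
    if current:                    # trailing letters form the last component
        components.append(current)
    return components

def component_lexicon(words):
    """Collect the set of distinct connected components over a word list."""
    return {part for word in words for part in connected_components(word)}

# Toy word list: kaf-ta-alef-ba ("book") and dal-alef-ra ("house").
kitab = "\u0643\u062A\u0627\u0628"
dar = "\u062F\u0627\u0631"
lexicon = component_lexicon([kitab, dar])
```

Here the first word splits into two components (three joined letters, then a lone final letter) and the second into three single-letter components. Over a large word lexicon many words share components, which is the kind of sharing that would produce the reduction the abstract reports.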

Eldorado - Ressourcen aus und für Lehre, Studium und Forschung

Auditing Electronic Files of Quran Using Optical Character Recognition

Author: alaa fathala abdalazez hamda
آلاء فتح الله عبد العزيز حمدة
Publication venue: Al-Quds University

Al-Quds University Digital Repository