Component-based Segmentation of words from handwritten Arabic text

Author: AlKhateeb J. H.
Ipson S.
Jiang J.
Ren Jinchang
Publication date: 28/05/2008

Efficient preprocessing is essential for automatic recognition of handwritten documents. This paper presents techniques for segmenting words in handwritten Arabic text. First, connected components (CCs) are extracted and the distances between components are analyzed. The statistical distribution of these distances is then used to determine an optimal threshold for word segmentation. An improved projection-based method is also employed for baseline detection. The proposed method was tested on the IFN/ENIT database, which consists of 26459 Arabic words handwritten by 411 different writers, and the results were promising, showing more accurate baseline detection and word segmentation for subsequent recognition.
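The gap-thresholding idea in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not say how the threshold is derived from the distance distribution, so the "largest jump between sorted gaps" rule below is an assumed stand-in, and the names `gaps_between`, `gap_threshold`, and `segment_words` are hypothetical. Components are reduced to horizontal extents on a single text line.

```python
from typing import List, Tuple

Box = Tuple[int, int]  # (x_start, x_end) of one connected component


def gaps_between(boxes: List[Box]) -> List[int]:
    """Horizontal gaps between consecutive components, ordered by x_start."""
    ordered = sorted(boxes)
    if not ordered:
        return []
    gaps = []
    right_edge = ordered[0][1]
    for start, end in ordered[1:]:
        gaps.append(max(0, start - right_edge))  # overlapping boxes give gap 0
        right_edge = max(right_edge, end)
    return gaps


def gap_threshold(gaps: List[int]) -> float:
    """Assumed heuristic: midpoint of the largest jump in the sorted gaps."""
    values = sorted(gaps)
    if len(values) < 2:
        return float("inf")
    size, i = max((values[k + 1] - values[k], k) for k in range(len(values) - 1))
    if size == 0:  # all gaps equal: no evidence of a word boundary
        return float("inf")
    return (values[i] + values[i + 1]) / 2


def segment_words(boxes: List[Box]) -> List[List[Box]]:
    """Group components into words: a gap above the threshold starts a new word."""
    ordered = sorted(boxes)
    if not ordered:
        return []
    threshold = gap_threshold(gaps_between(ordered))
    words = [[ordered[0]]]
    right_edge = ordered[0][1]
    for box in ordered[1:]:
        gap = max(0, box[0] - right_edge)
        if gap > threshold:
            words.append([box])
        else:
            words[-1].append(box)
        right_edge = max(right_edge, box[1])
    return words
```

Because grouping depends only on gap sizes, the same sketch applies whether components are read left to right or right to left.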

University of Strathclyde Institutional Repository

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Text Line Segmentation of Historical Documents: a Survey

Author: A. Amin
A. Bozzi
A. Downton
A. Jain
A. Kolcz
Abderrazak Zahour
Bruno Taconet
C.L. Tan
C.V. Lakshmi
E. Cohen
E. Oztop
G. Seni
I.-K. Kim
K. Wong
L. Likforman-Sulem
L. O’Gorman
L.A. Fletcher
Laurence Likforman-Sulem
R. Plamondon
R.D. Lins
U. Pal
V. Shapiro
Ventadert Gusnard de de
Y. Solihin
Y.H. Tseng
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 10/04/2007

A huge number of historical documents in libraries and in various National Archives have not been exploited electronically. Although automatic reading of complete pages remains, in most cases, a long-term objective, tasks such as word spotting, text/image alignment, authentication and extraction of specific fields are in use today. For all these tasks, a major step is document segmentation into text lines. Because of the low quality and the complexity of these documents (background noise, artifacts due to aging, interfering lines), automatic text line segmentation remains an open research field. The objective of this paper is to present a survey of existing methods, developed during the last decade, and dedicated to documents of historical interest.
Comment: 25 pages, submitted version. To appear in International Journal on Document Analysis and Recognition. Online version available at http://www.springerlink.com/content/k2813176280456k3

arXiv.org e-Print Archive

Crossref

An efficient scheme for tilt correction in Arabic OCR system

Author: Sarfraz M.
Shahab S.A.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/07/2005

A preprocessing stage is required in almost every image processing application, from biometric analysis to document image analysis. An input image must be normalized and converted into a format acceptable to an OCR (optical character recognition) system. OCR systems typically assume that documents were printed with a single text direction and that the acquisition process did not introduce significant skew. In practice this assumption does not always hold, and printed documents may be skewed at some angle to the horizontal axis. In this paper, we propose a skew estimation scheme for document images in Arabic fonts. It is based on a specific feature of Arabic script: we scan for occurrences of the letter 'alif' and estimate the tilt from its slope. Extensive experimentation was performed, and the scheme was found to be very effective.
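As a rough illustration of estimating tilt from the slope of a near-vertical stroke such as 'alif', the sketch below fits a line through the stroke's pixel coordinates and reports its deviation from vertical. It makes several assumptions not stated in the abstract: the stroke has already been located (the detection step is not shown), a least-squares fit stands in for whatever slope measure the authors use, and the function name `stroke_tilt_degrees` is hypothetical.

```python
import math
from typing import List, Tuple


def stroke_tilt_degrees(pixels: List[Tuple[float, float]]) -> float:
    """Fit x = a*y + b by least squares and return atan(a) in degrees.

    `pixels` holds (x, y) coordinates of one near-vertical stroke.
    A result of 0 means the stroke is upright; the sign tells whether
    x grows or shrinks as y increases (assumed image coordinates).
    """
    n = len(pixels)
    if n < 2:
        raise ValueError("need at least two pixels to estimate a slope")
    mean_x = sum(x for x, _ in pixels) / n
    mean_y = sum(y for _, y in pixels) / n
    syy = sum((y - mean_y) ** 2 for _, y in pixels)
    if syy == 0:
        raise ValueError("stroke has no vertical extent")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pixels)
    slope = sxy / syy  # change in x per unit change in y
    return math.degrees(math.atan(slope))
```

Regressing x on y, rather than y on x, keeps the fit well defined for upright strokes, where the conventional slope would be infinite. The estimated angle could then be used to rotate the page back, a step not shown here.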

KFUPM ePrints