8,085 research outputs found
Image Segmentation and Multiple skew estimation, correction in printed and handwritten documents
Analysis of handwritten document has always been a challenging task in the field of image processing. Various algorithms have been developed in finding solution to this problem. The algorithms implemented here for segmentation and skew detection works not only on printed or scanned document images but for also handwritten document images which creates an edge over other methodologies. Here Line segmentation for both printed and handwritten document image is done using two methods namely Histogram projections and Hough Transform assuming that input document image consists of no major skews. For Histogram Projection to work correct, the document must not contain even slight skews. Hough transform gives better results than the former case. Word Segmentation can be done using the connected components analysis. Here, we first identify connected components in the printed or handwritten document image. A methodology is being used here which detects multiple skews in multi handwritten documents or printed ones. Using clustering algorithms, we detect multiple skew blocks in a handwritten document image or printed document image or a combination of both. The algorithm used here also works for skewed multi handwritten text blocks
Text Line Segmentation of Historical Documents: a Survey
There is a huge amount of historical documents in libraries and in various
National Archives that have not been exploited electronically. Although
automatic reading of complete pages remains, in most cases, a long-term
objective, tasks such as word spotting, text/image alignment, authentication
and extraction of specific fields are in use today. For all these tasks, a
major step is document segmentation into text lines. Because of the low quality
and the complexity of these documents (background noise, artifacts due to
aging, interfering lines),automatic text line segmentation remains an open
research field. The objective of this paper is to present a survey of existing
methods, developed during the last decade, and dedicated to documents of
historical interest.Comment: 25 pages, submitted version, To appear in International Journal on
Document Analysis and Recognition, On line version available at
http://www.springerlink.com/content/k2813176280456k3
Estimation of the Handwritten Text Skew Based on Binary Moments
Binary moments represent one of the methods for the text skew estimation in binary images. It has been used widely for the skew identification of the printed text. However, the handwritten text consists of text objects, which are characterized with different skews. Hence, the method should be adapted for the handwritten text. This is achieved with the image splitting into separate text objects made by the bounding boxes. Obtained text objects represent the isolated binary objects. The application of the moment-based method to each binary object evaluates their local text skews. Due to the accuracy, estimated skew data can be used as an input to the algorithms for the text line segmentation
Component-based Segmentation of words from handwritten Arabic text
Efficient preprocessing is very essential for automatic recognition of handwritten documents. In this paper, techniques on segmenting words in handwritten Arabic text are presented. Firstly, connected components (ccs) are extracted, and distances among different components are analyzed. The statistical distribution of this distance is then obtained to determine an optimal threshold for words segmentation. Meanwhile, an improved projection based method is also employed for baseline detection. The proposed method has been successfully tested on IFN/ENIT database consisting of 26459 Arabic words handwritten by 411 different writers, and the results were promising and very encouraging in more accurate detection of the baseline and segmentation of words for further recognition
- …