Component-based Segmentation of words from handwritten Arabic text
Efficient preprocessing is essential for the automatic recognition of handwritten documents. This paper presents techniques for segmenting words in handwritten Arabic text. First, connected components (CCs) are extracted and the distances between components are analyzed. The statistical distribution of these distances is then used to determine an optimal threshold for word segmentation. An improved projection-based method is also employed for baseline detection. The proposed method was tested on the IFN/ENIT database, which consists of 26,459 Arabic words handwritten by 411 different writers, and the results were promising, with more accurate baseline detection and word segmentation for subsequent recognition.
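The pipeline described in this abstract (component gaps, a threshold derived from their distribution, and a projection-based baseline) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the connected components of one text line are already available as hypothetical `(x_min, x_max)` intervals, it uses an Otsu-style split of the gap values as a stand-in for the paper's unspecified "optimal threshold", and it uses a plain horizontal projection, not the improved variant the abstract mentions. All function names are illustrative.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: horizontal extent (x_min, x_max) of one component.
Box = Tuple[int, int]


def gaps_between(ordered: List[Box]) -> List[int]:
    """Gaps between consecutive components already sorted by x_min.

    The running right edge is tracked so overlapping components give a gap of 0.
    """
    gaps = []
    right_edge = ordered[0][1]
    for x_min, x_max in ordered[1:]:
        gaps.append(max(0, x_min - right_edge))
        right_edge = max(right_edge, x_max)
    return gaps


def otsu_threshold(values: List[int]) -> float:
    """Assumed thresholding rule: the split of the sorted values that
    maximizes between-class variance (1-D Otsu)."""
    vals = sorted(values)
    best_t, best_score = float(vals[-1]), -1.0
    for i in range(1, len(vals)):
        if vals[i] == vals[i - 1]:
            continue  # no boundary between equal values
        low, high = vals[:i], vals[i:]
        m_low = sum(low) / len(low)
        m_high = sum(high) / len(high)
        score = len(low) * len(high) * (m_high - m_low) ** 2
        if score > best_score:
            best_score = score
            best_t = (vals[i - 1] + vals[i]) / 2
    return best_t


def segment_words(boxes: List[Box]) -> List[List[Box]]:
    """Group components into words: a gap above the threshold starts a new word.

    Sorting by x works for right-to-left scripts too, since gaps are symmetric.
    """
    if not boxes:
        return []
    ordered = sorted(boxes)
    if len(ordered) == 1:
        return [ordered]
    gaps = gaps_between(ordered)
    threshold = otsu_threshold(gaps)
    words = [[ordered[0]]]
    for box, gap in zip(ordered[1:], gaps):
        if gap > threshold:
            words.append([box])
        else:
            words[-1].append(box)
    return words


def baseline_row(image: Sequence[Sequence[int]]) -> int:
    """Plain horizontal projection: index of the row with the most ink pixels
    (image is a binary grid, 1 = ink)."""
    profile = [sum(row) for row in image]
    return profile.index(max(profile))


if __name__ == "__main__":
    line = [(0, 10), (12, 20), (22, 30), (50, 60), (62, 70), (95, 105)]
    for word in segment_words(line):
        print(word)
```

In this toy input the small gaps fall inside words and the two large gaps separate them. In practice the components would come from a connected-component labeling step on the binarized page, and the threshold would be estimated from the gap distribution across many lines instead of a single one.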
Automatic Palaeographic Exploration of Genizah Manuscripts
The Cairo Genizah is a collection of hand-written documents containing approximately
350,000 fragments of mainly Jewish texts discovered in the late 19th century. The
fragments are today spread out in some 75 libraries and private collections worldwide,
but there is an ongoing effort to document and catalogue all extant fragments.
Palaeographic information plays a key role in the study of the Genizah collection.
Script style, and more specifically handwriting, can be used to identify fragments that
might originate from the same original work. Such matched fragments, commonly
referred to as “joins”, are currently identified manually by experts, and presumably only
a small fraction of existing joins have been discovered to date. In this work, we show
that automatic handwriting matching functions, obtained from non-specific features
using a corpus of writing samples, can perform this task quite reliably. In addition, we
explore the problem of grouping various Genizah documents by script style, without
being provided any prior information about the relevant styles. The automatically
obtained grouping agrees, for the most part, with the palaeographic taxonomy. In cases
where the method fails, it is due to apparent similarities between related scripts
- …