28 research outputs found

Binarization-free Text Line Extraction for Historical Manuscripts

Author: Arvanitopoulos Darginis Nikolaos
Süsstrunk Sabine
Publication date: 04/11/2014

Segmentation of Handwritten Document Images into Text Lines

Author: Vassilis Katsouros
Vassilis Papavassiliou
Publication venue: 'IntechOpen'
Publication date: 19/04/2011

A general approach for multi-oriented text line extraction of handwritten document

Author: Belaïd Abdel
Ouwayed Nazih
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/09/2011

Multi-orientation occurs frequently in ancient handwritten documents, where writers update a document by adding annotations in the margins. Because the margins are narrow, this produces lines in different directions and orientations. Document recognition needs to find the lines wherever they are written, whatever their orientation. We therefore propose a new approach for extracting multi-oriented lines in scanned documents. Because the lines are multi-oriented and dispersed across the page, we use an image meshing that lets us determine the lines progressively and locally. Once the meshing is established, the orientation is determined using the Wigner-Ville distribution on the projection histogram profile. This local orientation is then extended to constrain the orientation in the neighborhood. Afterward, the text lines are extracted locally in each zone by following the orientation lines and the proximity of connected components. Finally, connected components that overlap and touch in adjacent lines are separated. This step considers the morphology of the terminal letters of Arabic words. The proposed approach was tested on 100 documents, reaching an accuracy of about 98.6%.

INRIA a CCSD electronic archive server
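
The orientation step described in the abstract above can be illustrated with a simplified sketch for a single mesh cell. It does not implement the Wigner-Ville distribution named in the abstract. As a plainly different stand-in, it scores each candidate angle by the energy (sum of squared bin counts) of the projection profile taken along that angle and keeps the best one. The function names, the 5-degree candidate grid, and the bin size are assumptions made for illustration only.

```python
import math


def profile_energy(points, angle_deg, bin_size=1.0):
    """Energy of the projection profile taken along direction angle_deg.

    points: iterable of (x, y) ink-pixel coordinates inside one mesh cell.
    The energy is the sum of squared bin counts; it is large when the ink
    collapses into a few narrow bins, i.e. when the angle matches the lines.
    """
    theta = math.radians(angle_deg)
    bins = {}
    for x, y in points:
        # Signed distance measured perpendicular to the candidate direction.
        offset = -x * math.sin(theta) + y * math.cos(theta)
        key = math.floor(offset / bin_size)
        bins[key] = bins.get(key, 0) + 1
    return sum(count * count for count in bins.values())


def estimate_orientation(points, candidates=range(-90, 90, 5)):
    """Return the candidate angle (degrees) with the highest profile energy."""
    points = list(points)
    return max(candidates, key=lambda angle: profile_energy(points, angle))
```

In a meshing scheme, a function like `estimate_orientation` would be called once per cell, and the per-cell results would then be reconciled with those of neighboring cells.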

A hybrid approach for line segmentation in handwritten documents

Author: Adiguzel H.
Duygulu P.
Sahin E.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012

This paper presents an approach for text line segmentation that combines connected-component-based and projection-based information to take advantage of aspects of both methods. The proposed system finds the baseline of each connected component. Lines are detected by using projection information to group the baselines of connected components that belong to the same line. Components are assigned to lines according to different distance metrics, depending on their size. This study is one of the rare studies that apply line segmentation to Ottoman documents. Further, it proposes a new method, Fourier curve fitting, to detect the peaks in a projection profile. The algorithm is demonstrated on different printed and handwritten Ottoman datasets. Results show that the method segments lines from both printed and handwritten documents under different writing conditions with at least 92% accuracy. © 2012 IEEE
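
The abstract above does not spell out how Fourier curve fitting is applied to the projection profile, so the following is only a sketch under stated assumptions: the profile is the row-wise ink count of a binary image, the fitted curve is a truncated Fourier series (for uniformly sampled data this is the least-squares trigonometric fit), and peaks are local maxima of that curve above its mean. All function names and the `harmonics` parameter are hypothetical.

```python
import cmath
import math


def projection_profile(image):
    """Row-wise ink count of a binary image (list of rows, 1 = ink)."""
    return [sum(row) for row in image]


def fourier_smooth(profile, harmonics):
    """Fit a truncated Fourier series with the given number of harmonics."""
    n = len(profile)
    if not 0 <= harmonics < n / 2:
        raise ValueError("harmonics must satisfy 0 <= harmonics < len(profile) / 2")
    # Discrete Fourier coefficients for frequencies 0..harmonics.
    coeffs = [
        sum(profile[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) / n
        for k in range(harmonics + 1)
    ]
    smooth = []
    for t in range(n):
        value = coeffs[0].real
        for k in range(1, harmonics + 1):
            # Each positive frequency is paired with its conjugate, hence the factor 2.
            value += 2 * (coeffs[k] * cmath.exp(2j * math.pi * k * t / n)).real
        smooth.append(value)
    return smooth


def find_peaks(curve):
    """Indices of interior local maxima that lie above the mean of the curve."""
    mean = sum(curve) / len(curve)
    return [
        i
        for i in range(1, len(curve) - 1)
        if curve[i] > mean and curve[i] >= curve[i - 1] and curve[i] > curve[i + 1]
    ]
```

Calling `find_peaks(fourier_smooth(projection_profile(image), harmonics))` returns candidate text-line rows. A small `harmonics` value suppresses noise but can merge closely spaced lines, so it would need tuning per dataset.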

Seam Carving for Text Line Extraction on Color and Grayscale Historical Manuscripts

Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Détection et séparation de lignes connectées dans les documents multi-orientés (Detection and separation of connected lines in multi-oriented documents)

Author: Belaïd Abdel
Ouwayed Nazih
Publication venue: HAL CCSD
Publication date: 18/03/2010

This paper presents a new approach for detecting and separating connected lines in multi-oriented Arabic handwritten documents. Because of the multi-orientation, we use an automatic meshing of the image that lets us determine the lines progressively and locally. The meshing is initialized with a small window whose size is corrected by extension until enough lines have been found. The snake method is used to extract these lines. Next, the orientation in each window is estimated using the Wigner-Ville distribution (WVD) applied to the projection profile. This orientation is extended to constrain the orientation in neighboring windows. Finally, the lines are extracted in each zone by following the orientation lines. A post-processing step is applied to separate connected lines. The proposed approach was tested on 100 documents, reaching an accuracy of about 98.6%.

INRIA a CCSD electronic archive server

Segmentation of ancient Arabic documents

Author: Belaïd Abdel
Ouwayed Nazih
Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 01/03/2011

This chapter addresses the problem of ancient Arabic document segmentation. As ancient documents have neither a real physical structure nor a logical one, the segmentation is limited to textual areas or to line extraction within those areas. Although this type of segmentation appears quite simple, its implementation remains a challenging task. This is due to the state of the old documents: the image is of low quality, and the lines are not straight but sinuous and connected. Given the failure of traditional methods, we proposed a method for line extraction in multi-oriented documents. The method is based on an image meshing that allows it to detect the orientations locally and safely. These orientations are then extended to larger areas. The orientation estimation uses the energy distribution of Cohen's class, which is more accurate than the projection method. Then, the method exploits the projection peaks to follow the connected components forming text lines. The approach ends with a final separation of connected lines, based on the morphology of terminal letters.

INRIA a CCSD electronic archive server
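
The idea of using projection peaks to gather the connected components that form a text line, mentioned in the abstract above, can be sketched in a deliberately reduced form. This is not the chapter's method: it assumes horizontal lines, represents each connected component only by its centroid, and assigns it to the nearest peak row. The function name and the data layout are hypothetical.

```python
def assign_to_lines(component_centroids, peak_rows):
    """Group component centroids (x, y) by the nearest projection-peak row.

    Returns a dict mapping each peak row to its centroids, sorted by x.
    Ties between two equally distant peaks go to the peak listed first.
    """
    lines = {row: [] for row in peak_rows}
    for cx, cy in component_centroids:
        nearest = min(peak_rows, key=lambda row: abs(row - cy))
        lines[nearest].append((cx, cy))
    for members in lines.values():
        members.sort()  # left-to-right order; reverse for right-to-left scripts
    return lines
```

For sinuous or multi-oriented lines, the nearest-row rule would have to be replaced by a distance to a locally estimated line direction, which is what the meshing in the abstract is meant to provide.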