28 research outputs found

    Segmentation of Handwritten Document Images into Text Lines

    A general approach for multi-oriented text line extraction of handwritten document

    Multi-orientation occurs frequently in ancient handwritten documents, where writers update a document by adding annotations in the margins. Because the margins are narrow, this gives rise to lines in different directions and orientations. Document recognition needs to find the lines wherever they are written, whatever their orientation. We therefore propose a new approach for extracting multi-oriented lines in scanned documents. Because the lines are multi-oriented and dispersed across the page, we use an image meshing that lets us determine the lines progressively and locally. Once the meshing is established, the orientation is determined using the Wigner-Ville distribution on the projection histogram profile. This local orientation is then extended to constrain the orientation in the neighborhood. Afterward, the text lines are extracted locally in each zone, based on following the orientation lines and on the proximity of connected components. Finally, the connected components that overlap and touch in adjacent lines are separated, using a morphological analysis of the terminal letters of Arabic words. The proposed approach was tested on 100 documents and reached an accuracy of about 98.6%.
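
The per-zone orientation step can be illustrated with a minimal sketch. It does not use the Wigner-Ville distribution named in the abstract; as a simpler stand-in, it scores each candidate angle by the energy (sum of squared bin counts) of the projection profile taken along that angle and keeps the best one. The function names, the candidate range of -45 to 45 degrees, and the sign convention (a positive angle means the line descends to the right, since image rows grow downward) are assumptions made for illustration.

```python
import numpy as np


def projection_energy(ys, xs, angle_deg):
    """Energy of the projection profile of ink pixels along a candidate angle.

    Pixels are projected onto the axis perpendicular to the candidate line
    direction. When the candidate matches the true orientation, each text
    line collapses into a few bins and the sum of squares is large.
    """
    theta = np.deg2rad(angle_deg)
    t = ys * np.cos(theta) - xs * np.sin(theta)
    # One-pixel-wide bins covering the whole projected range.
    edges = np.arange(np.floor(t.min()), np.ceil(t.max()) + 2)
    hist, _ = np.histogram(t, bins=edges)
    return float(np.sum(hist.astype(float) ** 2))


def estimate_cell_orientation(cell, candidates=range(-45, 46)):
    """Return the candidate angle (degrees) with the sharpest profile.

    `cell` is a 2-D array for one mesh zone, non-zero where there is ink.
    Returns None for an empty zone. This is an illustrative stand-in, not
    the Wigner-Ville estimator described in the abstract.
    """
    ys, xs = np.nonzero(cell)
    if len(ys) == 0:
        return None
    angles = list(candidates)
    scores = [projection_energy(ys, xs, a) for a in angles]
    return angles[int(np.argmax(scores))]
```

In a meshed page, a function like this would be called once per zone, and the resulting angles could then be propagated to neighboring zones as the abstract describes.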

    A hybrid approach for line segmentation in handwritten documents

    This paper presents an approach to text line segmentation that combines connected-component-based and projection-based information to take advantage of both methods. The proposed system finds the baseline of each connected component. Lines are detected by using projection information to group the baselines of connected components that belong to the same line. Components are assigned to lines according to different distance metrics, depending on their size. This is one of the few studies to apply line segmentation to Ottoman documents. It also proposes a new method, Fourier curve fitting, for detecting the peaks in a projection profile. The algorithm is demonstrated on different printed and handwritten Ottoman datasets. Results show that the method segments lines from both printed and handwritten documents under different writing conditions with at least 92% accuracy. © 2012 IEEE
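
Peak detection in a projection profile can be sketched as follows. The abstract does not give the formulation of its Fourier curve fitting, so this example assumes the simplest reading: the horizontal projection profile is approximated by a truncated Fourier series (the least-squares trigonometric fit on uniformly spaced samples), and one peak is reported per run of rows that rises above half of the smoothed maximum. The function names, `n_harmonics=8`, and the half-maximum threshold are illustrative assumptions.

```python
import numpy as np


def fourier_smooth(profile, n_harmonics=8):
    """Approximate a 1-D profile by its first `n_harmonics` Fourier terms."""
    spectrum = np.fft.rfft(profile)
    spectrum[n_harmonics + 1:] = 0  # drop the higher-frequency terms
    return np.fft.irfft(spectrum, n=len(profile))


def find_text_line_rows(binary_image, n_harmonics=8):
    """Return one row index per detected text line.

    `binary_image` is a 2-D array, non-zero where there is ink. The rows
    returned are the maxima of the smoothed horizontal projection profile,
    one per contiguous run above half of the smoothed maximum.
    """
    profile = binary_image.sum(axis=1).astype(float)
    smooth = fourier_smooth(profile, n_harmonics)
    above = smooth > 0.5 * smooth.max()

    rows = []
    start = None
    for r, flag in enumerate(above):
        if flag and start is None:
            start = r  # a run of high-ink rows begins
        elif not flag and start is not None:
            rows.append(start + int(np.argmax(smooth[start:r])))
            start = None
    if start is not None:  # run reaches the bottom of the image
        rows.append(start + int(np.argmax(smooth[start:])))
    return rows
```

The number of harmonics controls how much detail survives: too few merges neighboring lines into one bump, too many lets stroke-level noise create spurious peaks, so in practice it would need tuning to the expected line spacing.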

    Seam Carving for Text Line Extraction on Color and Grayscale Historical Manuscripts

    Détection et séparation de lignes connectées dans les documents multi-orientés (Detection and separation of connected lines in multi-oriented documents)

    This paper presents a new approach for detecting and separating connected lines in multi-oriented handwritten Arabic documents. Because of the multi-orientation, we use an automatic meshing of the image that lets us determine the lines progressively and locally. The mesh is initialized with a small window whose size is corrected by extension until enough lines have been found. The snake method is used to extract these lines. Next, the orientation in each window is estimated using the Wigner-Ville distribution (WVD) applied to the projection profile. This orientation is extended to constrain the orientation in neighboring windows. Finally, the lines are extracted in each zone by following the orientation lines. A post-processing step is applied to separate connected lines. The proposed approach was tested on 100 documents, reaching an accuracy of about 98.6%.

    Segmentation of ancient Arabic documents

    This chapter addresses the problem of segmenting ancient Arabic documents. Because ancient documents have neither a real physical structure nor a logical one, the segmentation is limited to textual areas or to line extraction within those areas. Although this type of segmentation appears quite simple, its implementation remains challenging. This is due to the condition of old documents: the image is of low quality, and the lines are not straight but sinuous and connected. Given the failure of traditional methods, we proposed a method for line extraction in multi-oriented documents. The method is based on an image meshing that allows it to detect the orientations locally and reliably. These orientations are then extended to larger areas. The orientation estimation uses an energy distribution from Cohen's class, which is more accurate than the projection method. The method then exploits the projection peaks to follow the connected components that form text lines. The approach ends with a final separation of connected lines, based on the morphology of terminal letters.