
    Restoration of arbitrarily warped historical document images using flow lines

    Historical documents frequently suffer from arbitrary geometric distortions (warping and folds) caused by storage conditions, use and, to some extent, the printing process of the time. In addition, page curl can be prominent because of the scanning technique used. Such distortions adversely affect OCR and print-on-demand quality. Previous approaches to geometric restoration either correct only page curl or require supplementary information obtained by additional scanning hardware, which is not practical for existing scans. This paper presents a new approach to detect and restore arbitrary warping and folds, in addition to page curl. Warped text lines and the smooth deformation between them are precisely modelled as primary and secondary flow lines, which are then restored to their original linear shape. Preliminary but representative experimental results, compared against a leading page curl removal method and an industry-standard commercial system, demonstrate the effectiveness of the proposed method.
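    The flow-line idea can be illustrated with a minimal sketch. It assumes, hypothetically, that each detected text line (a primary flow line) is given as one source y-coordinate per image column, that secondary flow lines are plain linear interpolations between neighbouring primary lines, that displacement is purely vertical, and that pixels are resampled by nearest neighbour. The paper's actual model may differ; the names `interpolate_flow_line`, `dewarp` and `target_rows` are illustrative only.

```python
from typing import List, Sequence

Image = List[List[int]]  # image[y][x], grayscale values (255 = white)


def interpolate_flow_line(upper: Sequence[float], lower: Sequence[float], t: float) -> List[float]:
    """Secondary flow line at fraction t between two primary lines (t=0: upper, t=1: lower)."""
    return [(1 - t) * u + t * l for u, l in zip(upper, lower)]


def dewarp(image: Image, primary_lines: List[Sequence[float]], target_rows: List[int]) -> Image:
    """Map each warped text line (and the band between lines) onto straight output rows.

    primary_lines[k][x] is the source y of text line k at column x (assumed input).
    target_rows[k] is the straight output row assigned to line k (strictly increasing).
    """
    if len(primary_lines) != len(target_rows) or len(target_rows) < 2:
        raise ValueError("need at least two primary lines, one target row each")
    if any(b <= a for a, b in zip(target_rows, target_rows[1:])):
        raise ValueError("target_rows must be strictly increasing")
    height, width = len(image), len(image[0])
    out = [[255] * width for _ in range(target_rows[-1] + 1)]  # white background
    for k in range(len(primary_lines) - 1):
        y0, y1 = target_rows[k], target_rows[k + 1]
        for y_out in range(y0, y1 + 1):
            t = (y_out - y0) / (y1 - y0)
            flow = interpolate_flow_line(primary_lines[k], primary_lines[k + 1], t)
            for x in range(width):
                y_src = round(flow[x])  # nearest-neighbour sampling along the flow line
                if 0 <= y_src < height:
                    out[y_out][x] = image[y_src][x]
    return out


# A 6x3 image whose two text lines sag in the middle column.
img = [[255] * 3 for _ in range(6)]
for y, x in [(1, 0), (2, 1), (1, 2), (3, 0), (4, 1), (3, 2)]:
    img[y][x] = 0
flat = dewarp(img, [[1, 2, 1], [3, 4, 3]], target_rows=[0, 2])
print(flat)  # [[0, 0, 0], [255, 255, 255], [0, 0, 0]]
```

    Each output row between two flattened text lines samples the source along one interpolated secondary flow line, so a curved line of text lands on a straight row. Rows above the first or below the last primary line are simply left white in this sketch.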

    Geometric correction of historical Arabic documents

    Geometric deformations in historical documents significantly affect both the success of Optical Character Recognition (OCR) and human readability. They may have been introduced at any point in a document's life cycle, from when it was first printed to when it was digitised by an imaging device. This thesis focuses on the challenging domain of geometric correction of historical Arabic documents. Background research showed that existing approaches for geometric correction of Latin-script historical documents are not sensitive to the characteristics of Arabic text and therefore cannot be applied successfully. Text line segmentation and baseline detection algorithms were investigated, and a new method better suited to warped historical Arabic document images is proposed. Advanced ideas for dewarping and geometric restoration of historical Arabic documents, dictated by the specific characteristics of the problem, have been implemented. In addition to developing an algorithm that detects accurate baselines in historical printed Arabic documents, the research contributes a new dataset of historical Arabic documents with different degrees of warping severity. Overall, a new dewarping system, the first for historical Arabic documents, has been developed that takes into account both global and local features of the text image and the patterns of smooth distortion between text lines. Using the results of the proposed line segmentation and baseline detection methods, it can cope with a variety of distortions, such as page curl, arbitrary warping and folds.
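    A common heuristic for Arabic baselines relies on the fact that connected Arabic letters concentrate ink along the baseline, so in a binarized text-line image the row with the highest horizontal projection count is a reasonable baseline estimate. The sketch below applies that heuristic per vertical strip so that a warped line yields a piecewise baseline. It is not the thesis's algorithm; the function names, the `strip_width` parameter and the 0/1 image representation are assumptions for illustration.

```python
from typing import List, Tuple

Binary = List[List[int]]  # rows of 0 (background) / 1 (ink) pixels


def horizontal_projection(binary_line: Binary) -> List[int]:
    """Count ink pixels in each row of a binarized text-line image."""
    return [sum(row) for row in binary_line]


def estimate_baseline(binary_line: Binary) -> int:
    """Row index with the largest ink count; the first such row wins on ties."""
    profile = horizontal_projection(binary_line)
    return max(range(len(profile)), key=profile.__getitem__)


def piecewise_baseline(binary_line: Binary, strip_width: int) -> List[Tuple[int, int]]:
    """Estimate the baseline separately in vertical strips of `strip_width` columns.

    Returns (first column of strip, baseline row) pairs that trace a warped baseline.
    """
    if strip_width <= 0:
        raise ValueError("strip_width must be positive")
    width = len(binary_line[0])
    points = []
    for x0 in range(0, width, strip_width):
        strip = [row[x0:x0 + strip_width] for row in binary_line]
        points.append((x0, estimate_baseline(strip)))
    return points


# A tiny line whose baseline drops by one row from the left half to the right half.
line = [
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 1],
]
print(estimate_baseline(line))      # 2
print(piecewise_baseline(line, 2))  # [(0, 1), (2, 2)]
```

    A single global estimate (row 2 here) hides the drift that the strip-wise estimate exposes; on real warped pages the strip points would typically be smoothed or fitted with a curve before dewarping.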