6 research outputs found

    Readability Enhancement and Palimpsest Decipherment of Historical Manuscripts

    This paper presents image acquisition and readability enhancement techniques for historical manuscripts developed in the interdisciplinary project “The Enigma of the Sinaitic Glagolitic Tradition” (Sinai II Project). We are mainly dealing with parchment documents from the 10th to the 12th centuries held at St. Catherine’s Monastery on Mount Sinai. Their contents are being analyzed, fully or partly transcribed, and edited in the course of the project. Other manuscripts are also taken into consideration for comparison. The main challenge is that some of the manuscripts are in bad condition due to various kinds of damage, e.g. mold or washed-out or faded text, or contain palimpsest (i.e. overwritten) parts. Therefore, the manuscripts investigated are imaged with a portable multispectral imaging system. This non-invasive conservation technique has proven extremely useful for the examination and reconstruction of vanished text areas and erased or washed-off palimpsest texts. Compared to regular white light, illumination with specific wavelengths highlights particular details of the documents, i.e. the writing and writing material, ruling, and underwritten text. To further enhance the contrast of the degraded writings, several Blind Source Separation techniques are applied to the multispectral images, including Principal Component Analysis (PCA), Independent Component Analysis (ICA), and others. Furthermore, this paper reports on other recent developments in the Sinai II Project, i.e. Document Image Dewarping and Automatic Layout Analysis, on the recent result of another project related to our work, the image processing tool Paleo Toolbar, and on the launch of the series Glagolitica Sinaitica.
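
The PCA step mentioned in this abstract can be illustrated with a minimal sketch. It is not the project's implementation: the function name `pca_components`, the `(height, width, bands)` cube layout, and the synthetic input are assumptions made purely for illustration.

```python
import numpy as np


def pca_components(cube, n_components=3):
    """Project a multispectral cube (H, W, B) onto its leading principal components.

    Hypothetical helper: each pixel is treated as one B-dimensional sample.
    Returns an array of shape (H, W, n_components).
    """
    h, w, b = cube.shape
    samples = cube.reshape(-1, b).astype(np.float64)  # astype makes a copy
    samples -= samples.mean(axis=0)                   # center every band
    cov = samples.T @ samples / (samples.shape[0] - 1)  # B x B band covariance
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    return (samples @ eigvecs[:, order]).reshape(h, w, n_components)


# Synthetic stand-in for a registered stack of 8 narrowband images (assumption).
rng = np.random.default_rng(0)
cube = rng.normal(size=(32, 32, 8))
components = pca_components(cube, n_components=3)
```

Each output channel is a linear mix of the input bands. In practice the component images would be inspected visually, since the one showing the faded or underwritten text best is not necessarily the first.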

    Restoration of Multispectral Images of Ancient Documents

    Thesis not yet received by the library; data not verified. Title differs, following the author's translation. This thesis is concerned with the restoration of images of historical documents. The ancient writings imaged contain partially faded-out characters or are degraded by background variations. MultiSpectral Imaging (MSI) has proven to be a valuable tool for the non-invasive investigation of such ancient manuscripts, since it can be used to acquire information that is invisible to the human eye. The document images examined in this work were acquired with a portable MSI system. Imaging in narrowband spectral ranges led to a considerable increase in legibility. The images taken form the basis for two kinds of restoration techniques that are introduced in this work. First, an enhancement method is proposed that projects the multispectral samples onto a lower-dimensional space by applying a Linear Discriminant Analysis (LDA) based transformation. Thus, not only is the dimensionality of the multispectral images lowered, but the legibility of the degraded writings is also increased. A qualitative analysis conducted by philologists shows that the method partially outperforms the unsupervised dimension reduction methods used in previous works. The second aim of this work is the separation of the ancient writings from the remaining background. Such binarization methods are used as a preprocessing step for other document image analysis methods, including Optical Character Recognition (OCR) and writer identification. Multiple binarization methods have been developed for the multispectral document images considered. Two methods make use of a target detection algorithm, which determines whether ink is present within the multispectral samples. A further binarization method makes use of Gaussian Mixture Model (GMM) based clustering. The methods introduced make use of spatial and spectral information.
    Furthermore, a Fully Convolutional Network (FCN) is used for the binarization task. The methods are evaluated on two databases. First, the methods are applied to the MultiSpectral Text Extraction (MS-TEx) dataset, where they achieve promising results. The best performance is achieved by the target detection-based methods. These methods participated in the MS-TEx 2015 contest, where they were ranked first and second. Second, the methods are evaluated on the MultiSpectral Document Binarization (MSBin) dataset. This dataset is larger and allows for successful training of the FCN, which outperforms the remaining binarization methods. Nevertheless, the results achieved by all proposed methods are superior to those of a traditional binarization approach designed for grayscale images.
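
To make the LDA-based enhancement idea more concrete, here is a minimal two-class Fisher discriminant sketch. It is a simplification and not the method from the thesis: the function names, the assumption that labeled ink and background samples are already available, the small regularization term, and the synthetic data are all illustrative.

```python
import numpy as np


def fisher_lda_direction(ink, background):
    """Two-class Fisher discriminant direction for spectral samples.

    ink, background: arrays of shape (N, B), one row per labeled pixel
    (how labels are obtained is outside this sketch).
    Returns a unit vector of length B.
    """
    mean_ink = ink.mean(axis=0)
    mean_bg = background.mean(axis=0)
    # Within-class scatter: sum of the two (unnormalized) class covariances.
    scatter = (np.cov(ink, rowvar=False) * (len(ink) - 1)
               + np.cov(background, rowvar=False) * (len(background) - 1))
    scatter += 1e-6 * np.eye(scatter.shape[0])  # small ridge for stability (assumption)
    w = np.linalg.solve(scatter, mean_ink - mean_bg)
    return w / np.linalg.norm(w)


def project(cube, w):
    """Collapse an (H, W, B) cube to one (H, W) image along direction w."""
    return cube @ w


# Synthetic spectra: ink slightly darker than parchment in all 6 bands (assumption).
rng = np.random.default_rng(1)
ink = rng.normal(loc=0.4, scale=0.05, size=(200, 6))
background = rng.normal(loc=0.6, scale=0.05, size=(300, 6))
w = fisher_lda_direction(ink, background)
enhanced = project(rng.normal(loc=0.5, scale=0.1, size=(16, 16, 6)), w)
```

Unlike PCA, the direction is chosen using class labels, so it maximizes the separation between ink and background rather than the overall variance.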

    Mass Digitization of Archival Documents using Mobile Phones

    The final publication is available via https://doi.org/10.1145/3151509.3151526. Digital copies of historical documents are needed for the Digital Humanities. Currently, cameras of standard mobile phones are able to capture documents with a resolution of about 330 dpi for document sizes up to DIN A4 (German standard, 297 x 210 mm), which allows documents to be digitized using a standard device. Thus, scholars are able to take images of documents in archives themselves without the need for book scanners or other devices. This paper presents a scanning app that comprises real-time page detection, quality assessment (focus measure), and automated detection of a page turn when books are scanned. Additionally, a portable device, the ScanTent, for holding the mobile phone during scanning is presented. The page detection is evaluated on the ICDAR2015 SmartDoc competition dataset and proves reliable, with an average Jaccard index of 75%. European Union's Horizon 202
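
The Jaccard index quoted in this abstract measures the overlap between a detected page region and its ground truth as intersection over union. The sketch below computes it on binary pixel masks. The function name and the toy masks are assumptions, and the exact evaluation protocol of the SmartDoc competition is not reproduced here.

```python
import numpy as np


def jaccard_index(pred_mask, true_mask):
    """Intersection over union of two binary masks of equal shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:
        return 1.0  # both masks empty: counted as full agreement (a convention)
    return np.logical_and(pred, true).sum() / union


# Toy example: the detected page is shifted down by one row of pixels.
true_mask = np.zeros((10, 10), dtype=bool)
true_mask[2:8, 2:8] = True   # 36 ground-truth pixels
pred_mask = np.zeros((10, 10), dtype=bool)
pred_mask[3:9, 2:8] = True   # 36 detected pixels, 30 of them shared
score = jaccard_index(pred_mask, true_mask)  # 30 / 42
```

A value of 1 means the detected region matches the ground truth exactly, and 0 means no overlap at all.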

    Review of studies on tree species classification from remotely sensed data
