1,033 research outputs found

    Automatic Palaeographic Exploration of Genizah Manuscripts

    Get PDF
    The Cairo Genizah is a collection of hand-written documents containing approximately 350,000 fragments of mainly Jewish texts discovered in the late 19th century. The fragments are today spread out in some 75 libraries and private collections worldwide, but there is an ongoing effort to document and catalogue all extant fragments. Palaeographic information plays a key role in the study of the Genizah collection. Script style, and–more specifically–handwriting, can be used to identify fragments that might originate from the same original work. Such matched fragments, commonly referred to as “joins”, are currently identified manually by experts, and presumably only a small fraction of existing joins have been discovered to date. In this work, we show that automatic handwriting matching functions, obtained from non-specific features using a corpus of writing samples, can perform this task quite reliably. In addition, we explore the problem of grouping various Genizah documents by script style, without being provided any prior information about the relevant styles. The automatically obtained grouping agrees, for the most part, with the palaeographic taxonomy. In cases where the method fails, it is due to apparent similarities between related scripts

    A Multiple-Expert Binarization Framework for Multispectral Images

    Full text link
    In this work, a multiple-expert binarization framework for multispectral images is proposed. The framework is based on a constrained subspace selection limited to the spectral bands combined with state-of-the-art gray-level binarization methods. The framework uses a binarization wrapper to enhance the performance of the gray-level binarization. Nonlinear preprocessing of the individual spectral bands is used to enhance the textual information. An evolutionary optimizer is considered to obtain the optimal and some suboptimal 3-band subspaces from which an ensemble of experts is then formed. The framework is applied to a ground truth multispectral dataset with promising results. In addition, a generalization to the cross-validation approach is developed that not only evaluates generalizability of the framework, it also provides a practical instance of the selected experts that could be then applied to unseen inputs despite the small size of the given ground truth dataset.Comment: 12 pages, 8 figures, 6 tables. Presented at ICDAR'1

    DeepOtsu: Document Enhancement and Binarization using Iterative Deep Learning

    Get PDF
    This paper presents a novel iterative deep learning framework and apply it for document enhancement and binarization. Unlike the traditional methods which predict the binary label of each pixel on the input image, we train the neural network to learn the degradations in document images and produce the uniform images of the degraded input images, which allows the network to refine the output iteratively. Two different iterative methods have been studied in this paper: recurrent refinement (RR) which uses the same trained neural network in each iteration for document enhancement and stacked refinement (SR) which uses a stack of different neural networks for iterative output refinement. Given the learned uniform and enhanced image, the binarization map can be easy to obtain by a global or local threshold. The experimental results on several public benchmark data sets show that our proposed methods provide a new clean version of the degraded image which is suitable for visualization and promising results of binarization using the global Otsu's threshold based on the enhanced images learned iteratively by the neural network.Comment: Accepted by Pattern Recognitio
    • …
    corecore