2,258 research outputs found

    Automatic Palaeographic Exploration of Genizah Manuscripts

    The Cairo Genizah is a collection of handwritten documents comprising approximately 350,000 fragments of mainly Jewish texts, discovered in the late 19th century. The fragments are today spread across some 75 libraries and private collections worldwide, but there is an ongoing effort to document and catalogue all extant fragments. Palaeographic information plays a key role in the study of the Genizah collection. Script style and, more specifically, handwriting can be used to identify fragments that might originate from the same original work. Such matched fragments, commonly referred to as “joins”, are currently identified manually by experts, and presumably only a small fraction of existing joins have been discovered to date. In this work, we show that automatic handwriting matching functions, obtained from non-specific features using a corpus of writing samples, can perform this task quite reliably. In addition, we explore the problem of grouping various Genizah documents by script style, without being provided any prior information about the relevant styles. The automatically obtained grouping agrees, for the most part, with the palaeographic taxonomy. In cases where the method fails, the failure is due to apparent similarities between related scripts.

    Possibility Theory-Based Approach to Spam Email Detection


    CCLAP: Controllable Chinese Landscape Painting Generation via Latent Diffusion Model

    With the development of deep generative models, recent years have seen great success in Chinese landscape painting generation. However, few works focus on controllable Chinese landscape painting generation, owing to the lack of data and limited modeling capabilities. In this work, we propose a controllable Chinese landscape painting generation method named CCLAP, which can generate paintings with specific content and style based on the Latent Diffusion Model. Specifically, it consists of two cascaded modules: a content generator and a style aggregator. The content generator module ensures that the content of the generated paintings matches the input text, while the style aggregator module generates paintings in a style corresponding to a reference image. Moreover, a new dataset of Chinese landscape paintings named CLAP is collected for comprehensive evaluation. Both the qualitative and quantitative results demonstrate that our method achieves state-of-the-art performance, especially in artful composition and artistic conception. Code is available at https://github.com/Robin-WZQ/CCLAP. Comment: 8 pages, 13 figures

    West African Manuscript Heritage at a Crossroads: From Dust to Digital, or Digital Dust?

    Ever since colonial times, West African Arabic manuscripts have been the object of ambiguous attention on the part of administrators, conservators and scholars. As a result, collections were first concealed or disclosed (depending on the predatory or protective nature of Western attentions), then targeted by modern scientific initiatives focused on bibliographic description, content analysis and preservation. A review of their accomplishments and shortcomings will help explain why many such projects failed to meet, or even understand, the expectations of their intended and potential users. Where they did meet such expectations, they misunderstood or underestimated the nature of the tools they employed and the rapidly evolving technological and cultural environment that nurtures and supports them. Only by understanding these evolving trends and realities, and therefore engaging information professionals equipped with the appropriate knowledge and skills to take advantage of them, will new initiatives to preserve and document West African Arabic manuscript heritage succeed in providing continuous and relevant access to its intellectual content and material culture.

    Historical stereotypes and histories of stereotypes


    A novel image matching approach for word spotting

    Word spotting has been adopted by various researchers as a complementary technique to Optical Character Recognition for document analysis and retrieval. Its applications include document indexing, image retrieval and information filtering. The important factors in word spotting techniques are pre-processing, the selection and extraction of proper features, and the image matching algorithm. The Correlation Similarity Measure (CORR) algorithm is considered a fast matching algorithm, originally defined for finding similarities between binary patterns. In the word spotting literature the CORR algorithm has been used successfully to compare the Gradient, Structural and Concavity (GSC) binary features extracted from binary word images. However, the problem with this approach is that binarization of images leads to a loss of very useful information. Furthermore, before extracting GSC binary features the word images must be skew corrected and slant normalized, which is not only difficult but in some cases impossible in Arabic and modified Arabic scripts. We present a new approach in which the CORR algorithm is used to compare gray-scale word images. In this approach, binarization of images, skew correction and slant normalization of word images are not required at all. Various features, i.e., projection profiles, word profiles and transitional features, are extracted from the gray-scale word images and converted into their binary equivalents, which are compared via the CORR algorithm with greater speed and higher accuracy. The experiments were conducted on gray-scale versions of newly created handwritten databases of the Pashto and Dari languages, written in modified Arabic scripts. For each of these languages we used 4599 words relating to 21 different word classes collected from 219 writers. The average precision rates achieved for Pashto and Dari were 93.18 % and 93.75 %, respectively. The time taken to match a pair of images was 1.43 milliseconds. In addition, we present the handwritten databases for these two well-known Indo-Iranian languages. These are large databases which contain six types of data, i.e., Dates, Isolated Digits, Numeral Strings, Isolated Characters, Different Words and Special Symbols, written by native speakers of the corresponding languages.
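    The comparison step described in this abstract, correlating binary feature vectors, can be illustrated with a minimal sketch. It assumes that the correlation measure is the standard phi coefficient between two binary vectors and that a numeric profile is binarized by thresholding at its mean; neither detail is stated in the abstract, and the function names `profile_to_bits` and `corr_similarity` are hypothetical.

```python
import math


def profile_to_bits(profile):
    """Binarize a numeric profile by thresholding at its mean (assumed rule)."""
    mean = sum(profile) / len(profile)
    return [1 if value > mean else 0 for value in profile]


def corr_similarity(a, b):
    """Phi coefficient between two equal-length binary vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # Count the four possible bit pairings across the two vectors.
    s11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    s00 = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)
    s10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    s01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    denom = math.sqrt((s10 + s11) * (s01 + s00) * (s11 + s01) * (s00 + s10))
    if denom == 0:
        # Undefined when either vector is constant; returning 0 is an assumed convention.
        return 0.0
    return (s11 * s00 - s10 * s01) / denom


if __name__ == "__main__":
    query = profile_to_bits([1, 5, 2, 8])      # hypothetical projection profile
    candidate = profile_to_bits([2, 6, 1, 9])  # hypothetical projection profile
    print(corr_similarity(query, candidate))
```

    Because the vectors are binary, the four counts could also be obtained with bitwise operations on packed integers, which is one plausible reason such a measure can be fast; the abstract does not say how the authors implement it.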