    Text Line Segmentation of Historical Documents: a Survey

    A huge number of historical documents in libraries and national archives have not been exploited electronically. Although automatic reading of complete pages remains, in most cases, a long-term objective, tasks such as word spotting, text/image alignment, authentication, and extraction of specific fields are in use today. For all these tasks, a major step is segmenting the document into text lines. Because of the low quality and complexity of these documents (background noise, artifacts due to aging, interfering lines), automatic text line segmentation remains an open research field. The objective of this paper is to present a survey of existing methods, developed during the last decade, that are dedicated to documents of historical interest. Comment: 25 pages, submitted version. To appear in International Journal on Document Analysis and Recognition. Online version available at http://www.springerlink.com/content/k2813176280456k3

    Cross-document word matching for segmentation and retrieval of Ottoman divans

    Cataloged from PDF version of article. Motivated by the need for automatic indexing and analysis of the huge number of documents in Ottoman divan poetry, and for discovering new knowledge to preserve this heritage and keep it alive, in this study we propose a novel method for segmenting and retrieving words in Ottoman divans. Documents in Ottoman are difficult to segment into words without prior knowledge of the word. This study builds on the idea that divans have multiple copies (versions) by different writers in different writing styles, and that word segmentation may be relatively easier to achieve in some of those versions than in others. Segmentation of the harder versions (which is difficult, if not impossible, with traditional techniques) is performed using information carried over from the simpler version. One version of a document is used as the source dataset and the other version of the same document is used as the target dataset. Words in the source dataset are automatically extracted and used as queries to be spotted in the target dataset for detecting word boundaries. We present the idea of cross-document word matching for the novel task of segmenting historical documents into words. We propose a matching scheme based on possible combinations of sequences of sub-words. We improve the performance of simple features by considering the words in context. The method is applied to two versions of the Layla and Majnun divan by Fuzuli. The results show that the proposed word-matching-based segmentation method is promising for finding word boundaries and for retrieving words across documents.

    Arabic Manuscript Layout Analysis and Classification

    Trajectories in the Development of Islamic Theological Thought: the Synthesis of Kalam

    The field of Islamic theology (kalam) is not merely a receptacle for the presentation of the creedal statements and doctrinal catechisms of Islam; it derives its raison d’être not only from the articulation and elucidation of the doctrines of faith, but also from its rational and painstaking explication of dogma. While many of the dogmatic statements expressed in Islamic theology naturally emanate from a traditional substratum, countless more are the result of dialectical discussions in which theologians expounded upon abstract constructs of religious dogma. Recent academic research is exploring the history, trends, and conceptual achievements behind the Islamic experiment with theology, providing insights into the tradition’s ability to integrate, refine, and expand theological constructs. Scholars are also concerned with issues such as origins, authenticity, and ascription, although such matters are not deflecting attention from the rich stock of resources and materials kalam has to offer.

    Islam's Low Mutterings at High Tide: Enslaved African Muslims in American Literature

    This dissertation traces the underexplored figure of the African Muslim slave in American literature and proposes a new way to examine Islam in American cultural texts. It introduces a methodology for reading the traces of Islam called Allahgraphy: a method of interpretation that is attentive to Islamic studies and rhetorical techniques and that takes the surface as a profound source of meaning. This interpretative practice draws on postsecular theory, Islamic epistemology, and “post-critique” scholarship. Because of this confluence of diverse theories and epistemologies, Allahgraphy blurs religious and secular categories by deploying religious concepts for literary exegesis. Through an Allahgraphic reading, the dissertation examines modes of Islamic expression in a wide range of American works spanning the nineteenth and twentieth centuries. To unravel the diverse Muslim voices embedded within the American literary tradition, the dissertation proceeds chronologically through specific periods in African American culture and history, moving from slavery to post-Reconstruction to the post-civil rights era. The first two chapters focus on the nineteenth century and examine the works of ʿUmar ibn Sayyid, Bilali Muhammad, and Joel Chandler Harris. In these chapters, Allahgraphy is used to consider the material inscription of the source texts, specifically the African-Arabic manuscripts. The second half of the dissertation examines Islamic expressions in twentieth-century American texts. Through an analysis of works by Malcolm X and Toni Morrison, these two chapters explore the multiple sensory registers of Allahgraphy. The dissertation concludes by considering the appearance of the African Muslim slave in the diary of the Guantánamo prisoner Mohamedou Ould Slahi. Ultimately, the dissertation aims to widen literary approaches to Islam in American works and to demonstrate the continuity of Muslim voices in American literary works. In doing so, it delineates a long tradition of black Muslim Americans’ responses to Islamophobia.

    HWD: A Novel Evaluation Score for Styled Handwritten Text Generation

    Styled Handwritten Text Generation (Styled HTG) is an important task in document analysis, aiming to generate text images with the handwriting of given reference images. In recent years, there has been significant progress in the development of deep learning models for tackling this task. Being able to measure the performance of HTG models via a meaningful and representative criterion is key for fostering the development of this research topic. However, despite the current adoption of scores for natural image generation evaluation, assessing the quality of generated handwriting remains challenging. In light of this, we devise the Handwriting Distance (HWD), tailored for HTG evaluation. In particular, it works in the feature space of a network specifically trained to extract handwriting style features from variable-length input images and exploits a perceptual distance to compare the subtle geometric features of handwriting. Through extensive experimental evaluation on different word-level and line-level datasets of handwritten text images, we demonstrate the suitability of the proposed HWD as a score for Styled HTG. The pretrained model used as backbone will be released to ease the adoption of the score, aiming to provide a valuable tool for evaluating HTG models and thus contributing to advancing this important research area. Comment: Accepted at BMVC202