Text Line Segmentation of Historical Documents: a Survey
There are a huge number of historical documents in libraries and in various
National Archives that have not been exploited electronically. Although
automatic reading of complete pages remains, in most cases, a long-term
objective, tasks such as word spotting, text/image alignment, authentication
and extraction of specific fields are in use today. For all these tasks, a
major step is document segmentation into text lines. Because of the low quality
and the complexity of these documents (background noise, artifacts due to
aging, interfering lines), automatic text line segmentation remains an open
research field. The objective of this paper is to present a survey of existing
methods, developed during the last decade, and dedicated to documents of
historical interest.
Comment: 25 pages, submitted version. To appear in International Journal on
Document Analysis and Recognition. Online version available at
http://www.springerlink.com/content/k2813176280456k3
Cross-document word matching for segmentation and retrieval of Ottoman divans
Motivated by the need for the automatic
indexing and analysis of the huge number of documents in
Ottoman divan poetry, and for discovering new knowledge
to preserve and keep this heritage alive, in this study we
propose a novel method for segmenting and retrieving
words in Ottoman divans. Documents in Ottoman are difficult
to segment into words without prior knowledge of
the word. In this study, using the idea that divans have
multiple copies (versions) by different writers in different
writing styles, and word segmentation in some of those
versions may be relatively easier to achieve than in other
versions, segmentation of the versions (which is difficult,
if not impossible, with traditional techniques) is performed
using information carried over from the simpler version. One
version of a document is used as the source dataset and the
other version of the same document is used as the target
dataset. Words in the source dataset are automatically
extracted and used as queries to be spotted in the target
dataset for detecting word boundaries. We present the idea
of cross-document word matching for a novel task of
segmenting historical documents into words. We propose a
matching scheme based on possible combinations of
sequence of sub-words. We improve the performance of
simple features through considering the words in a context.
The method is applied to two versions of the Layla and
Majnun divan by Fuzuli. The results show that the proposed
word-matching-based segmentation method is
promising in finding the word boundaries and in retrieving
the words across documents.
Trajectories in the Development of Islamic Theological Thought: the Synthesis of Kalam
The field of Islamic theology (kalam) is not merely a receptacle for the presentation of the creedal statements and doctrinal catechisms of Islam; it derives its raison d’être not only from the articulation and elucidation of the doctrines of faith, but also by means of its rational and painstaking explication of dogma. While many of the dogmatic statements expressed in Islamic theology naturally emanate from a traditional substratum, countless more are the result of dialectical discussions as theologians expounded upon abstract constructs of religious dogma. Recent academic research is exploring the history, trends, and conceptual achievements behind the Islamic experiment with theology, providing insights into the tradition’s ability to integrate, refine, and expand theological constructs. Scholars are also concerned with issues such as origins, authenticity, and ascription, although such matters are not deflecting attention from the rich stock of resources and materials kalam has to offer.
Islam's Low Mutterings at High Tide: Enslaved African Muslims in American Literature
This dissertation traces the underexplored figure of the African Muslim slave in American literature and proposes a new way to examine Islam in American cultural texts. It introduces a methodology for reading the traces of Islam called Allahgraphy: a method of interpretation that is attentive to Islamic studies and rhetorical techniques and that takes the surface as a profound source of meaning. This interpretative practice draws on postsecular theory, Islamic epistemology, and “post-critique” scholarship. Because of this confluence of diverse theories and epistemologies, Allahgraphy blurs religious and secular categories by deploying religious concepts for literary exegesis. Through an Allahgraphic reading, the dissertation examines modes of Islamic expression in a wide range of American works spanning the nineteenth and twentieth centuries. To unravel the diverse Muslim voices embedded within the American literary tradition, the dissertation proceeds chronologically through specific periods in African American culture and history, moving from slavery to post-Reconstruction to the post-civil rights era. The first two chapters focus on the nineteenth century and examine the works of ʿUmar ibn Sayyid, Bilali Muhammad, and Joel Chandler Harris. In these chapters, Allahgraphy is used to consider the material inscription of the source texts, specifically the African-Arabic manuscripts. The second half of the dissertation examines Islamic expressions in twentieth-century American texts. Through an analysis of works by Malcolm X and Toni Morrison, these two chapters explore the multiple sensory registers of Allahgraphy. The dissertation concludes by considering the appearance of the African Muslim slave in the diary of the Guantánamo prisoner, Mohamedou Ould Slahi. Ultimately, the dissertation aims to widen literary approaches to Islam in American works and to demonstrate the continuity of Muslim voices in the American literary works. 
In doing so, it delineates a long tradition of black Muslim Americans’ responses to Islamophobia.
HWD: A Novel Evaluation Score for Styled Handwritten Text Generation
Styled Handwritten Text Generation (Styled HTG) is an important task in
document analysis, aiming to generate text images with the handwriting of given
reference images. In recent years, there has been significant progress in the
development of deep learning models for tackling this task. Being able to
measure the performance of HTG models via a meaningful and representative
criterion is key for fostering the development of this research topic. However,
despite the current adoption of scores for natural image generation evaluation,
assessing the quality of generated handwriting remains challenging. In light of
this, we devise the Handwriting Distance (HWD), tailored for HTG evaluation. In
particular, it works in the feature space of a network specifically trained to
extract handwriting style features from the variable-length input images and
exploits a perceptual distance to compare the subtle geometric features of
handwriting. Through extensive experimental evaluation on different word-level
and line-level datasets of handwritten text images, we demonstrate the
suitability of the proposed HWD as a score for Styled HTG. The pretrained model
used as backbone will be released to ease the adoption of the score, aiming to
provide a valuable tool for evaluating HTG models and thus contributing to
advancing this important research area.
Comment: Accepted at BMVC202