11 research outputs found

    Digital Paleography: Using the Digital Representation of Jawi Manuscripts to Support Paleographic Analysis

    Palaeography is the study of ancient handwritten manuscripts, with the aim of dating them and localizing ancient and medieval scripts. It also analyses the development of letter shapes. Ancient Jawi manuscripts are among the least studied. Over 7789 known Jawi manuscripts are currently held by various libraries in Malaysia. Most of these manuscripts are undated, with unknown authors and places of origin. Analysing the different writing styles and recognizing the manuscript illuminations can help recover this information. In this paper, we discuss palaeographical analysis from the perspective of computer science and propose a general framework for it. The process involves investigating the Arabic influence on Jawi manuscript writing, establishing the palaeographical type of the script, and classifying writing styles based on local and global Jawi image features.

    Kerangka Paleografi Jawi Digital: Satu Cadangan Awal (A Framework for Digital Jawi Paleography: An Initial Proposal)

    Paleography is the study of determining the date and place at which old manuscripts were written. At present, the number of Jawi manuscripts held in the main libraries of Malaysia alone is estimated at 7789. Moreover, there are manuscripts whose date and place of origin cannot be determined. In addition, the Arabic calligraphy (khat) styles that influenced Jawi writing also play a role in paleographic study. Jawi manuscripts exhibit a variety of Arabic calligraphic styles, which reflect the influences on the scribe and also show that the manuscripts were written by many writers. Digital Jawi paleography is much needed given the large number of manuscripts and the fact that digital copies of manuscripts are now available. This paper takes the perspective of the computer scientist. The methods used by researchers for Latin, Indian, and Hebrew scripts are studied, understood, and compared. A research framework for Jawi paleography is proposed to determine the type of Arabic calligraphic influence on Jawi manuscripts. Global and local features of Jawi images will be studied and used for classification. Classifying features according to the types of calligraphic script in a manuscript will contribute to identifying the manuscript. This will in turn contribute to a basis for determining the types of script in Malay manuscripts.

    Human Reading Based Strategies for off-line Arabic Word Recognition

    This paper summarizes some techniques proposed for off-line Arabic word recognition. The point of view developed here is that of human reading, which favors an interactive mechanism between global memorization and local checking and thereby makes complex scripts such as Arabic easier to recognize. In light of this, some specific papers are analyzed and their strategies commented


    A decrease in data storage costs and the widespread use of scanning devices have led to massive quantities of scanned digital documents in corporations, organizations, and governments around the world. Automatically processing these large heterogeneous collections can be difficult due to considerable variation in resolution, quality, font, layout, noise, and content. If this data is to be made available to a wide audience, methods for efficient retrieval and analysis of large collections of document images remain an open and important area of research. In this proposal, we present research in three areas that augment the current state of the art in the retrieval and analysis of large heterogeneous document image collections. First, we explore an efficient approach to document image retrieval, which allows users to perform retrieval against large image collections in a query-by-example manner. Our approach is compared to text retrieval over OCR output on a collection of 7 million document images collected from lawsuits against tobacco companies. Next, we present research in document verification and change detection, where one may want to quickly determine whether two document images contain any differences (document verification) and, if so, to determine precisely what has changed and where (change detection). A motivating example is legal contracts, where scanned images are often e-mailed back and forth and small changes can have severe ramifications. Finally, we examine approaches that exploit the biometric properties of handwriting to perform writer identification and retrieval in document images.

    Writer Identification of Arabic Handwritten Documents


    Biometrics Writer Recognition for Arabic language: Analysis and Classification techniques using Subwords Features

    Handwritten text in any language is believed to convey a great deal of information about the writer's personality and identity. Indeed, the handwritten signature has long been accepted as the writer's authenticating mark on financial and legal deals, as well as on official and personal documents and works of art. Handwritten documents are frequently used as evidence in forensic tasks. Handwriting skills are learnt and developed from the early stages of schooling. Research interest in behavioral biometrics was the main driving force behind the growth in research into Writer Identification (WI) from handwritten text, but the recent rise in terrorism associated with extreme religious ideologies spreading primarily, but not exclusively, from the Middle East has led to a surge of interest in WI from handwritten text in Arabic and similar languages. This thesis is the main outcome of extensive research conducted with the aim of developing automatic identification of a person from handwritten Arabic text samples. My motivations and interests, as an Iraqi researcher, stem from my desire to provide scientific support for my people in their fight against terrorism by providing forensic evidence, and to contribute to the ongoing digitization of the Iraqi National Archive as well as the wealth of religious and historical archives in Iraq and the Middle East. Good knowledge of the underlying language is invaluable in this project. Despite rising worldwide interest in this recognition modality, Arabic writer identification has not been addressed as extensively as Latin writer identification. In recent years, however, some new Arabic writer identification approaches have been proposed, some of which are reviewed in this thesis. Arabic is cursive when handwritten. This means that every writer of the language develops some unique features that can reveal the writer's habits and style. These habits and styles are regarded as unique WI features and determining factors. The dominant existing approaches to WI are based on recognizing handwriting habits and styles that are embedded in certain parts or components of the written text. Although the appearance of these components within long text contains rich information and clues to writer identity, the most common approaches to WI in Arabic in the literature are based on features extracted from paragraph(s), line(s), word(s), character(s), and/or a part of a character. Generally, Arabic words are made up of one or more subwords; at the end of each there is a connected stroke whose style seems to be most representative of the writer's habits. Another feature of Arabic writing is the diacritics that are added to written words and subwords to convey meaning and pronunciation. Subwords are more frequent in written Arabic text and appear as part of several different words or as full individual words. Thus, we propose a new approach based on the seemingly plausible hypothesis that subword-based WI yields a significant increase in accuracy over existing approaches. The most significant contributions of the thesis can be summarized as follows:
    - Developed a high-performing segmentation of scanned text images that combines threshold-based binarisation, morphological operations, and an active shape model.
    - Defined digital measures and formed 15-dimensional feature vector representations of subwords that implicitly cover their diacritics and strokes. A pilot study incrementally added features according to their writer-discriminating power; this reduced the subword feature vector dimension to 8, two of which were modelled as time series.
    - For the dependent 8-dimensional WI scheme, identified the best-performing set of subwords (the best 22 subwords out of 49, followed by the best 11 out of these 22).
    - Established the validity of our hypothesis for different versions of subword-based WI schemes by providing empirical evidence from tests on a number of existing text-dependent and text-independent databases plus a simulated text-independent database. The results for the text-dependent scenario showed the possible presence of the Doddington Zoo phenomenon.
    - The final optimal subword-based WI scheme not only removes the need to include diacritics as part of the subword but also demonstrates that including diacritics within subwords impairs the WI discriminating power of subwords. This should not be taken to discredit research based on diacritics-based WI. This subword-body (without diacritics) WI scheme also eliminated the Doddington Zoo effect.
    - Finally, a significant but unintended consequence of using subwords for WI is that there is no difference between a text-independent scenario and a text-dependent one. In fact, we demonstrate that the text-dependent database of 27 words can be used to simulate testing the scheme on a text-independent database without the need to record such a database.
    Finally, we discussed ways of optimising the performance of our last scheme by complementing it with various image texture analysis features extracted from subwords, lines, paragraphs, or the entire file of the scanned image. These included LBP and Gabor filters. We also suggested the possible addition of a few more features.
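
    The abstract above names LBP (local binary patterns) as one of the texture features that could complement the subword scheme. As a rough illustration only, and not the thesis's implementation, the sketch below computes a basic 8-neighbour LBP histogram in pure Python. The function names `lbp_code` and `lbp_histogram`, the clockwise bit order, and the `>=` comparison rule are assumptions made for this example; the image is assumed to be a rectangular list of rows of grey levels.

```python
from typing import List, Sequence

# Offsets (row, col) of the 8 neighbours, clockwise from the top-left pixel.
# The bit order is an arbitrary choice made for this sketch.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image: Sequence[Sequence[int]], row: int, col: int) -> int:
    """Return the 8-bit LBP code of an interior pixel.

    Bit i is set when neighbour i is at least as bright as the centre.
    """
    centre = image[row][col]
    code = 0
    for bit, (d_row, d_col) in enumerate(NEIGHBOURS):
        if image[row + d_row][col + d_col] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(image: Sequence[Sequence[int]]) -> List[float]:
    """Return the normalised 256-bin histogram of LBP codes.

    Border pixels are skipped because they lack a full neighbourhood.
    An image with no interior pixels yields an all-zero histogram.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    counts = [0] * 256
    for row in range(1, height - 1):
        for col in range(1, width - 1):
            counts[lbp_code(image, row, col)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * 256
    return [n / total for n in counts]
```

    A histogram of this kind could be computed per subword, line, or page image and appended to another feature vector; how the thesis actually combines such features is not stated in the abstract.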

    Invariant encoding schemes for visual recognition

    Many encoding schemes, such as the Scale Invariant Feature Transform (SIFT) and Histograms of Oriented Gradients (HOG), make use of templates of histograms to enable a loose encoding of the spatial position of basic features such as oriented gradients. Whilst such schemes have been successfully applied, the use of a template may limit the potential as it forces the histograms to conform to a rigid spatial arrangement. In this work we look at developing novel schemes making use of histograms, without the need for a template, which offer good levels of performance in visual recognition tasks. To do this, we look at the way the basic feature type changes across scale at individual locations. This gives rise to the notion of column features, which capture this change across scale by concatenating feature types at a given scale separation. As well as applying this idea to oriented gradients, we make wide use of Basic Image Features (BIFs) and oriented Basic Image Features (oBIFs) which encode local symmetry information. This resulted in a range of encoding schemes. We then tested these schemes on problems of current interest in three application areas. First, the recognition of characters taken from natural images, where our system outperformed existing methods. For the second area we selected a texture problem, involving the discrimination of quartz grains using surface texture, where the system achieved near perfect performance on the first task, and a level of performance comparable to an expert human on the second. In the third area, writer identification, the system achieved a perfect score and outperformed other methods when tested using the Arabic handwriting dataset as part of the ICDAR 2011 Competition