1,215 research outputs found

    Novel Perspectives for the Management of Multilingual and Multialphabetic Heritages through Automatic Knowledge Extraction: The DigitalMaktaba Approach

    Get PDF
    The linguistic and social impact of multiculturalism can no longer be neglected in any sector, creating the urgent need of creating systems and procedures for managing and sharing cultural heritages in both supranational and multi-literate contexts. In order to achieve this goal, text sensing appears to be one of the most crucial research areas. The long-term objective of the DigitalMaktaba project, born from interdisciplinary collaboration between computer scientists, historians, librarians, engineers and linguists, is to establish procedures for the creation, management and cataloguing of archival heritage in non-Latin alphabets. In this paper, we discuss the currently ongoing design of an innovative workflow and tool in the area of text sensing, for the automatic extraction of knowledge and cataloguing of documents written in non-Latin languages (Arabic, Persian and Azerbaijani). The current prototype leverages different OCR, text processing and information extraction techniques in order to provide both a highly accurate extracted text and rich metadata content (including automatically identified cataloguing metadata), overcoming typical limitations of current state of the art approaches. The initial tests provide promising results. The paper includes a discussion of future steps (e.g., AI-based techniques further leveraging the extracted data/metadata and making the system learn from user feedback) and of the many foreseen advantages of this research, both from a technical and a broader cultural-preservation and sharing point of view

    Offline MODI Character Recognition Using Complex Moments

    Get PDF
    AbstractThe varying writing style and critical representation of characters in Indian script makes Handwritten Optical Character (HOCR) challenging and has attracted researchers to contribute in this domain. ‘MODI’ Script had cursive type of writings in Devanagari, Marathi where large amount of historical documents were available and need to be digitally explored. The principal objective of this research work is to describe efficiency of Zernike Complex moments and Zernike moments with different Zoning patterns for offline recognition of handwritten ‘MODI’ characters. Every character was divided in six zoning patterns with 37 zones. Geometrical shapes were used to create zoning patterns. The work was resulted in 94.92% correct recognition rate was achieved by using Zernike moments and 94.78% by using Zernike complex moments with integrated approach for heterogeneous zones

    Segmentation-free Word Spotting for Handwritten Arabic Documents

    Get PDF
    In this paper we present an unsupervised segmentation-free method for spotting and searching query, especially, for images documents in handwritten Arabic, for this, Histograms of Oriented Gradients (HOGs) are used as the feature vectors to represent the query and documents image. Then, we compress the descriptors with the product quantization method. Finally, a better representation of the query is obtained by using the Support Vector Machines (SVM)

    Access to recorded interviews: A research agenda

    Get PDF
    Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state-of-the-art for key component technologies. A large number of important research issues are identified, and from that set of issues, a coherent research agenda is proposed
    • …
    corecore