
    A fine-grained approach to scene text script identification

    This paper focuses on the problem of script identification in unconstrained scenarios. Script identification is an important prerequisite to recognition and an indispensable condition for automatic text understanding systems designed for multi-language environments. Although widely studied for document images and handwritten documents, it remains almost unexplored for scene text images. We detail a novel method for script identification in natural images that combines convolutional features with the Naive-Bayes Nearest Neighbor classifier. The proposed method efficiently exploits the discriminative power of small stroke parts in a fine-grained classification framework. In addition, we propose a new public benchmark dataset for evaluating joint text detection and script identification in natural scenes. Experiments on this new dataset demonstrate that the proposed method yields state-of-the-art results, while generalizing well to different datasets and to a variable number of scripts. The evidence provided shows that multi-lingual scene text recognition in the wild is a viable proposition. Source code of the proposed method is made available online.
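
    As a rough illustration of the Naive-Bayes Nearest Neighbor (NBNN) decision rule named above, the following Python sketch assigns a query image to the script class that minimizes the sum, over the query's local descriptors, of the squared distance to each descriptor's nearest neighbor in that class. It assumes descriptors (for instance, convolutional features of stroke parts) are already extracted as NumPy arrays; the function name `nbnn_classify`, the brute-force search, and the toy data are hypothetical and are not the paper's released code.

```python
import numpy as np


def nbnn_classify(query_descriptors, class_descriptors):
    """Naive-Bayes Nearest Neighbor: pick the class with the smallest
    image-to-class distance (hypothetical helper, brute-force search).

    query_descriptors: array of shape (n_query, dim)
    class_descriptors: dict mapping class label -> array of shape (n_class, dim)
    """
    best_label, best_cost = None, float("inf")
    for label, descriptors in class_descriptors.items():
        # Pairwise squared Euclidean distances, shape (n_query, n_class).
        diff = query_descriptors[:, None, :] - descriptors[None, :, :]
        sq_dists = np.sum(diff ** 2, axis=-1)
        # Image-to-class distance: sum of each query descriptor's nearest-neighbor distance.
        cost = float(sq_dists.min(axis=1).sum())
        if cost < best_cost:
            best_label, best_cost = label, cost
    return best_label


# Toy usage with synthetic 4-D descriptors (hypothetical data, two well-separated clusters).
rng = np.random.default_rng(0)
classes = {
    "latin": rng.normal(0.0, 0.1, size=(50, 4)),
    "greek": rng.normal(1.0, 0.1, size=(50, 4)),
}
query = rng.normal(0.0, 0.1, size=(10, 4))
print(nbnn_classify(query, classes))  # latin
```

    The brute-force distance computation is quadratic in the number of descriptors; a practical system would more likely use an approximate nearest-neighbor index for each class.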

    Model-based identification of Oriental documents

    Computers capable of identifying the languages printed in documents can support many applications, including document classification for character recognition, translation, and language understanding. Language identification is normally done manually. However, the high volume and variety of languages encountered make manual identification impractical, so an automatic approach becomes necessary. Language identification is therefore a key step in the automatic processing of document images. This thesis is concerned with the model-based classification of Oriental documents into Chinese, Japanese, and Korean. A model-based approach locates, in an image, an object for which the computer has a model. In this work, the objects to be located are some of the most frequently appearing characters in each of the three Oriental languages, and the images to be searched are the Oriental documents fed to the system. A major part of the work is locating instances of the character models in an Oriental document, which is done using the Hausdorff distance, a similarity measure defined between two sets of points. One point set represents a model of an Oriental character to look for, and the other represents each character in the document image to be identified. Since Oriental documents are structurally complex, a portion of the text is extracted from the input document for further processing.
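
    The Hausdorff distance between point sets A and B is H(A, B) = max(h(A, B), h(B, A)), where the directed distance h(A, B) is the largest distance from any point of A to its nearest point in B. The Python sketch below computes it by brute force with NumPy and uses it in a simplified matching step that assigns a character's point set to the language of its closest model. The helper `best_matching_language`, the dictionary layout of the models, and the toy point sets are hypothetical simplifications, not the thesis's actual procedure.

```python
import numpy as np


def directed_hausdorff(a, b):
    """h(A, B): max over points of A of the distance to the nearest point of B."""
    # Pairwise Euclidean distances, shape (len(a), len(b)).
    dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return float(dists.min(axis=1).max())


def hausdorff(a, b):
    """Symmetric Hausdorff distance H(A, B) = max(h(A, B), h(B, A))."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def best_matching_language(char_points, models):
    """Hypothetical helper: return the language whose closest character model
    (by Hausdorff distance) best matches the given character point set.

    models: dict mapping language -> list of (n, 2) point arrays
    """
    best_lang, best_dist = None, float("inf")
    for lang, templates in models.items():
        for template in templates:
            d = hausdorff(char_points, template)
            if d < best_dist:
                best_lang, best_dist = lang, d
    return best_lang


# The directed distance is asymmetric; the symmetric one takes the larger value.
a = np.array([[0.0, 0.0]])
b = np.array([[0.0, 0.0], [3.0, 4.0]])
print(directed_hausdorff(a, b))  # 0.0
print(directed_hausdorff(b, a))  # 5.0
print(hausdorff(a, b))           # 5.0

# Toy models with one hypothetical template per language.
models = {
    "chinese": [np.array([[0.0, 0.0], [1.0, 0.0]])],
    "korean": [np.array([[5.0, 5.0], [6.0, 6.0]])],
}
print(best_matching_language(np.array([[0.0, 0.0], [1.0, 0.1]]), models))  # chinese
```

    Locating model instances inside a whole document would additionally require searching over positions (translations) of the model; this sketch omits that search and only compares point sets that are already segmented.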

    Recognition and Detection of Language on Inscriptions

    Ancient-language font recognition is one of the challenging tasks in optical character recognition and document analysis. Most existing methods for font recognition make use of local typographical features and connected-component analysis. In this paper, ancient-language font recognition is based on global texture analysis. Ancient-language characters differ from the characters used in the current century. This paper concentrates on identifying the century of ancient-language characters and converting them into their current-century form using MATLAB. Recognizing handwritten ancient-language characters from inscriptions is difficult. In this paper, a recently introduced technique, the contourlet transform, is adopted for recognizing ancient-language characters from stone inscriptions. Previous research has noted that wavelet transforms cannot reconstruct curved images perfectly, and the contourlet transform offers a remedy for this shortcoming. The contourlet transform is a 3D technique, whereas the wavelet transform is a 2D technique. The characters in the input image are recognized through a clustering mechanism. Noise present in the image is removed with fuzzy median filters. Neural networks are trained on the images and used to compare the data with the current-century characters. Hence, more accurate recognition of ancient-language characters from stone inscriptions is obtained.
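
    The abstract names fuzzy median filters for noise removal without specifying them. Purely as a point of reference, the Python sketch below implements a plain (crisp, non-fuzzy) 3x3 median filter, a simpler, swapped-in technique rather than the paper's filter. The function name `median_filter_3x3` and the edge-replication border handling are assumptions made for illustration.

```python
import numpy as np


def median_filter_3x3(image):
    """Plain (non-fuzzy) 3x3 median filter; borders are handled by edge replication."""
    padded = np.pad(image, 1, mode="edge")
    h, w = image.shape
    # Nine shifted views: slice k of the stack holds one 3x3 neighbor of every pixel.
    neighbors = np.stack(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    )
    # Taking the median over the neighborhood suppresses isolated impulse ("salt") noise.
    return np.median(neighbors, axis=0)


# Toy usage: a dark 5x5 patch with one bright noise pixel in the center.
noisy = np.zeros((5, 5))
noisy[2, 2] = 255.0
print(median_filter_3x3(noisy).max())  # 0.0
```

    A fuzzy variant would replace the hard median selection with some membership-weighted combination of neighbors; the details of the paper's filter are not given in the abstract.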

    Manuscriptorium Digital Library and ENRICH Project: Means for Dealing with Digital Codicology and Palaeography

    Codicology and palaeography in the digital age can be developed both by adapting existing methods and by using information and communication technologies. This can be achieved, e.g., by projects focusing on the integration of individual resources under a single user interface. This is the aim of the Manuscriptorium digital library as well as the ENRICH project. The integration is based on the centralisation of metadata from various resources and on the distributed storage of data, mainly digital images. This is implemented through a distributed complex digital document containing the so-called identification record and further data types. Within the ENRICH project, the integrated Manuscriptorium digital library is being built in four basic ways: automatically or semi-automatically (or manually), in each case either online or offline. This has made it possible to amass more than 5,000 documents. Search is important for Manuscriptorium: it allows information to be gathered through special fields and differences in graphics to be harmonised. The ENRICH project also aims to create tools for compiling virtual collections and documents. Through its method of integrating resources, Manuscriptorium endeavours to be an instrument of codicological and palaeographic research.