    Handwriting Recognition of Historical Documents with few labeled data

    Historical documents present many challenges for offline handwriting recognition systems, among them the segmentation and labeling steps. Carefully annotated text lines are needed to train a handwritten text recognition (HTR) system. In some scenarios, transcripts are only available at the paragraph level, with no text-line information. In this work, we demonstrate how to train an HTR system with little labeled data. Specifically, we train a deep convolutional recurrent neural network (CRNN) on only 10% of the manually labeled text-line data from a dataset and propose an incremental training procedure that covers the rest of the data. Performance is further increased by augmenting the training set with specially crafted multiscale data. We also propose a model-based normalization scheme that accounts for the variability in writing scale at the recognition phase. We apply this approach to the publicly available READ dataset. Our system achieved the second-best result in the ICDAR2017 competition.
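    The abstract names two scale-related ideas without detailing them: augmenting the training set with multiscale copies of the data, and a model-based normalization of writing scale at recognition time. The following is a minimal sketch of one plausible reading, assuming that rescaling the whole line image stands in for the paper's "specially crafted" multiscale data and that the normalization keeps whichever rescaled variant the recognizer is most confident about. The helpers `resize_nearest`, `multiscale_variants` and `normalize_by_model`, and the `score` callback, are hypothetical; this is not the authors' procedure.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]  # grayscale text-line image: list of pixel rows


def resize_nearest(img: Image, factor: float) -> Image:
    """Rescale a text-line image by `factor` using nearest-neighbour sampling."""
    if factor <= 0:
        raise ValueError("scale factor must be positive")
    h, w = len(img), len(img[0])
    new_h, new_w = max(1, round(h * factor)), max(1, round(w * factor))
    # Each output pixel copies the source pixel it maps back onto (clamped to the border).
    return [
        [img[min(h - 1, int(r / factor))][min(w - 1, int(c / factor))] for c in range(new_w)]
        for r in range(new_h)
    ]


def multiscale_variants(img: Image, scales: Sequence[float]) -> List[Image]:
    """Training-time augmentation (hypothetical helper): one rescaled copy per factor."""
    return [resize_nearest(img, s) for s in scales]


def normalize_by_model(img: Image, scales: Sequence[float],
                       score: Callable[[Image], float]) -> Image:
    """Recognition-time scale normalisation (hypothetical helper).

    `score` stands in for the trained recognizer's confidence in its best
    transcription of an image (assumption); the variant it scores highest is kept.
    """
    return max(multiscale_variants(img, scales), key=score)
```

    For example, on a 2-by-3 image, `multiscale_variants(line, [0.5, 1.0, 2.0])` returns variants of heights 1, 2 and 4. In a real system, `score` would run the CRNN on each variant, so recognition cost grows linearly with the number of candidate scales.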

    Unconstrained Scene Text and Video Text Recognition for Arabic Script

    Building robust recognizers for Arabic has always been challenging. We demonstrate the effectiveness of an end-to-end trainable CNN-RNN hybrid architecture in recognizing Arabic text in videos and natural scenes. We outperform the previous state of the art on two publicly available video text datasets, ALIF and ACTIV. For the scene text recognition task, we introduce a new Arabic scene text dataset and establish baseline results. For scripts like Arabic, a major challenge in developing robust recognizers is the lack of large quantities of annotated data. We overcome this by synthesising millions of Arabic text images from a large vocabulary of Arabic words and phrases. Our implementation is built on top of the model introduced in [37], which has proven quite effective for English scene text recognition. The model follows a segmentation-free, sequence-to-sequence transcription approach: the network transcribes a sequence of convolutional features from the input image into a sequence of target labels. This removes the need to segment the input image into its constituent characters or glyphs, which is often difficult for Arabic script. Further, the ability of RNNs to model contextual dependencies yields superior recognition results.
    Comment: 5 pages
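    The segmentation-free transcription described above maps per-frame network outputs to a label sequence without locating character boundaries. The following is a minimal sketch of best-path (greedy) decoding, assuming a CTC output layer, as is common in CNN-RNN text recognizers; the abstract does not name the loss or decoder, and the blank index and alphabet layout below are assumptions. If transcripts are stored in logical (reading) order, right-to-left display of Arabic is left to the text renderer.

```python
from typing import List, Sequence

BLANK = 0  # index of the CTC blank class (assumption)


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, then drop blanks.

    frame_probs[t][k] is the network's score for class k at frame t;
    class 0 is the blank and class k >= 1 maps to alphabet[k - 1].
    """
    # Most likely class at every frame (ties resolve to the lowest index).
    best_path = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    chars: List[str] = []
    prev = BLANK
    for k in best_path:
        # Emit only when the class changes and is not the blank; a blank between
        # two identical classes is what allows a doubled letter to survive.
        if k != prev and k != BLANK:
            chars.append(alphabet[k - 1])
        prev = k
    return "".join(chars)
```

    For example, a best path of classes `[1, 0, 2, 2, 3, 4]` over the hypothetical alphabet `"سلام"` decodes to `"سلام"`: the repeated class 2 collapses into a single letter. Beam search with a lexicon or language model is the usual step up from this greedy decoder.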

    The Palaeographical Method under the Light of a Digital Approach

    This paper has the twofold aim of reflecting upon a humanities computing approach to palaeography and of making these reflections, together with the related experimental results, fruitful at the implementation level. First, the paper explores the methodological issues related to the use of a digital tool to support the palaeographical analysis of medieval handwriting. It argues that humanities computing methods can help make explicit those processes of palaeographical research that involve detailed analyses, in particular of the handwriting and, more generally, of other idiosyncratic features of written cultural artefacts. Palaeographical tools should therefore be contextualised and used within a broader methodological framework, where their role is to mediate the vision, comparison, representation, analysis and interpretation of these objects. Second, the paper attempts to evaluate the experiments carried out with a specific piece of software and, in so doing, to test a humanities computing approach to palaeography at a practical level, so as to guide future implementations. Some of these implementations have already been carried out by the current developers of the application in question, with whom the author collaborates closely, while others are still in progress and require further iterative refinement.

    Joint Learning of Correlated Sequence Labelling Tasks Using Bidirectional Recurrent Neural Networks

    The stream of words produced by Automatic Speech Recognition (ASR) systems is typically devoid of punctuation and formatting. Most natural language processing applications expect segmented, well-formatted text as input, which ASR output does not provide. This paper proposes a novel technique for jointly modeling multiple correlated tasks, such as punctuation and capitalization, using bidirectional recurrent neural networks, which leads to improved performance on each of these tasks. The method could be extended to the joint modeling of any other correlated sequence labeling tasks.
    Comment: Accepted in Interspeech 201
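    The abstract treats punctuation and capitalization as two correlated labeling tasks over the same word stream. The following is a minimal sketch of how aligned training targets for both tasks could be derived from formatted text, and how predicted labels would be applied back to ASR-style output. The word-level tagging scheme and the label sets (COMMA/PERIOD/QUESTION for punctuation, LOW/CAP/ALL for capitalization) are hypothetical and not taken from the paper; the bidirectional RNN that predicts both sequences jointly is not shown.

```python
from typing import List, Tuple

# Hypothetical punctuation label set; "O" means no mark follows the word.
PUNCT = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}
PUNCT_CHAR = {label: mark for mark, label in PUNCT.items()}


def encode(text: str) -> Tuple[List[str], List[str], List[str]]:
    """Split formatted text into ASR-style input plus one label sequence per task.

    Assumes whitespace-separated words, each followed by at most one mark.
    """
    words: List[str] = []
    punct: List[str] = []
    cap: List[str] = []
    for token in text.split():
        mark = PUNCT.get(token[-1], "O")
        word = token[:-1] if mark != "O" else token
        words.append(word.lower())  # ASR-like input: lower-cased, unpunctuated
        punct.append(mark)          # task 1: punctuation following this word
        if len(word) > 1 and word.isupper():
            cap.append("ALL")       # task 2: all caps (e.g. acronyms)
        elif word[:1].isupper():
            cap.append("CAP")       # task 2: initial capital only
        else:
            cap.append("LOW")
    return words, punct, cap


def decode(words: List[str], punct: List[str], cap: List[str]) -> str:
    """Apply (predicted) labels for both tasks to restore formatted text."""
    out = []
    for word, mark, case in zip(words, punct, cap):
        if case == "ALL":
            word = word.upper()
        elif case == "CAP":
            word = word[:1].upper() + word[1:]
        out.append(word + PUNCT_CHAR.get(mark, ""))
    return " ".join(out)
```

    Because both label sequences are aligned to the same words, a single bidirectional network can in principle read each word's left and right context once and emit both labels; `decode(*encode(text))` round-trips text that fits the assumed scheme, such as "Hello, how are you? I use ASR systems."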