23,421 research outputs found
Handwriting Recognition of Historical Documents with few labeled data
Historical documents present many challenges for offline handwriting
recognition systems, among them, the segmentation and labeling steps. Carefully
annotated textlines are needed to train an HTR system. In some scenarios,
transcripts are only available at the paragraph level with no text-line
information. In this work, we demonstrate how to train an HTR system with few
labeled data. Specifically, we train a deep convolutional recurrent neural
network (CRNN) system on only 10% of manually labeled text-line data from a
dataset and propose an incremental training procedure that covers the rest of
the data. Performance is further increased by augmenting the training set with
specially crafted multiscale data. We also propose a model-based normalization
scheme which considers the variability in the writing scale at the recognition
phase. We apply this approach to the publicly available READ dataset. Our
system achieved the second best result during the ICDAR2017 competition
Unconstrained Scene Text and Video Text Recognition for Arabic Script
Building robust recognizers for Arabic has always been challenging. We
demonstrate the effectiveness of an end-to-end trainable CNN-RNN hybrid
architecture in recognizing Arabic text in videos and natural scenes. We
outperform previous state-of-the-art on two publicly available video text
datasets - ALIF and ACTIV. For the scene text recognition task, we introduce a
new Arabic scene text dataset and establish baseline results. For scripts like
Arabic, a major challenge in developing robust recognizers is the lack of large
quantity of annotated data. We overcome this by synthesising millions of Arabic
text images from a large vocabulary of Arabic words and phrases. Our
implementation is built on top of the model introduced here [37] which is
proven quite effective for English scene text recognition. The model follows a
segmentation-free, sequence to sequence transcription approach. The network
transcribes a sequence of convolutional features from the input image to a
sequence of target labels. This does away with the need for segmenting input
image into constituent characters/glyphs, which is often difficult for Arabic
script. Further, the ability of RNNs to model contextual dependencies yields
superior recognition results.Comment: 5 page
The Palaeographical Method under the Light of a Digital Approach
This paper has the twofold aim of reflecting upon a humanities computing approach to palaeography, and of making such reflections - together with its related experimental results - fruitful at the implementation level. Firstly, the paper explores the methodological issues related to the use of a digital tool to support the palaeographical analysis of medieval handwriting. It claims that humanities computing methods can assist in making explicit those processes of the palaeographical research that encompass detailed analyses, in particular of the handwriting and, more generally, of other idiosyncratic features of written cultural artefacts. Thus, palaeographical tools are to be contextualised and used within a broader methodological framework where their role is to mediate the vision, the comparison, the representation, the analysis and the interpretation of these objects. Secondly, the paper attempts to evaluate the experimentations carried out with a specific software and, in so doing, to test a humanities computing approach to palaeography at a practical level, so as to direct future implementations. Some of these implementations have already been carried out by the current developers of the application in question with whom the author collaborates closely, while others are still in progress and in need of future iterative refinements
Joint Learning of Correlated Sequence Labelling Tasks Using Bidirectional Recurrent Neural Networks
The stream of words produced by Automatic Speech Recognition (ASR) systems is
typically devoid of punctuations and formatting. Most natural language
processing applications expect segmented and well-formatted texts as input,
which is not available in ASR output. This paper proposes a novel technique of
jointly modeling multiple correlated tasks such as punctuation and
capitalization using bidirectional recurrent neural networks, which leads to
improved performance for each of these tasks. This method could be extended for
joint modeling of any other correlated sequence labeling tasks.Comment: Accepted in Interspeech 201
- …