Unconstrained Scene Text and Video Text Recognition for Arabic Script
Building robust recognizers for Arabic has always been challenging. We
demonstrate the effectiveness of an end-to-end trainable CNN-RNN hybrid
architecture in recognizing Arabic text in videos and natural scenes. We
outperform the previous state of the art on two publicly available video text
datasets, ALIF and ACTIV. For the scene text recognition task, we introduce a
new Arabic scene text dataset and establish baseline results. For scripts like
Arabic, a major challenge in developing robust recognizers is the lack of a large
quantity of annotated data. We overcome this by synthesising millions of Arabic
text images from a large vocabulary of Arabic words and phrases. Our
implementation is built on top of the model introduced in [37], which has
proven quite effective for English scene text recognition. The model follows a
segmentation-free, sequence-to-sequence transcription approach. The network
transcribes a sequence of convolutional features from the input image to a
sequence of target labels. This does away with the need for segmenting the input
image into constituent characters/glyphs, which is often difficult for Arabic
script. Further, the ability of RNNs to model contextual dependencies yields
superior recognition results.
Comment: 5 page
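The segmentation-free transcription step described in this abstract can be illustrated with a small decoding sketch. It assumes a CTC-style output layer, which the abstract does not state explicitly, so the blank label, the `BLANK` index, and the function name `greedy_ctc_decode` are illustrative assumptions rather than the authors' implementation. Each feature frame votes for one label; repeated labels are merged and blanks are dropped, so no character boundaries are ever needed.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank label (hypothetical convention)


def greedy_ctc_decode(frame_scores: Sequence[Sequence[float]], alphabet: str) -> str:
    """Pick the best label per frame, merge consecutive repeats, drop blanks.

    frame_scores[t][k] is the score of label k at frame t; label 0 is the
    blank and label k >= 1 maps to alphabet[k - 1].
    """
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    out: List[str] = []
    prev = BLANK
    for label in best:
        # Emit a character only when the label changes and is not blank.
        if label != prev and label != BLANK:
            out.append(alphabet[label - 1])
        prev = label
    return "".join(out)


# Toy per-frame scores over (blank, 'a', 'b'); values are made up.
frames = [
    [0.10, 0.80, 0.10],  # 'a'
    [0.20, 0.70, 0.10],  # 'a' again, merged with the previous frame
    [0.90, 0.05, 0.05],  # blank separates the two 'a' characters
    [0.10, 0.70, 0.20],  # 'a'
    [0.10, 0.20, 0.70],  # 'b'
]
print(greedy_ctc_decode(frames, "ab"))  # prints "aab"
```

The blank frame is what lets the decoder emit a doubled character without any explicit segmentation of the input image.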
Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition
Recognizing irregular text in natural scene images is challenging due to the
large variance in text appearance, such as curvature, orientation and
distortion. Most existing approaches rely heavily on sophisticated model
designs and/or extra fine-grained annotations, which, to some extent, increase
the difficulty in algorithm implementation and data collection. In this work,
we propose an easy-to-implement strong baseline for irregular scene text
recognition, using off-the-shelf neural network components and only word-level
annotations. It is composed of a ResNet, an LSTM-based
encoder-decoder framework and a 2-dimensional attention module. Despite its
simplicity, the proposed method is robust and achieves state-of-the-art
performance on both regular and irregular scene text recognition benchmarks.
Code is available at: https://tinyurl.com/ShowAttendRead
Comment: Accepted to Proc. AAAI Conference on Artificial Intelligence 201
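The idea of a 2-dimensional attention module can be sketched in a few lines. This is a simplified stand-in, not the module from the paper: it assumes plain dot-product scoring between a decoder query and each cell of an H x W feature map, and the name `attend_2d` and the toy values are hypothetical. The point it shows is that attention weights are normalised over the whole 2D grid rather than over a 1D sequence, which is what lets the decoder follow curved or rotated text.

```python
import math
from typing import List, Tuple

Grid = List[List[List[float]]]  # H x W x D feature map as nested lists


def attend_2d(features: Grid, query: List[float]) -> Tuple[List[List[float]], List[float]]:
    """Return (H x W attention weights, D-dimensional glimpse vector)."""
    # Dot-product score for every spatial cell (an assumed scoring function).
    scores = [[sum(f * q for f, q in zip(cell, query)) for cell in row]
              for row in features]
    # Softmax over all H*W positions, shifted by the max for stability.
    peak = max(max(row) for row in scores)
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    weights = [[e / total for e in row] for row in exps]
    # The glimpse is the attention-weighted sum of the feature vectors.
    glimpse = [0.0] * len(query)
    for weight_row, feature_row in zip(weights, features):
        for w, cell in zip(weight_row, feature_row):
            for d, value in enumerate(cell):
                glimpse[d] += w * value
    return weights, glimpse


# Toy 2 x 2 feature map with 2 channels; values are made up.
features = [[[1.0, 0.0], [0.0, 1.0]],
            [[0.0, 0.0], [3.0, 0.0]]]
weights, glimpse = attend_2d(features, query=[1.0, 0.0])
print(weights)
print(glimpse)
```

In this toy input the bottom-right cell matches the query best, so it receives most of the attention mass and dominates the glimpse vector.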