17,427 research outputs found
Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition
In this work we present a framework for the recognition of natural scene
text. Our framework does not require any human-labelled data, and performs word
recognition on the whole image holistically, departing from the character based
recognition systems of the past. The deep neural network models at the centre
of this framework are trained solely on data produced by a synthetic text
generation engine -- synthetic data that is highly realistic and sufficient to
replace real data, giving us infinite amounts of training data. This excess of
data exposes new possibilities for word recognition models, and here we
consider three models, each one "reading" words in a different way: via 90k-way
dictionary encoding, character sequence encoding, and bag-of-N-grams encoding.
In the scenarios of language based and completely unconstrained text
recognition we greatly improve upon state-of-the-art performance on standard
datasets, using our fast, simple machinery and requiring zero data-acquisition
costs
Inductive Visual Localisation: Factorised Training for Superior Generalisation
End-to-end trained Recurrent Neural Networks (RNNs) have been successfully
applied to numerous problems that require processing sequences, such as image
captioning, machine translation, and text recognition. However, RNNs often
struggle to generalise to sequences longer than the ones encountered during
training. In this work, we propose to optimise neural networks explicitly for
induction. The idea is to first decompose the problem in a sequence of
inductive steps and then to explicitly train the RNN to reproduce such steps.
Generalisation is achieved as the RNN is not allowed to learn an arbitrary
internal state; instead, it is tasked with mimicking the evolution of a valid
state. In particular, the state is restricted to a spatial memory map that
tracks parts of the input image which have been accounted for in previous
steps. The RNN is trained for single inductive steps, where it produces updates
to the memory in addition to the desired output. We evaluate our method on two
different visual recognition problems involving visual sequences: (1) text
spotting, i.e. joint localisation and reading of text in images containing
multiple lines (or a block) of text, and (2) sequential counting of objects in
aerial images. We show that inductive training of recurrent models enhances
their generalisation ability on challenging image datasets.Comment: In BMVC 2018 (spotlight
STEFANN: Scene Text Editor using Font Adaptive Neural Network
Textual information in a captured scene plays an important role in scene
interpretation and decision making. Though there exist methods that can
successfully detect and interpret complex text regions present in a scene, to
the best of our knowledge, there is no significant prior work that aims to
modify the textual information in an image. The ability to edit text directly
on images has several advantages including error correction, text restoration
and image reusability. In this paper, we propose a method to modify text in an
image at character-level. We approach the problem in two stages. At first, the
unobserved character (target) is generated from an observed character (source)
being modified. We propose two different neural network architectures - (a)
FANnet to achieve structural consistency with source font and (b) Colornet to
preserve source color. Next, we replace the source character with the generated
character maintaining both geometric and visual consistency with neighboring
characters. Our method works as a unified platform for modifying text in
images. We present the effectiveness of our method on COCO-Text and ICDAR
datasets both qualitatively and quantitatively.Comment: Accepted in The IEEE Conference on Computer Vision and Pattern
Recognition (CVPR) 202
Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition
Recognizing irregular text in natural scene images is challenging due to the
large variance in text appearance, such as curvature, orientation and
distortion. Most existing approaches rely heavily on sophisticated model
designs and/or extra fine-grained annotations, which, to some extent, increase
the difficulty in algorithm implementation and data collection. In this work,
we propose an easy-to-implement strong baseline for irregular scene text
recognition, using off-the-shelf neural network components and only word-level
annotations. It is composed of a -layer ResNet, an LSTM-based
encoder-decoder framework and a 2-dimensional attention module. Despite its
simplicity, the proposed method is robust and achieves state-of-the-art
performance on both regular and irregular scene text recognition benchmarks.
Code is available at: https://tinyurl.com/ShowAttendReadComment: Accepted to Proc. AAAI Conference on Artificial Intelligence 201
- …