Deep Structured Output Learning for Unconstrained Text Recognition
We develop a representation suitable for the unconstrained recognition of
words in natural images: the general case of no fixed lexicon and unknown
length.
To this end we propose a convolutional neural network (CNN) based
architecture which incorporates a Conditional Random Field (CRF) graphical
model, taking the whole word image as a single input. The unaries of the CRF
are provided by a CNN that predicts characters at each position of the output,
while higher order terms are provided by another CNN that detects the presence
of N-grams. We show that this entire model (CRF, character predictor, N-gram
predictor) can be jointly optimised by back-propagating the structured output
loss, essentially requiring the system to perform multi-task learning, and
training uses purely synthetically generated data. The resulting model is a
more accurate system on standard real-world text recognition benchmarks than
character prediction alone, setting a benchmark for systems that have not been
trained on a particular lexicon. In addition, our model achieves
state-of-the-art accuracy in lexicon-constrained scenarios, without being
specifically modelled for constrained recognition. To test the generalisation
of our model, we also perform experiments with random alpha-numeric strings to
evaluate the method when no visual language model is applicable.
Comment: arXiv admin note: text overlap with arXiv:1406.222
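The scoring idea in the abstract above (per-position character unaries plus higher-order N-gram terms) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names `word_score` and `best_word`, the toy score tables, the fixed word length, and the brute-force search over candidates. In the described model the scores come from two CNNs and the word length is unknown.

```python
from itertools import product


def word_score(word, unary, ngram_scores, max_n=2):
    """Sum of per-position character scores plus scores of N-grams present in the word."""
    # Unary terms: one score per (position, character) pair; missing entries count as 0.
    score = sum(unary[i].get(ch, 0.0) for i, ch in enumerate(word))
    # Higher-order terms: add the score of every N-gram (2 <= N <= max_n) found in the word.
    for n in range(2, max_n + 1):
        for i in range(len(word) - n + 1):
            score += ngram_scores.get(word[i:i + n], 0.0)
    return score


def best_word(unary, ngram_scores, alphabet, length):
    """Brute-force argmax over all strings of a fixed length (toy sizes only)."""
    candidates = ("".join(chars) for chars in product(alphabet, repeat=length))
    return max(candidates, key=lambda w: word_score(w, unary, ngram_scores))


# Hypothetical scores standing in for the outputs of the two predictors.
unary = [
    {"c": 1.0, "e": 0.9},
    {"a": 1.0, "o": 0.8},
    {"t": 1.0, "l": 1.1},
]
ngram_scores = {"at": 0.5, "ca": 0.3}

# Character prediction alone picks the best character at each position.
unary_only = "".join(max(u, key=u.get) for u in unary)
# Joint scoring also rewards the N-grams that the candidate word contains.
joint = best_word(unary, ngram_scores, "acelot", 3)

print(unary_only, joint)
```

In this toy setting the N-gram terms can override a slightly stronger but isolated character score at the last position, which is the kind of effect the abstract attributes to the higher-order terms. A practical system would replace the exhaustive search with an approximate one, since the candidate set grows exponentially with word length.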
Reading Scene Text in Deep Convolutional Sequences
We develop a Deep-Text Recurrent Network (DTRN) that regards scene text
reading as a sequence labelling problem. We leverage recent advances of deep
convolutional neural networks to generate an ordered high-level sequence from a
whole word image, avoiding the difficult character segmentation problem. Then a
deep recurrent model, building on long short-term memory (LSTM), is developed
to robustly recognize the generated CNN sequences, departing from most existing
approaches, which recognise each character independently. Our model has a number of
appealing properties in comparison to existing scene text recognition methods:
(i) It can recognise highly ambiguous words by leveraging meaningful context
information, allowing it to work reliably without either pre- or
post-processing; (ii) the deep CNN feature is robust to various image
distortions; (iii) it retains the explicit order information in the word image,
which is essential to discriminate word strings; (iv) the model does not depend
on a pre-defined dictionary, and it can process unknown words and arbitrary
strings. Code for the DTRN will be available.
Comment: To appear in the 13th AAAI Conference on Artificial Intelligence
(AAAI-16), 201
AON: Towards Arbitrarily-Oriented Text Recognition
Recognizing text from natural images is a hot research topic in computer
vision due to its various applications. Despite the enduring research of
several decades on optical character recognition (OCR), recognizing texts from
natural images is still a challenging task. This is because scene texts are
often in irregular (e.g. curved, arbitrarily-oriented or seriously distorted)
arrangements, which have not yet been well addressed in the literature.
Existing methods on text recognition mainly work with regular (horizontal and
frontal) texts and cannot be trivially generalized to handle irregular texts.
In this paper, we develop the arbitrary orientation network (AON) to directly
capture the deep features of irregular texts, which are combined into an
attention-based decoder to generate a character sequence. The whole network can
be trained end-to-end by using only images and word-level annotations.
Extensive experiments on various benchmarks, including the CUTE80,
SVT-Perspective, IIIT5k, SVT and ICDAR datasets, show that the proposed
AON-based method achieves state-of-the-art performance on irregular
datasets, and is comparable to major existing methods on regular datasets.
Comment: Accepted by CVPR201