1,213 research outputs found
Cascaded Segmentation-Detection Networks for Word-Level Text Spotting
We introduce an algorithm for word-level text spotting that is able to
accurately and reliably determine the bounding regions of individual words of
text "in the wild". Our system is formed by the cascade of two convolutional
neural networks. The first network is fully convolutional and is in charge of
detecting areas containing text. This results in a very reliable but possibly
inaccurate segmentation of the input image. The second network (inspired by the
popular YOLO architecture) analyzes each segment produced in the first stage,
and predicts oriented rectangular regions containing individual words. No
post-processing (e.g. text line grouping) is necessary. With execution time of
450 ms for a 1000-by-560 image on a Titan X GPU, our system achieves the
highest score to date among published algorithms on the ICDAR 2015 Incidental
Scene Text dataset benchmark.Comment: 7 pages, 8 figure
Reading Scene Text in Deep Convolutional Sequences
We develop a Deep-Text Recurrent Network (DTRN) that regards scene text
reading as a sequence labelling problem. We leverage recent advances of deep
convolutional neural networks to generate an ordered high-level sequence from a
whole word image, avoiding the difficult character segmentation problem. Then a
deep recurrent model, building on long short-term memory (LSTM), is developed
to robustly recognize the generated CNN sequences, departing from most existing
approaches recognising each character independently. Our model has a number of
appealing properties in comparison to existing scene text recognition methods:
(i) It can recognise highly ambiguous words by leveraging meaningful context
information, allowing it to work reliably without either pre- or
post-processing; (ii) the deep CNN feature is robust to various image
distortions; (iii) it retains the explicit order information in word image,
which is essential to discriminate word strings; (iv) the model does not depend
on pre-defined dictionary, and it can process unknown words and arbitrary
strings. Codes for the DTRN will be available.Comment: To appear in the 13th AAAI Conference on Artificial Intelligence
(AAAI-16), 201
- …