Improving patch-based scene text script identification with ensembles of conjoined networks
This paper focuses on the problem of script identification in scene text
images. Tackling this problem with state-of-the-art CNN classifiers is not
straightforward, as they fail to address a key characteristic of scene text
instances: their extremely variable aspect ratio. Instead of resizing input
images to a fixed aspect ratio as in the typical use of holistic CNN
classifiers, we propose here a patch-based classification framework in order to
preserve discriminative parts of the image that are characteristic of its
class. We describe a novel method based on the use of ensembles of conjoined
networks to jointly learn discriminative stroke-part representations and their
relative importance in a patch-based classification scheme. Our experiments
with this learning procedure demonstrate state-of-the-art results on two public
script identification datasets. In addition, we propose a new public benchmark
dataset for the evaluation of multi-lingual scene text end-to-end reading
systems. Experiments on this dataset demonstrate the key role of script
identification in a complete end-to-end system that combines our script
identification method with a previously published text detector and an
off-the-shelf OCR engine.
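
The patch-based idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `patch_offsets` and `classify_by_patches`, the square patch shape, the 50% stride, and the weighted-sum combination rule are all assumptions chosen for illustration. The sketch shows how a text line of arbitrary width could be covered by fixed-size patches without resizing it to a fixed aspect ratio, and how per-patch class scores might be combined using per-patch importance weights.

```python
def patch_offsets(width, height, stride_ratio=0.5):
    """Return the left x-offsets of square (height x height) patches
    that cover a text-line image of the given width.

    Hypothetical helper: patch shape and stride are assumptions.
    """
    if width <= height:
        # Image is narrower than one patch: use a single patch.
        return [0]
    stride = max(1, int(height * stride_ratio))
    offsets = list(range(0, width - height + 1, stride))
    if offsets[-1] != width - height:
        # Add a final patch so the right edge is always covered.
        offsets.append(width - height)
    return offsets


def classify_by_patches(patch_scores, patch_weights):
    """Combine per-patch class scores with a weighted sum.

    patch_scores: list of per-class score lists, one per patch.
    patch_weights: one importance weight per patch (assumed given).
    Returns (index of the winning class, per-class totals).
    """
    num_classes = len(patch_scores[0])
    totals = [0.0] * num_classes
    for scores, weight in zip(patch_scores, patch_weights):
        for cls, score in enumerate(scores):
            totals[cls] += weight * score
    label = max(range(num_classes), key=lambda cls: totals[cls])
    return label, totals


if __name__ == "__main__":
    # A 90x40 text line yields four overlapping 40x40 patches.
    print(patch_offsets(90, 40))
    # Three patches, two script classes; the first patch is down-weighted.
    scores = [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7]]
    weights = [0.2, 1.0, 1.0]
    print(classify_by_patches(scores, weights))
```

In this sketch the weights are supplied by hand; in the method summarized above, the relative importance of the stroke-part representations is learned jointly with the representations themselves.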