Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition
In this work we present a framework for the recognition of natural scene
text. Our framework does not require any human-labelled data, and performs word
recognition on the whole image holistically, departing from the character based
recognition systems of the past. The deep neural network models at the centre
of this framework are trained solely on data produced by a synthetic text
generation engine -- synthetic data that is highly realistic and sufficient to
replace real data, giving us infinite amounts of training data. This excess of
data exposes new possibilities for word recognition models, and here we
consider three models, each one "reading" words in a different way: via 90k-way
dictionary encoding, character sequence encoding, and bag-of-N-grams encoding.
In the scenarios of language based and completely unconstrained text
recognition we greatly improve upon state-of-the-art performance on standard
datasets, using our fast, simple machinery and requiring zero data-acquisition
costs.
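The three ways of "reading" a word named in the abstract above can be illustrated as target encodings. This is only a minimal sketch: the toy lexicon, the alphabet, the padding index 0, the maximum length, and the n-gram limit `max_n` are all assumptions made for illustration, not details taken from the paper.

```python
# Hypothetical alphabet; index 0 is reserved here for padding (an assumption).
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"


def encode_dictionary(word, lexicon):
    """Dictionary encoding: the word is a single class index into a fixed lexicon."""
    return lexicon.index(word.lower())


def encode_chars(word, max_len):
    """Character sequence encoding: one class index per position, padded with 0."""
    indices = [ALPHABET.index(c) + 1 for c in word.lower()[:max_len]]
    return indices + [0] * (max_len - len(indices))


def char_ngrams(word, max_n):
    """Set of all character n-grams of length 1..max_n contained in the word."""
    word = word.lower()
    return {
        word[i:i + n]
        for n in range(1, max_n + 1)
        for i in range(len(word) - n + 1)
    }


def encode_bag(word, vocab, max_n):
    """Bag-of-N-grams encoding: binary vector marking which vocabulary n-grams occur."""
    grams = char_ngrams(word, max_n)
    return [1 if g in grams else 0 for g in vocab]


if __name__ == "__main__":
    lexicon = ["cat", "dog"]            # toy stand-in for a 90k-word dictionary
    vocab = ["a", "at", "ca", "dog"]    # toy n-gram vocabulary
    print(encode_dictionary("cat", lexicon))
    print(encode_chars("cat", max_len=5))
    print(encode_bag("cat", vocab, max_n=2))
```

The dictionary encoding can only ever output words in the lexicon, while the character sequence and bag-of-N-grams encodings are not tied to a fixed word list.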
Smart Augmentation - Learning an Optimal Data Augmentation Strategy
A recurring problem faced when training neural networks is that there is
typically not enough data to maximize the generalization capability of deep
neural networks (DNNs). There are many techniques to address this, including data
augmentation, dropout, and transfer learning. In this paper, we introduce an
additional method which we call Smart Augmentation and we show how to use it to
increase the accuracy and reduce overfitting on a target network. Smart
Augmentation works by creating a network that learns how to generate augmented
data during the training process of a target network in a way that reduces that
network's loss. This allows us to learn augmentations that minimize the error of
that network.
Smart Augmentation has shown the potential to increase accuracy by
demonstrably significant margins on all datasets tested. In addition, it has
shown potential to achieve similar or improved performance levels with
significantly smaller network sizes in a number of tested cases.
Generating Text Sequence Images for Recognition
Recently, methods based on deep learning have dominated the field of text
recognition. With a large amount of training data, most of them can achieve
state-of-the-art performance. However, it is hard to harvest and label
sufficient text sequence images from real scenes. To mitigate this issue,
several methods to synthesize text sequence images were proposed, yet they
usually need complicated preceding or follow-up steps. In this work, we present
a method which is able to generate infinite training data without any auxiliary
pre/post-process. We tackle the generation task as an image-to-image
translation one and utilize conditional adversarial networks to produce
realistic text sequence images conditioned on the corresponding semantic ones.
We assess our method with several evaluation metrics, and the results
demonstrate that the quality of the generated data is satisfactory. The code
and dataset will be publicly available soon.
Visual Semantic Re-ranker for Text Spotting
Many current state-of-the-art methods for text recognition are based on
purely local information and ignore the semantic correlation between text and
its surrounding visual context. In this paper, we propose a post-processing
approach to improve the accuracy of text spotting by using the semantic
relation between the text and the scene. We initially rely on an off-the-shelf
deep neural network that provides a series of text hypotheses for each input
image. These text hypotheses are then re-ranked using the semantic relatedness
with the object in the image. As a result of this combination, the performance
of the original network is boosted with a very low computational cost. The
proposed framework can be used as a drop-in complement for any text-spotting
algorithm that outputs a ranking of word hypotheses. We validate our approach
on the ICDAR'17 shared task dataset.
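The re-ranking idea in the abstract above can be sketched as follows. This is a hedged illustration only: the abstract does not say how relatedness is measured or how it is combined with the recognizer's output, so the cosine similarity over toy word vectors, the linear mixing `weight`, and all names below are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rerank(hypotheses, object_vec, embeddings, weight=0.5):
    """Re-rank (word, recognizer_score) pairs by mixing in semantic relatedness.

    `weight` controls how much the relatedness to the detected object counts;
    the linear combination is an assumption made for this sketch.
    """
    scored = []
    for word, score in hypotheses:
        vec = embeddings.get(word)
        relatedness = cosine(vec, object_vec) if vec is not None else 0.0
        scored.append((word, (1 - weight) * score + weight * relatedness))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy 2-d vectors standing in for real word embeddings.
    embeddings = {"pizza": [1.0, 0.0], "plaza": [0.0, 1.0]}
    object_vec = [1.0, 0.0]  # e.g. the vector of an object detected in the scene
    hypotheses = [("plaza", 0.6), ("pizza", 0.5)]
    print(rerank(hypotheses, object_vec, embeddings))
```

In this toy case the recognizer prefers "plaza", but the hypothesis that is more related to the scene object moves to the top after re-ranking.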
Extracting textual overlays from social media videos using neural networks
Textual overlays are often used in social media videos as people who watch
them without the sound would otherwise miss essential information conveyed in
the audio stream. This is why extraction of those overlays can serve as an
important meta-data source, e.g. for content classification or retrieval tasks.
In this work, we present a robust method for extracting textual overlays from
videos that builds upon multiple neural network architectures. The proposed
solution relies on several processing steps: keyframe extraction, text
detection and text recognition. The main component of our system, i.e. the text
recognition module, is inspired by a convolutional recurrent neural network
architecture and we improve its performance using a synthetically generated
dataset of over 600,000 images with text prepared by the authors specifically for
this task. We also develop a filtering method that reduces the amount of
overlapping text phrases using Levenshtein distance and further boosts the system's
performance. The final accuracy of our solution reaches over 80% and is on
par with state-of-the-art methods.
Comment: International Conference on Computer Vision and Graphics (ICCVG) 201
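The Levenshtein-based filtering step mentioned in the abstract above could look roughly like this. It is a minimal sketch under stated assumptions: the abstract does not give the filtering rule, so the relative threshold `max_ratio` and the keep-first strategy are hypothetical choices made for illustration.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        prev = cur
    return prev[-1]


def filter_phrases(phrases, max_ratio=0.2):
    """Keep a phrase only if it is not a near-duplicate of one already kept.

    Two phrases count as near-duplicates when their edit distance is at most
    `max_ratio` times the length of the longer one (an assumed threshold).
    """
    kept = []
    for phrase in phrases:
        is_duplicate = any(
            levenshtein(phrase, other) <= max_ratio * max(len(phrase), len(other), 1)
            for other in kept
        )
        if not is_duplicate:
            kept.append(phrase)
    return kept


if __name__ == "__main__":
    # The same overlay recognized on consecutive keyframes, once with an OCR error.
    print(filter_phrases(["Breaking news", "Breaking new5", "Weather today"]))
```

A relative threshold is used here so that short and long overlays are treated comparably; an absolute distance cutoff would be an equally plausible choice.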
A Review on Text Detection Techniques
Text detection in images is an important field. Reading text is challenging because of the variations in images. Text detection is useful for many navigational purposes, e.g. text on Google APIs and traffic panels. This paper analyzes the work done on text detection by many researchers, critically evaluates the techniques designed for text detection, and states the limitations of each approach. We have integrated the work of many researchers to give a brief overview of the available techniques, and their strengths and limitations are discussed to give readers a clear picture. The major datasets discussed in all these papers are ICDAR 2003, 2005, 2011, 2013 and SVT (Street View Text).