Scene Text Synthesis for Efficient and Effective Deep Network Training
Training accurate and robust deep network models requires a large number of
annotated training images, but collecting them is often time-consuming and
costly. Image synthesis alleviates this constraint by generating annotated
training images automatically, and it has attracted increasing interest in
recent deep learning research. We develop an innovative image synthesis technique that
composes annotated training images by realistically embedding foreground
objects of interest (OOI) into background images. The proposed technique
consists of two key components that in principle boost the usefulness of the
synthesized images in deep network training. The first is context-aware
semantic coherence, which ensures that the OOI are placed around semantically
coherent regions within the background image. The second is harmonious
appearance adaptation, which ensures that the embedded OOI agree with the
surrounding background in both geometric alignment and appearance realism. The
proposed technique has been evaluated on two related but very different
computer vision challenges, namely scene text detection and scene text
recognition. Experiments on a number of public datasets demonstrate the
effectiveness of our proposed image synthesis technique: deep networks trained
on our synthesized images achieve scene text detection and scene text
recognition performance similar to, or even better than, that obtained with
real images.
Comment: 8 pages, 5 figures
DOCUMENT TEXT DETECTION IN VIDEO FRAMES ACQUIRED BY A SMARTPHONE BASED ON LINE SEGMENT DETECTOR AND DBSCAN CLUSTERING
Automatic document text detection in video is an important task and a prerequisite for video retrieval, annotation, recognition, indexing and content analysis. In this paper, we present an effective and efficient model for detecting page outlines within the frames of a video clip acquired by a smartphone. The model consists of four stages. In the first stage, all line segments in each video frame are detected by the LSD method. In the second stage, the line segments are grouped into clusters using the DBSCAN clustering algorithm, and prior knowledge is then used to separate the cluster belonging to the document page from the background. In the third and fourth stages, length filtering and angle filtering are applied, respectively, to that cluster of line segments. Finally, a sorting operation is applied to obtain the quadrilateral coordinates of the document page in the input video frame. The proposed model is evaluated on the ICDAR 2015 Smartphone Capture OCR dataset. Experimental results and a comparative study show that our model achieves encouraging results and works efficiently even across different classes of documents.
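The stages described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the line segments have already been produced by a detector such as LSD and are given as `(x1, y1, x2, y2)` tuples, and every name and threshold below (`seg_dist`, `detect_page`, `eps`, `min_pts`, `min_len`, `angle_tol`) is hypothetical. The abstract does not say which features are clustered, what the prior knowledge is, or how the angle filter works, so the sketch substitutes simple stand-ins: endpoint proximity as the DBSCAN distance, largest total segment length as the page prior, and a near-horizontal or near-vertical test as the angle filter.

```python
import math
from collections import deque


def seg_dist(a, b):
    """Assumed distance: closest pair of endpoints between two segments."""
    ends_a = [(a[0], a[1]), (a[2], a[3])]
    ends_b = [(b[0], b[1]), (b[2], b[3])]
    return min(math.dist(p, q) for p in ends_a for q in ends_b)


def dbscan(items, eps, min_pts, dist):
    """Minimal DBSCAN; returns one label per item, -1 meaning noise."""
    labels = [None] * len(items)

    def neighbors(i):
        return [j for j in range(len(items)) if dist(items[i], items[j]) <= eps]

    cluster = -1
    for i in range(len(items)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # noise for now; may become a border item later
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(nbrs)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # border item: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # core item: keep growing the cluster
    return labels


def seg_length(seg):
    return math.hypot(seg[2] - seg[0], seg[3] - seg[1])


def filter_by_length(segments, min_len):
    return [s for s in segments if seg_length(s) >= min_len]


def filter_by_angle(segments, angle_tol):
    """Assumed rule: keep segments within angle_tol degrees of 0 or 90."""
    kept = []
    for s in segments:
        a = math.degrees(math.atan2(s[3] - s[1], s[2] - s[0])) % 180.0
        if min(a, 180.0 - a) <= angle_tol or abs(a - 90.0) <= angle_tol:
            kept.append(s)
    return kept


def detect_page(segments, eps=10.0, min_pts=2, min_len=50.0, angle_tol=15.0):
    """Return page corners as [top-left, top-right, bottom-right, bottom-left]."""
    labels = dbscan(segments, eps, min_pts, seg_dist)
    clusters = {}
    for seg, lab in zip(segments, labels):
        if lab != -1:
            clusters.setdefault(lab, []).append(seg)
    if not clusters:
        return None
    # Assumed prior: the page is the cluster with the largest total length.
    page = max(clusters.values(), key=lambda c: sum(seg_length(s) for s in c))
    kept = filter_by_angle(filter_by_length(page, min_len), angle_tol)
    if not kept:
        return None
    pts = [(s[0], s[1]) for s in kept] + [(s[2], s[3]) for s in kept]
    # Sort corners in image coordinates (y grows downward).
    tl = min(pts, key=lambda p: p[0] + p[1])
    br = max(pts, key=lambda p: p[0] + p[1])
    tr = max(pts, key=lambda p: p[0] - p[1])
    bl = min(pts, key=lambda p: p[0] - p[1])
    return [tl, tr, br, bl]
```

Calling `detect_page(segments)` on one frame's segments returns four corner points, or `None` when no cluster survives the filters. Under strong perspective distortion the near-horizontal or near-vertical assumption would reject true page edges, so a real system would need a more tolerant angle criterion.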
Multioriented video scene text detection through Bayesian classification and boundary growing
DOI: 10.1109/TCSVT.2012.2198129. IEEE Transactions on Circuits and Systems for Video Technology 22(8), 1227-1235. ITCT