3 research outputs found

Scene Text Synthesis for Efficient and Effective Deep Network Training

Author: Lu Shijian
Zhan Fangneng
Zhu Hongyuan
Publication date: 26/01/2019

A large amount of annotated training images is critical for training accurate and robust deep network models, but collecting such images is often time-consuming and costly. Image synthesis alleviates this constraint by generating annotated training images automatically, and it has attracted increasing interest in recent deep learning research. We develop an innovative image synthesis technique that composes annotated training images by realistically embedding foreground objects of interest (OOI) into background images. The proposed technique consists of two key components that in principle boost the usefulness of the synthesized images in deep network training. The first is context-aware semantic coherence, which ensures that the OOI are placed around semantically coherent regions within the background image. The second is harmonious appearance adaptation, which ensures that the embedded OOI agree with the surrounding background in both geometric alignment and appearance realism. The proposed technique has been evaluated on two related but very different computer vision challenges, namely scene text detection and scene text recognition. Experiments on a number of public datasets demonstrate the effectiveness of the proposed image synthesis technique: deep networks trained on our synthesized images achieve scene text detection and scene text recognition performance similar to, or even better than, networks trained on real images.

Comment: 8 pages, 5 figures

arXiv.org e-Print Archive

DOCUMENT TEXT DETECTION IN VIDEO FRAMES ACQUIRED BY A SMARTPHONE BASED ON LINE SEGMENT DETECTOR AND DBSCAN CLUSTERING

Author: ABDELKARIM ZATNI
HASSAN EL BAHI
Publication venue: Taylor's University
Publication date: 01/02/2018

Automatic document text detection in video is an important task and a prerequisite for video retrieval, annotation, recognition, indexing and content analysis. In this paper, we present an effective and efficient model for detecting the page outlines within frames of a video clip acquired by a smartphone. The model consists of four stages. In the first stage, all line segments of each video frame are detected by the LSD method. In the second stage, the line segments are grouped into clusters using the DBSCAN clustering algorithm, and prior knowledge is then used to separate the cluster belonging to the document page from the background. In the third and fourth stages, length filtering and angle filtering, respectively, are applied to that cluster of line segments. Finally, a sorting operation is applied to obtain the quadrilateral coordinates of the document page in the input video frame. The proposed model is evaluated on the ICDAR 2015 Smartphone Capture OCR dataset. Experimental results and a comparative study show that the model achieves encouraging results and works efficiently across different classes of documents.

Directory of Open Access Journals
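
The clustering and filtering stages described in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the line segments are taken as given (the LSD step is not reproduced), segments are clustered by their midpoints, the largest cluster stands in for the unspecified "prior knowledge", the angle filter keeps near-horizontal and near-vertical segments, and every name and threshold (`dbscan`, `page_candidate_segments`, `eps`, `min_pts`, `min_len`, `angle_tol`) is hypothetical. The final sorting step that yields the quadrilateral is omitted.

```python
import math


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one label per point, -1 meaning noise."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point, keep expanding
                queue.extend(reachable)
    return labels


def midpoint(seg):
    x1, y1, x2, y2 = seg
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def length(seg):
    x1, y1, x2, y2 = seg
    return math.hypot(x2 - x1, y2 - y1)


def angle_deg(seg):
    """Undirected segment orientation in [0, 180) degrees."""
    x1, y1, x2, y2 = seg
    return math.degrees(math.atan2(y2 - y1, x2 - x1)) % 180.0


def page_candidate_segments(segments, eps=200.0, min_pts=3,
                            min_len=40.0, angle_tol=20.0):
    """Cluster segments, keep one cluster, then filter by length and angle."""
    labels = dbscan([midpoint(s) for s in segments], eps, min_pts)
    clustered = [lab for lab in labels if lab != -1]
    if not clustered:
        return []
    # Assumption: the largest cluster is the document page.
    best = max(set(clustered), key=clustered.count)
    kept = [s for s, lab in zip(segments, labels) if lab == best]
    # Length filter: drop short segments (e.g. text strokes).
    kept = [s for s in kept if length(s) >= min_len]

    # Angle filter (assumption): keep near-horizontal or near-vertical segments.
    def near_axis(s):
        a = angle_deg(s)
        return min(a, 180.0 - a) <= angle_tol or abs(a - 90.0) <= angle_tol

    return [s for s in kept if near_axis(s)]
```

Segments are `(x1, y1, x2, y2)` tuples in pixel coordinates. In practice the thresholds would need tuning per frame resolution, and a library DBSCAN could replace the minimal one written out here.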

Multioriented video scene text detection through Bayesian classification and boundary growing

Author: Lu S.
Phan T.Q.
Shivakumara P.
Sreedhar R.P.
Tan C.L.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2012

DOI: 10.1109/TCSVT.2012.2198129. IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 8, pp. 1227-1235. ITCT

Crossref

ScholarBank@NUS