
Text detection methods in images of natural scenes

Abstract

Text detection in natural scene images has attracted considerable attention in recent years because of its wide range of applications, such as content-based image retrieval, PDA signboard translators, and assistive applications for blind and visually impaired people. A clear distinction, however, has to be made between text detection and text recognition. The task of the former is to locate text regions in an image, not to recognize them. Nevertheless, the two tasks are closely related, since detected text regions can subsequently be fed into text recognition modules. Owing to the diversity and complexity of natural scene images, text detection is a considerably challenging task. Text can appear at arbitrary image locations in arbitrary shapes, sizes and colors. Additionally, it is often subject to numerous geometric transformations. Finally, natural scene images contain very complex backgrounds, which make text detection even more difficult.

In this dissertation, we present two novel methods based on the stroke width transform (SWT): SWT voting-based color reduction and SWT direction determination. The first is a text detection-oriented segmentation method that supervises the color reduction process by integrating additional SWT information. Colors rich in SWT pixels most likely belong to text and are therefore blocked from being mean-shifted away towards background colors. The method improves segmentation accuracy compared to other state-of-the-art methods.

One of the disadvantages of the SWT method is the search direction problem. The method searches for parallel character edges along the gradient directions. For dark text on a light background, the gradients correctly point towards character interiors, whereas for light text on a dark background they point in the opposite direction and cause incorrect text detection. To solve the problem, the authors of the SWT method run the algorithm twice, once in the gradient direction and once in the counter-gradient direction. Such an approach, however, is imprecise and time-consuming, since the whole method has to be run twice. To avoid the search direction issue, we present a novel SWT direction determination method. By analyzing SWT sub-block histograms of both gradient and counter-gradient directions, the method is able to determine the correct SWT direction in one step.

Text detection is usually evaluated on the ICDAR 2003 and ICDAR 2011 datasets. Their disadvantage is the rectangular annotation of single words in images, which requires that the detected text is already grouped into words. Since text detection and word grouping are separate subjects, such annotation is problematic from the perspective of objective evaluation. Therefore, we created our own public annotated dataset of text in natural scene images, CVL OCR DB. The dataset supports two types of annotation: n-polygon annotation and binary annotation. The latter allows per-character evaluation and makes word grouping unnecessary.

Experimental results on the CVL OCR DB dataset indicate that the SWT voting-based color reduction method outperforms the text-oriented color reduction method, which is used in the segmentation phase of the state-of-the-art text detection method of structure-based partition and grouping. The literature does not explicitly address the SWT search direction issue; thus, the SWT direction determination method cannot be directly compared to other methods. Nevertheless, the method achieves a high detection rate on the CVL OCR DB dataset and is able to determine correct SWT directions when both dark text on light backgrounds and light text on dark backgrounds appear in the same image. More generally, the dissertation can also serve as a survey of text detection in natural scene images.
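
To illustrate why binary annotation makes word grouping unnecessary, the following minimal sketch scores a detection result pixel by pixel against a binary ground-truth mask. The abstract does not state which evaluation measure the dissertation uses; pixel-level precision, recall and F-measure are assumed here purely for illustration, and the names `Mask` and `pixel_scores` are hypothetical.

```python
from typing import List, Tuple

# Assumed representation: 1 marks a text pixel, 0 marks a background pixel.
Mask = List[List[int]]


def pixel_scores(detected: Mask, ground_truth: Mask) -> Tuple[float, float, float]:
    """Return (precision, recall, f_measure) for two equally sized binary masks."""
    same_rows = len(detected) == len(ground_truth)
    if not same_rows or any(len(d) != len(g) for d, g in zip(detected, ground_truth)):
        raise ValueError("masks must have the same shape")

    tp = fp = fn = 0
    for det_row, gt_row in zip(detected, ground_truth):
        for d, g in zip(det_row, gt_row):
            if d and g:
                tp += 1  # text pixel correctly detected
            elif d:
                fp += 1  # background pixel reported as text
            elif g:
                fn += 1  # text pixel that was missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    ground_truth = [[0, 1, 1, 0],
                    [0, 1, 1, 0]]
    detected = [[0, 1, 1, 1],
                [0, 1, 0, 0]]
    print(pixel_scores(detected, ground_truth))
```

Because the comparison is made per pixel, no bounding rectangles or word boundaries are involved, so the detector is scored without first having to group characters into words.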
