40 research outputs found
Rotation-invariant features for multi-oriented text detection in natural images.
Texts in natural scenes carry rich semantic information, which can be used to assist a wide range of applications, such as object recognition, image/video retrieval, mapping/navigation, and human computer interaction. However, most existing systems are designed to detect and recognize horizontal (or near-horizontal) texts. Due to the increasing popularity of mobile-computing devices and applications, detecting texts of varying orientations from natural images under less controlled conditions has become an important but challenging task. In this paper, we propose a new algorithm to detect texts of varying orientations. Our algorithm is based on a two-level classification scheme and two sets of features specially designed for capturing the intrinsic characteristics of texts. To better evaluate the proposed method and compare it with the competing algorithms, we generate a comprehensive dataset with various types of texts in diverse real-world scenes. We also propose a new evaluation protocol, which is more suitable for benchmarking algorithms for detecting texts in varying orientations. Experiments on benchmark datasets demonstrate that our system compares favorably with the state-of-the-art algorithms when handling horizontal texts and achieves significantly enhanced performance on variant texts in complex natural scenes
DOCUMENT TEXT DETECTION IN VIDEO FRAMES ACQUIRED BY A SMARTPHONE BASED ON LINE SEGMENT DETECTOR AND DBSCAN CLUSTERING
Automatic document text detection in video is an important task and a prerequisite for video retrieval, annotation, recognition, indexing and content analysis. In this paper, we present an effective and efficient model for detecting the page outlines within frames of video clip acquired by a Smartphone. The model consists of four stages: In the first stage, all line segments of each video frame are detected by LSD method. In the second stage, the line segments are grouped into clusters using the DBSCAN clustering algorithm, and then a prior knowledge is used in order to discover the cluster of page document from the background. In the third and fourth stages, a length and an angle filtering
processes are performed respectively on the cluster of line segments. Finally a sorting operation is applied in order to detect the quadrilateral coordinates of the document page in the input video frame. The proposed model is evaluated on the ICDAR 2015 Smartphone Capture OCR dataset. Experimental results
and comparative study show that our model can achieve encouraging and useful results and works efficiently even under different classes of documents
Text localization in images using reverse thresholds algorithm
High color similarity between text pixels and background pixels is the major problem that causes failure during text localization. In this paper, a novel algorithm, Reverse Thresholds (RT) algorithm is proposed to localize text from the images with various text-background color similarities. First, a rough calculation is proposed to determine the similarity index for every text region. Then, by applying reverse operation, the best thresholds for each text region are calculated by its similarity index. To remove other uncertainties, self-generated images with the same text features but different similarity index are used as experiment dataset. Experiment result shows that RT algorithm has higher localizing strength which is able to localize text in a wider range of similarity index
ATLAS: Adaptive Text Localization Algorithm in High Color Similarity Background
One of the major problems that occur in text localization process is the issue of color similarity between text and background image. The limitation of localization algorithms due to high color similarity is highlighted in several research papers. Hence, this research focuses towards the improvement of text localizing capability in high color background image similarity by introducing an adaptive text localization algorithm (ATLAS). ATLAS is an edge-based text localization algorithm that consists of two parts. Text-Background Similarity Index (TBSI) being the first part of ATLAS, measures the similarity index of every text region while the second, Multi Adaptive Threshold (MAT), performs multiple adaptive thresholds calculation using size filtration and degree deviation for locating the possible text region. In this research, ATLAS is verified and compared with other localization techniques based on two parameters, localizing strength and precision. The experiment has been implemented and verified using two types of datasets, generated text color spectrum dataset and Document Analysis and Recognition dataset (ICDAR). The result shows ATLAS has significant improvement on localizing strength and slight improvement on precision compared with other localization algorithms in high color text-background image
Artificial Intelligence in the Creative Industries: A Review
This paper reviews the current state of the art in Artificial Intelligence
(AI) technologies and applications in the context of the creative industries. A
brief background of AI, and specifically Machine Learning (ML) algorithms, is
provided including Convolutional Neural Network (CNNs), Generative Adversarial
Networks (GANs), Recurrent Neural Networks (RNNs) and Deep Reinforcement
Learning (DRL). We categorise creative applications into five groups related to
how AI technologies are used: i) content creation, ii) information analysis,
iii) content enhancement and post production workflows, iv) information
extraction and enhancement, and v) data compression. We critically examine the
successes and limitations of this rapidly advancing technology in each of these
areas. We further differentiate between the use of AI as a creative tool and
its potential as a creator in its own right. We foresee that, in the near
future, machine learning-based AI will be adopted widely as a tool or
collaborative assistant for creativity. In contrast, we observe that the
successes of machine learning in domains with fewer constraints, where AI is
the `creator', remain modest. The potential of AI (or its developers) to win
awards for its original creations in competition with human creatives is also
limited, based on contemporary technologies. We therefore conclude that, in the
context of creative industries, maximum benefit from AI will be derived where
its focus is human centric -- where it is designed to augment, rather than
replace, human creativity
A convolutional neural network based Chinese text detection algorithm via text structure modeling
Text detection in natural scene environment plays an important role in many computer vision applications. While existing text detection methods are focused on English characters, there is strong application demands on text detection in other languages, such as Chinese. As Chinese characters are much more complex than English characters, innovative and more efficient text detection techniques are required for Chinese texts. In this paper, we present a novel text detection algorithm for Chinese characters based on a specific designed convolutional neural network (CNN). The CNN model contains a text structure component detector layer, a spatial pyramid layer and a multi-input-layer deep belief network (DBN). The CNN is pretrained via a convolutional sparse auto-encoder (CSAE) in an unsupervised way, which is specifically designed for extracting complex features from Chinese characters. In particular, the text structure component detectors enhance the accuracy and uniqueness of feature descriptors by extracting multiple text structure components in various ways. The spatial pyramid layer is then introduced to enhance the scale invariability of the CNN model for detecting texts in multiple scales. Finally, the multi-input-layer DBN is used as the fully connected layers in the CNN model to ensure that features from multiple scales are comparable. A multilingual text detection dataset, in which texts in Chinese, English and digits are labeled separately, is set up to evaluate the proposed text detection algorithm. The proposed algorithm shows a significant 10% performance improvement over the baseline CNN algorithms. In addition the proposed algorithm is evaluated over a public multilingual image benchmark and achieves state-of-the-art results for text detection under multiple languages. Furthermore a simplified version of the proposed algorithm with only general components is compared to existing general text detection algorithms on the ICDAR 2011 and 2013 datasets, showing comparable detection performance to the existing algorithms
Exploiting Spatio-Temporal Coherence for Video Object Detection in Robotics
This paper proposes a method to enhance video object detection for indoor environments in robotics. Concretely, it exploits knowledge about the camera motion between frames to propagate previously detected objects to successive frames. The proposal is rooted in the concepts of planar homography to propose regions of interest where to find objects, and recursive Bayesian filtering to integrate observations over time. The proposal is evaluated on six virtual, indoor environments, accounting for the detection of nine object classes over a total of ∼ 7k frames. Results show that our proposal improves the recall and the F1-score by a factor of 1.41 and 1.27, respectively, as well as it achieves a significant reduction of the object categorization entropy (58.8%) when compared to a two-stage video object detection method used as baseline, at the cost of small time overheads (120 ms) and precision loss (0.92).</p