1,560 research outputs found

    Text Extraction From Natural Scene: Methodology And Application

    Full text link
    With the popularity of the Internet and the smart mobile device, there is an increasing demand for the techniques and applications of image/video-based analytics and information retrieval. Most of these applications can benefit from text information extraction in natural scene. However, scene text extraction is a challenging problem to be solved, due to cluttered background of natural scene and multiple patterns of scene text itself. To solve these problems, this dissertation proposes a framework of scene text extraction. Scene text extraction in our framework is divided into two components, detection and recognition. Scene text detection is to find out the regions containing text from camera captured images/videos. Text layout analysis based on gradient and color analysis is performed to extract candidates of text strings from cluttered background in natural scene. Then text structural analysis is performed to design effective text structural features for distinguishing text from non-text outliers among the candidates of text strings. Scene text recognition is to transform image-based text in detected regions into readable text codes. The most basic and significant step in text recognition is scene text character (STC) prediction, which is multi-class classification among a set of text character categories. We design robust and discriminative feature representations for STC structure, by integrating multiple feature descriptors, coding/pooling schemes, and learning models. Experimental results in benchmark datasets demonstrate the effectiveness and robustness of our proposed framework, which obtains better performance than previously published methods. Our proposed scene text extraction framework is applied to 4 scenarios, 1) reading print labels in grocery package for hand-held object recognition; 2) combining with car detection to localize license plate in camera captured natural scene image; 3) reading indicative signage for assistant navigation in indoor environments; and 4) combining with object tracking to perform scene text extraction in video-based natural scene. The proposed prototype systems and associated evaluation results show that our framework is able to solve the challenges in real applications

    Efficient Version-Space Reduction for Visual Tracking

    Full text link
    Discrminative trackers, employ a classification approach to separate the target from its background. To cope with variations of the target shape and appearance, the classifier is updated online with different samples of the target and the background. Sample selection, labeling and updating the classifier is prone to various sources of errors that drift the tracker. We introduce the use of an efficient version space shrinking strategy to reduce the labeling errors and enhance its sampling strategy by measuring the uncertainty of the tracker about the samples. The proposed tracker, utilize an ensemble of classifiers that represents different hypotheses about the target, diversify them using boosting to provide a larger and more consistent coverage of the version-space and tune the classifiers' weights in voting. The proposed system adjusts the model update rate by promoting the co-training of the short-memory ensemble with a long-memory oracle. The proposed tracker outperformed state-of-the-art trackers on different sequences bearing various tracking challenges.Comment: CRV'17 Conferenc

    Parking lot monitoring system using an autonomous quadrotor UAV

    Get PDF
    The main goal of this thesis is to develop a drone-based parking lot monitoring system using low-cost hardware and open-source software. Similar to wall-mounted surveillance cameras, a drone-based system can monitor parking lots without affecting the flow of traffic while also offering the mobility of patrol vehicles. The Parrot AR Drone 2.0 is the quadrotor drone used in this work due to its modularity and cost efficiency. Video and navigation data (including GPS) are communicated to a host computer using a Wi-Fi connection. The host computer analyzes navigation data using a custom flight control loop to determine control commands to be sent to the drone. A new license plate recognition pipeline is used to identify license plates of vehicles from video received from the drone

    A Review on Text Detection Techniques

    Get PDF
    Text detection in image is an important field. Reading text is challenging because of the variations in images. Text detection is useful for many navigational purposes e.g. text on google API’s and traffic panels etc. This paper analyzes the work done on text detection by many researchers and critically evaluates the techniques designed for text detection and states the limitation of each approach. We have integrated the work of many researchers for getting a brief over view of multiple available techniques and their strengths and limitations are also discussed to give readers a clear picture. The major dataset discussed in all these papers are ICDAR 2003, 2005, 2011, 2013 and SVT(street view text).

    Text Detection in Natural Scenes and Technical Diagrams with Convolutional Feature Learning and Cascaded Classification

    Get PDF
    An enormous amount of digital images are being generated and stored every day. Understanding text in these images is an important challenge with large impacts for academic, industrial and domestic applications. Recent studies address the difficulty of separating text targets from noise and background, all of which vary greatly in natural scenes. To tackle this problem, we develop a text detection system to analyze and utilize visual information in a data driven, automatic and intelligent way. The proposed method incorporates features learned from data, including patch-based coarse-to-fine detection (Text-Conv), connected component extraction using region growing, and graph-based word segmentation (Word-Graph). Text-Conv is a sliding window-based detector, with convolution masks learned using the Convolutional k-means algorithm (Coates et. al, 2011). Unlike convolutional neural networks (CNNs), a single vector/layer of convolution mask responses are used to classify patches. An initial coarse detection considers both local and neighboring patch responses, followed by refinement using varying aspect ratios and rotations for a smaller local detection window. Different levels of visual detail from ground truth are utilized in each step, first using constraints on bounding box intersections, and then a combination of bounding box and pixel intersections. Combining masks from different Convolutional k-means initializations, e.g., seeded using random vectors and then support vectors improves performance. The Word-Graph algorithm uses contextual information to improve word segmentation and prune false character detections based on visual features and spatial context. Our system obtains pixel, character, and word detection f-measures of 93.14%, 90.26%, and 86.77% respectively for the ICDAR 2015 Robust Reading Focused Scene Text dataset, out-performing state-of-the-art systems, and producing highly accurate text detection masks at the pixel level. To investigate the utility of our feature learning approach for other image types, we perform tests on 8- bit greyscale USPTO patent drawing diagram images. An ensemble of Ada-Boost classifiers with different convolutional features (MetaBoost) is used to classify patches as text or background. The Tesseract OCR system is used to recognize characters in detected labels and enhance performance. With appropriate pre-processing and post-processing, f-measures of 82% for part label location, and 73% for valid part label locations and strings are obtained, which are the best obtained to-date for the USPTO patent diagram data set used in our experiments. To sum up, an intelligent refinement of convolutional k-means-based feature learning and novel automatic classification methods are proposed for text detection, which obtain state-of-the-art results without the need for strong prior knowledge. Different ground truth representations along with features including edges, color, shape and spatial relationships are used coherently to improve accuracy. Different variations of feature learning are explored, e.g. support vector-seeded clustering and MetaBoost, with results suggesting that increased diversity in learned features benefit convolution-based text detectors
    • …
    corecore