606 research outputs found

    Towards End-to-end Car License Plate Location and Recognition in Unconstrained Scenarios

    Full text link
    Benefiting from the rapid development of convolutional neural networks, the performance of car license plate detection and recognition has been largely improved. Nonetheless, challenges still exist, especially in real-world applications. In this paper, we present an efficient and accurate framework that solves the license plate detection and recognition tasks simultaneously. It is a lightweight, unified deep neural network that can be optimized end-to-end and runs in real time. Specifically, for unconstrained scenarios, an anchor-free method is adopted to efficiently detect the bounding box and four corners of a license plate, which are used to extract and rectify the target region features. Then, a novel convolutional neural network branch is designed to further extract character features without segmentation. Finally, the recognition task is treated as a sequence labelling problem, which is solved directly by Connectionist Temporal Classification (CTC). Several public datasets, including images collected from different scenarios under various conditions, are chosen for evaluation. Extensive experiments indicate that the proposed method significantly outperforms previous state-of-the-art methods in both speed and precision.
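
    As a rough illustration of the CTC step described above, the sketch below wires a per-time-step classifier to PyTorch's CTC loss; the character-set size, sequence lengths, and layer shapes are assumptions, not the paper's actual configuration.

```python
# A minimal sketch of CTC-based sequence labelling for plate recognition,
# assuming PyTorch; all sizes below are illustrative.
import torch
import torch.nn as nn

NUM_CLASSES = 37          # 26 letters + 10 digits + 1 CTC blank (assumption)
T, N, FEAT = 24, 2, 256   # time steps, batch size, feature width (illustrative)

# Stand-in for the recognition branch: features -> per-step class scores.
head = nn.Linear(FEAT, NUM_CLASSES)
features = torch.randn(T, N, FEAT)               # (time, batch, features)
log_probs = head(features).log_softmax(dim=2)    # CTCLoss expects log-probs

targets = torch.randint(1, NUM_CLASSES, (N, 7))  # e.g. 7-character plates
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 7, dtype=torch.long)

ctc = nn.CTCLoss(blank=0)                        # class 0 reserved for blank
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()                                  # trainable end-to-end
print(loss.item())
```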

    Text localization and recognition in natural scene images

    Get PDF
    Text localization and recognition (text spotting) in natural scene images is an interesting task that finds many practical applications. Algorithms for text spotting may be used to help visually impaired subjects navigate unknown environments; to build autonomous driving systems that automatically avoid collisions with pedestrians or automatically identify speed limits and warn the driver about possible infractions being committed; and to ease or solve some tedious and repetitive data entry tasks that are still carried out manually by humans. While Optical Character Recognition (OCR) from scanned documents is a solved problem, the same cannot be said for text spotting in natural images. In fact, this latter class of images contains plenty of difficult situations that algorithms for text spotting need to deal with in order to reach acceptable recognition rates. During my PhD research I focused on the development of novel systems for text localization and recognition in natural scene images. The two main works from these three years of PhD study are presented in this thesis: (i) in my first work I propose a hybrid system which exploits the key ideas of region-based and connected components (CC)-based text localization approaches to localize uncommon fonts and writings in natural images; (ii) in my second work I describe a novel deep-based system which exploits Convolutional Neural Networks and enhanced stable CC to achieve good text spotting results on challenging data sets. During the development of both methods, my focus has always been on maintaining an acceptable computational complexity and a high reproducibility of the achieved results.
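
    The CC-based side of such a hybrid approach can be pictured with a generic connected-components pass; the sketch below uses OpenCV as a stand-in (the thesis's enhanced stable CC extraction is not reproduced here), and the thresholds and input path are illustrative assumptions.

```python
# A minimal sketch of CC-based text-candidate extraction, assuming OpenCV;
# the geometric thresholds are illustrative, not the thesis's parameters.
import cv2

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical input path
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

n, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
candidates = []
for i in range(1, n):                    # label 0 is the background
    x, y, w, h, area = stats[i]
    aspect = w / float(h)
    fill = area / float(w * h)
    # Keep components whose geometry is plausible for a character.
    if 5 < h < img.shape[0] // 2 and 0.1 < aspect < 2.0 and fill > 0.2:
        candidates.append((x, y, w, h))
```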

    Text detection and recognition in natural scene images

    Get PDF
    This thesis addresses the problem of end-to-end text detection and recognition in natural scene images based on deep neural networks. Scene text detection and recognition aim to find regions in an image that are considered text by human beings, generate a bounding box for each word, and output the corresponding sequence of characters. As a useful task in image analysis, scene text detection and recognition attract much attention in the computer vision field. In this thesis, we tackle this problem by taking advantage of recent successes in deep learning techniques. Car license plates can be viewed as a special case of scene text, as both consist of characters and appear in natural scenes. Nevertheless, each has its own specificities. We therefore start from car license plate detection and recognition, and then extend the methods to general scene text, with additional ideas proposed. For both tasks, we develop two approaches: a stepwise one and an integrated one. Stepwise methods tackle text detection and recognition step by step with separate models, while integrated methods handle both detection and recognition simultaneously with one model. All approaches are based on the powerful deep Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), considering the tremendous breakthroughs they have brought to the computer vision community. To begin with, a stepwise framework is proposed to tackle text detection and recognition, applied to car license plates and general scene text respectively. A character CNN classifier is trained to detect characters in an image in a sliding-window manner. The detected characters are then grouped into license plates or text lines according to heuristic rules. A sequence-labelling-based method is proposed to recognize the whole license plate or text line without character-level segmentation. Building on this recognition method, and to accelerate processing, an integrated deep neural network is then proposed to address car license plate detection and recognition concurrently. It integrates both CNNs and RNNs in one network and can be trained end-to-end. Both car license plate bounding boxes and their labels are generated in a single forward evaluation of the network. The whole process involves no heuristic rules and avoids intermediate procedures such as image cropping or feature recalculation, which not only prevents error accumulation but also reduces the computational burden. Lastly, the unified network is extended to simultaneous general text detection and recognition in natural scenes. In contrast to the network for car license plates, several innovations are proposed to accommodate the special characteristics of general text. A varying-size RoI encoding method is proposed to handle the varied aspect ratios of general text, and an attention-based sequence-to-sequence learning structure is adopted for word recognition, from which a character-level language model is expected to be learnt. The whole framework can be trained end-to-end, requiring only images, ground-truth bounding boxes, and text labels. Through end-to-end training, the learned features become more discriminative, which improves overall performance; the convolutional features are calculated only once and shared by both detection and recognition, which saves processing time. The proposed method has achieved state-of-the-art performance on several standard benchmark datasets.
    Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 201
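
    The stepwise framework's first stage, sliding-window character detection, can be sketched as follows; the window size, stride, threshold, and the trained classifier char_cnn are hypothetical placeholders, not the thesis's actual settings.

```python
# A minimal sketch of sliding-window character detection; `char_cnn` is an
# assumed, already-trained classifier mapping a patch to a character score.

def sliding_window_detect(image, char_cnn, win=32, stride=8, thresh=0.9):
    """Scan a grayscale image and return boxes the classifier calls 'character'."""
    H, W = image.shape
    boxes = []
    for y in range(0, H - win + 1, stride):
        for x in range(0, W - win + 1, stride):
            patch = image[y:y + win, x:x + win]
            if char_cnn(patch) > thresh:       # probability of 'character'
                boxes.append((x, y, win, win))
    return boxes

# Detected characters would then be grouped into plates or text lines by
# heuristic rules (e.g. similar height and horizontal alignment).
```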

    Vehicle license plate detection and recognition

    Get PDF
    "December 2013.""A Thesis presented to the Faculty of the Graduate School at the University of Missouri In Partial Fulfillment of the Requirements for the Degree Master of Science."Thesis supervisor: Dr. Zhihai He.In this work, we develop a license plate detection method using a SVM (Support Vector Machine) classifier with HOG (Histogram of Oriented Gradients) features. The system performs window searching at different scales and analyzes the HOG feature using a SVM and locates their bounding boxes using a Mean Shift method. Edge information is used to accelerate the time consuming scanning process. Our license plate detection results show that this method is relatively insensitive to variations in illumination, license plate patterns, camera perspective and background variations. We tested our method on 200 real life images, captured on Chinese highways under different weather conditions and lighting conditions. And we achieved a detection rate of 100%. After detecting license plates, alignment is then performed on the plate candidates. Conceptually, this alignment method searches neighbors of the bounding box detected, and finds the optimum edge position where the outside regions are very different from the inside regions of the license plate, from color's perspective in RGB space. This method accurately aligns the bounding box to the edges of the plate so that the subsequent license plate segmentation and recognition can be performed accurately and reliably. The system performs license plate segmentation using global alignment on the binary license plate. A global model depending on the layout of license plates is proposed to segment the plates. This model searches for the optimum position where the characters are all segmented but not chopped into pieces. At last, the characters are recognized by another SVM classifier, with a feature size of 576, including raw features, vertical and horizontal scanning features. Our character recognition results show that 99% of the digits are successfully recognized, while the letters achieve an recognition rate of 95%. The license plate recognition system was then incorporated into an embedded system for parallel computing. Several TS7250 and an auxiliary board are used to simulIncludes bibliographical references (pages 67-73)

    Video content analysis for intelligent forensics

    Get PDF
    The networks of surveillance cameras installed in public places and private territories continuously record video data with the aim of detecting and preventing unlawful activities. This enhances the importance of video content analysis applications, either for real-time (i.e. analytic) or post-event (i.e. forensic) analysis. In this thesis, the primary focus is on four key aspects of video content analysis, namely: (1) moving object detection and recognition; (2) correction of colours in video frames and recognition of the colours of moving objects; (3) make and model recognition of vehicles and identification of their type; and (4) detection and recognition of text information in outdoor scenes. To address the first issue, a framework is presented in the first part of the thesis that efficiently detects and recognizes moving objects in videos. The framework targets the problem of object detection in the presence of complex backgrounds. The object detection part of the framework relies on a background modelling technique and a novel post-processing step in which the contours of the foreground regions (i.e. moving objects) are refined by classifying edge segments as belonging either to the background or to the foreground region. Further, a novel feature descriptor is devised for the classification of moving objects into humans, vehicles, and background; it captures the texture information present in the silhouettes of foreground objects. To address the second issue, a framework for the correction and recognition of the true colours of objects in videos is presented, with novel noise reduction, colour enhancement, and colour recognition stages. The colour recognition stage makes use of temporal information to reliably recognize the true colours of moving objects across multiple frames. The proposed framework is specifically designed to perform robustly on videos of poor quality caused by surrounding illumination, camera sensor imperfections, and artefacts due to high compression. In the third part of the thesis, a framework for vehicle make and model recognition and type identification is presented. As part of this work, a novel feature representation technique for the distinctive representation of vehicle images has emerged. It uses dense feature description and a mid-level feature encoding scheme to capture the texture in the frontal view of vehicles, is insensitive to minor in-plane rotation and skew within the image, and can be extended to any number of vehicle classes without re-training. Another important contribution of this work is the publication of a comprehensive, up-to-date dataset of vehicle images to support future research in this domain. The problem of text detection and recognition in images is addressed in the last part of the thesis. A novel technique is proposed that exploits the colour information in the image to identify text regions. Apart from detection, the colour information is also used to segment characters from words. The identified characters are recognized using shape features and supervised learning. Finally, a lexicon-based alignment procedure is adopted to finalize the recognition of the strings present in word images. Extensive experiments have been conducted on benchmark datasets to analyse the performance of the proposed algorithms. The results show that the proposed moving object detection and recognition technique outperformed well-known baseline techniques, and that the framework for the correction and recognition of object colours in video frames achieved all the aforementioned goals. The performance analysis of the vehicle make and model recognition framework on multiple datasets has shown the strength and reliability of the technique across various scenarios. Finally, the experimental results for the text detection and recognition framework on benchmark datasets reveal the potential of the proposed scheme for accurate detection and recognition of text in the wild.
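
    The background-modelling step behind such a moving object detector can be pictured with an off-the-shelf subtractor; the sketch below uses OpenCV's MOG2 model as a generic stand-in for the thesis's own technique, and the video path and area threshold are illustrative.

```python
# A minimal sketch of background-subtraction-based moving object detection,
# assuming OpenCV; MOG2 is a stand-in, not the thesis's background model.
import cv2
import numpy as np

cap = cv2.VideoCapture("surveillance.mp4")        # hypothetical input video
bg = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=True)
kernel = np.ones((3, 3), np.uint8)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = bg.apply(frame)                        # raw foreground mask
    _, mask = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)  # drop shadows
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)       # remove speckle
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    movers = [cv2.boundingRect(c) for c in contours
              if cv2.contourArea(c) > 500]        # keep sizeable regions only
cap.release()
```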

    Text-detection and -recognition from natural images

    Get PDF
    Text detection and recognition from images could have numerous functional applications for document analysis, such as assistance for visually impaired people; recognition of vehicle license plates; evaluation of articles containing tables, street signs, maps, and diagrams; keyword-based image exploration; document retrieval; recognition of parts within industrial automation; content-based extraction; object recognition; address block location; and text-based video indexing. This research exploited the advantages of artificial intelligence (AI) to detect and recognise text from natural images, using machine learning and deep learning to accomplish the task. We conducted an in-depth literature review of current detection and recognition methods to identify the existing challenges: differences in text resulting from disparities in alignment, style, size, and orientation, combined with low image contrast and complex backgrounds, make automatic text extraction a considerably challenging task. As a result, state-of-the-art approaches obtain low detection rates (often less than 80%) and recognition rates (often less than 60%), which has led to the development of new approaches. The aim of the study was to develop a robust text detection and recognition method for natural images with high accuracy and recall, which served as the target of the experiments. This method should detect all the text in scene images, despite the specific features associated with the text pattern. Furthermore, we aimed to solve the two main problems concerning arbitrarily shaped text (horizontal, multi-oriented, and curved): detection and recognition in low-resolution scenes, at various scales, and at different sizes. We propose a methodology that handles text detection by using novel feature combination and selection to drive the classification of text/non-text regions. Text-region candidates were extracted from the grey-scale images using the MSER technique, and a machine-learning-based method was then applied to refine and validate the initial detections. The effectiveness of features based on the aspect ratio and the GLCM, LBP, and HOG descriptors was investigated, and MLP, SVM, and RF text-region classifiers were trained on selections of these features and their combinations. The publicly available ICDAR 2003 and ICDAR 2011 datasets were used to evaluate the proposed method, which achieved state-of-the-art performance among machine learning methodologies on both, with significant improvements in Precision, Recall, and F-measure: the F-measure on ICDAR 2003 and ICDAR 2011 was 81% and 84%, respectively. The results show that a suitable feature combination and selection approach can significantly increase the accuracy of the algorithms. A new dataset has been proposed to fill the gap in character-level annotation and in the availability of text in different orientations and of curved text. It was created particularly for deep learning methods, which require a massive, complete, and varied range of training data, and includes 2,100 images annotated at the character and word levels, yielding 38,500 samples of English characters and 12,500 words. Furthermore, an augmentation tool has been proposed to support the dataset: the lack of an augmentation tool for object detection motivated the proposed tool, which can update the positions of bounding boxes after transformations are applied to the images. This technique increases the number of samples in the dataset and reduces annotation time, since no manual re-annotation is required. The final part of the thesis presents a novel approach for text spotting: a new framework for an end-to-end character detection and recognition system designed using an improved SSD convolutional neural network, in which layers are added to the SSD network and the aspect ratio of characters is taken into account, as it differs from that of other objects. Compared with the other methods considered, the proposed method can detect and recognise characters by training the end-to-end model completely. The proposed method performed best on the proposed dataset, scoring 90.34; furthermore, its F-measure on ICDAR 2015, ICDAR 2013, and SVT was 84.5, 91.9, and 54.8, respectively, achieving the second-best accuracy on ICDAR 2013. The proposed method can spot arbitrarily shaped (horizontal, oriented, and curved) scene text.
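
    The bounding-box update behind the augmentation tool can be illustrated with a small affine-warp helper; the function below is a hypothetical sketch, not the thesis's implementation, and assumes a 2x3 affine matrix like those used for image warping.

```python
# A minimal sketch of updating an annotation box after an image transform;
# names are hypothetical and M is an assumed 2x3 affine warp matrix.
import numpy as np

def update_bbox(bbox, M):
    """Map an axis-aligned box (x, y, w, h) through a 2x3 affine matrix M
    and return the axis-aligned box enclosing the warped corners."""
    x, y, w, h = bbox
    corners = np.array([[x, y], [x + w, y], [x, y + h], [x + w, y + h]])
    warped = corners @ M[:, :2].T + M[:, 2]   # apply the affine map
    x0, y0 = warped.min(axis=0)
    x1, y1 = warped.max(axis=0)
    return (x0, y0, x1 - x0, y1 - y0)

# Example: the same 15-degree rotation applied to the image is applied here,
# so the annotation stays aligned without manual re-labelling.
theta = np.deg2rad(15)
M = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0]])
print(update_bbox((10, 20, 40, 12), M))
```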

    Text Extraction From Natural Scene: Methodology And Application

    Full text link
    With the popularity of the Internet and smart mobile devices, there is an increasing demand for techniques and applications for image/video-based analytics and information retrieval. Most of these applications can benefit from text information extraction in natural scenes. However, scene text extraction is a challenging problem, due to the cluttered backgrounds of natural scenes and the multiple patterns of scene text itself. To solve these problems, this dissertation proposes a framework for scene text extraction, divided into two components: detection and recognition. Scene text detection finds the regions containing text in camera-captured images/videos. Text layout analysis based on gradient and color analysis is performed to extract candidate text strings from the cluttered background, and text structural analysis then applies effective text structural features to distinguish text from non-text outliers among the candidates. Scene text recognition transforms image-based text in the detected regions into readable text codes. The most basic and significant step in text recognition is scene text character (STC) prediction, a multi-class classification over a set of text character categories. We design robust and discriminative feature representations of STC structure by integrating multiple feature descriptors, coding/pooling schemes, and learning models. Experimental results on benchmark datasets demonstrate the effectiveness and robustness of our proposed framework, which obtains better performance than previously published methods. Our scene text extraction framework is applied to four scenarios: (1) reading print labels on grocery packages for hand-held object recognition; (2) combining with car detection to localize license plates in camera-captured natural scene images; (3) reading indicative signage for assisted navigation in indoor environments; and (4) combining with object tracking to perform scene text extraction in video of natural scenes. The proposed prototype systems and associated evaluation results show that our framework is able to solve the challenges in real applications.
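
    The gradient side of the layout analysis can be pictured with a generic edge-density pass; the sketch below uses OpenCV Sobel gradients and morphological closing as an illustrative stand-in for the dissertation's gradient/color analysis, with all thresholds and the input path assumed.

```python
# A minimal sketch of gradient-based text-candidate extraction, assuming
# OpenCV; kernel sizes and thresholds are illustrative.
import cv2
import numpy as np

gray = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical input
gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
mag = cv2.magnitude(gx, gy)                 # character strokes have strong edges

_, edges = cv2.threshold(np.uint8(np.clip(mag, 0, 255)), 80, 255,
                         cv2.THRESH_BINARY)
# Close gaps between characters so that strings merge into single regions.
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 3))
strings = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)
contours, _ = cv2.findContours(strings, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
candidates = [cv2.boundingRect(c) for c in contours]   # text-string candidates
```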

    Development of a text reading system on video images

    Get PDF
    Since the early days of computer science, researchers have sought to devise a machine which could automatically read text to help people with visual impairments. The problem of extracting and recognising text in document images has been largely resolved, but reading text from images of natural scenes remains a challenge. Scene text can present uneven lighting, complex backgrounds, or perspective and lens distortion; it usually appears as short sentences or isolated words and shows a very diverse set of typefaces. However, video sequences of natural scenes provide a temporal redundancy that can be exploited to compensate for some of these deficiencies. Here we present a complete end-to-end, real-time scene text reading system for video images based on perspective-aware text tracking. The main contribution of this work is a system that automatically detects, recognises, and tracks text in videos of natural scenes in real time. Our method focuses on large text found in outdoor environments, such as shop signs, street names, and billboards. We introduce novel, efficient techniques for text detection, text aggregation, and text perspective estimation. Furthermore, we propose using a set of Unscented Kalman Filters (UKF) to maintain each text region's identity and to continuously track the homography transformation of the text into a fronto-parallel view, thereby being resilient to erratic camera motion and wide-baseline changes in orientation. The orientation of each text line is estimated using a method that relies on the geometry of the characters themselves to estimate a rectifying homography, irrespective of the view of the text over a large range of orientations. We also demonstrate a wearable head-mounted device for text reading that encases a camera for image acquisition and a pair of headphones for synthesized speech output. Our system is designed for continuous and unsupervised operation over long periods of time. It is completely automatic and features quick failure recovery and interactive text reading. It is also highly parallelised in order to maximize the usage of available processing power and to achieve real-time operation. We show comparative results that improve on the current state of the art in correcting perspective deformation of scene text. The end-to-end system performance is demonstrated on sequences recorded in outdoor scenarios. Finally, we also release a dataset of text-tracking videos along with annotated ground truth of the text regions.
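
    The fronto-parallel rectification at the heart of the tracker can be sketched with a single homography warp; the corner coordinates, patch size, and input frame below are illustrative, and the UKF state maintenance itself is not reproduced.

```python
# A minimal sketch of warping a tracked text quadrilateral to a
# fronto-parallel view, assuming OpenCV; all values are illustrative.
import cv2
import numpy as np

# Four tracked corners of a text region (top-left, top-right,
# bottom-right, bottom-left), e.g. as maintained by a per-region filter.
quad = np.float32([[120, 60], [430, 90], [425, 160], [115, 135]])
W, H = 320, 64                                  # rectified patch size
target = np.float32([[0, 0], [W, 0], [W, H], [0, H]])

frame = cv2.imread("frame.jpg")                 # hypothetical video frame
Hmat = cv2.getPerspectiveTransform(quad, target)
fronto = cv2.warpPerspective(frame, Hmat, (W, H))   # feed this to the OCR stage
```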