1,320 research outputs found

    Text-detection and -recognition from natural images

    Text detection and recognition from images could have numerous functional applications for document analysis, such as assistance for visually impaired people; recognition of vehicle license plates; evaluation of articles containing tables, street signs, maps, and diagrams; keyword-based image exploration; document retrieval; recognition of parts within industrial automation; content-based extraction; object recognition; address block location; and text-based video indexing. This research exploited the advantages of artificial intelligence (AI) to detect and recognise text from natural images. Machine learning and deep learning were used to accomplish this task. In this research, we conducted an in-depth literature review of the detection and recognition methods currently used by researchers in order to identify the existing challenges: differences in text resulting from disparity in alignment, style, size, and orientation, combined with low image contrast and complex backgrounds, make automatic text extraction a considerably challenging task. As a result, the state-of-the-art approaches obtain low detection rates (often less than 80%) and recognition rates (often less than 60%). This has led to the development of new approaches. The aim of the study was to develop a robust method for text detection and recognition from natural images with high accuracy and recall, which would be used as the target of the experiments. This method could detect all the text in the scene images, despite certain specific features associated with the text pattern.
Furthermore, we aimed to find a solution to the two main problems concerning the detection and recognition of arbitrarily shaped text (horizontal, multi-oriented, and curved text) in low-resolution scenes and at various scales and sizes. In this research, we propose a methodology to handle the problem of text detection by using a novel feature combination and selection approach for the classifiers of text/non-text regions. The text-region candidates were extracted from the grey-scale images by using the MSER technique. A machine learning-based method was then applied to refine and validate the initial detection. The effectiveness of the features based on the aspect ratio, GLCM, LBP, and HOG descriptors was investigated. The MLP, SVM, and RF text-region classifiers were trained using selections of these features and their combinations. The publicly available datasets ICDAR 2003 and ICDAR 2011 were used to evaluate the proposed method. This method achieved state-of-the-art performance among machine learning methodologies on both databases, and the improvements were significant in terms of Precision, Recall, and F-measure. The F-measure for ICDAR 2003 and ICDAR 2011 was 81% and 84%, respectively. The results showed that the use of a suitable feature combination and selection approach could significantly increase the accuracy of the algorithms. A new dataset has been proposed to fill the gap in character-level annotation and in the availability of text in different orientations and of curved text. The proposed dataset was created particularly for deep learning methods, which require a large, complete, and varied range of training data. The proposed dataset includes 2,100 images annotated at the character and word levels to obtain 38,500 samples of English characters and 12,500 words. Furthermore, an augmentation tool has been proposed to support the proposed dataset.
The lack of an augmentation tool for object detection motivated the proposed tool, which can update the positions of bounding boxes after transformations are applied to images. This technique helps to increase the number of samples in the dataset and reduces annotation time, because the augmented images need no further annotation. The final part of the thesis presents a novel approach for text spotting: a new framework for an end-to-end character detection and recognition system designed using an improved SSD convolutional neural network, wherein layers are added to the SSD network and the aspect ratio of the characters is considered because it differs from that of other objects. Compared with the other methods considered, the proposed method could detect and recognise characters by training the end-to-end model completely. The performance of the proposed method was better on the proposed dataset; it was 90.34. Furthermore, the method's F-measure on ICDAR 2015, ICDAR 2013, and SVT was 84.5, 91.9, and 54.8, respectively. On ICDAR 2013, the method achieved the second-best accuracy. The proposed method could spot arbitrarily shaped (horizontal, oriented, and curved) scene text.
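The abstract does not say which transformations the augmentation tool supports or how it stores boxes, so the following is only a minimal sketch of the general idea: when an image is transformed, the same geometric mapping is applied to each bounding box so that no re-annotation is needed. The function names, the `(x_min, y_min, x_max, y_max)` box layout, and the pixel-edge coordinate convention are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical box layout: (x_min, y_min, x_max, y_max), measured in pixel-edge
# coordinates so that a full-image box is (0, 0, width, height).
Box = Tuple[int, int, int, int]


def flip_boxes_horizontally(boxes: List[Box], image_width: int) -> List[Box]:
    """Mirror each box about the vertical centre line of the image."""
    return [
        (image_width - x_max, y_min, image_width - x_min, y_max)
        for x_min, y_min, x_max, y_max in boxes
    ]


def rotate_boxes_90_clockwise(boxes: List[Box], image_height: int) -> List[Box]:
    """Map each box into the frame of the image rotated 90 degrees clockwise.

    A point (x, y) moves to (image_height - y, x); the rotated image has
    width equal to the original height.
    """
    return [
        (image_height - y_max, x_min, image_height - y_min, x_max)
        for x_min, y_min, x_max, y_max in boxes
    ]


if __name__ == "__main__":
    character_boxes = [(10, 20, 30, 40)]
    print(flip_boxes_horizontally(character_boxes, image_width=100))
    print(rotate_boxes_90_clockwise(character_boxes, image_height=50))
```

Each augmented copy of an image then carries boxes derived from the original annotation, which is how such a tool can enlarge a dataset without extra labelling effort.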

    Some methods of encoding simple visual images for use with a sparse distributed memory, with applications to character recognition

    To study the problems of encoding visual images for use with a Sparse Distributed Memory (SDM), I consider a specific class of images: those that consist of several pieces, each of which is a line segment or an arc of a circle. This class includes line drawings of characters such as letters of the alphabet. I give a method of representing a segment or an arc by five numbers in a continuous way; that is, similar arcs have similar representations. I also give methods for encoding these numbers as bit strings in an approximately continuous way. The set of possible segments and arcs may be viewed as a five-dimensional manifold M, whose structure is like a Möbius strip. An image, considered to be an unordered set of segments and arcs, is therefore represented by a set of points in M, one for each piece. I then discuss the problem of constructing a preprocessor to find the segments and arcs in these images, although a preprocessor has not been developed. I also describe a possible extension of the representation

    Extending functional databases for use in text-intensive applications

    This thesis continues research exploring the benefits of using functional databases based around the functional data model for advanced database applications, particularly those supporting investigative systems. This is a growing generic application domain covering areas such as criminal and military intelligence, which are characterised by significant data complexity, large data sets and the need for high performance, interactive use. An experimental functional database language was developed to provide the requisite semantic richness. However, heavy use in a practical context has shown that language extensions and implementation improvements are required, especially in the crucial areas of string matching and graph traversal. In addition, an implementation on multiprocessor, parallel architectures is essential to meet the performance needs arising from existing and projected database sizes in the chosen application area. [Continues.]

    Arbitrary Keyword Spotting in Handwritten Documents

    Despite the existence of electronic media in today’s world, a considerable amount of written communication is in paper form, such as books, bank cheques, contracts, etc. There is an increasing demand for the automation of information extraction, classification, search, and retrieval of documents. The goal of this research is to develop a complete methodology for the spotting of arbitrary keywords in handwritten document images. We propose a top-down approach to the spotting of keywords in document images. Our approach is composed of two major steps: segmentation and decision. In the former, we generate the word hypotheses. In the latter, we decide whether a generated word hypothesis is a specific keyword or not. We carry out the decision step through a two-level classification in which we first assign an input image to a keyword or non-keyword class, and then transcribe the image if it is passed as a keyword. By reducing the problem from the image domain to the text domain, we address not only the search problem in handwritten documents but also classification and retrieval, without the need to transcribe the whole document image. The main contribution of this thesis is the development of a generalized minimum edit distance for handwritten words, together with a proof that this distance is equivalent to an Ergodic Hidden Markov Model (EHMM). To the best of our knowledge, this work is the first to present an exact 2D model for the temporal information in handwriting while satisfying practical constraints. Some other contributions of this research include: 1) removal of page margins based on corner detection in projection profiles; 2) removal of noise patterns in handwritten images using expectation maximization and fuzzy inference systems; 3) extraction of text lines based on fast Fourier-based steerable filtering; 4) segmentation of characters based on skeletal graphs; and 5) merging of broken characters based on graph partitioning.
Our experiments with a benchmark database of handwritten English documents and a real-world collection of handwritten French documents indicate that, even without any word/document-level training, our results are comparable with two state-of-the-art word spotting systems for English and French documents.
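The generalized minimum edit distance for handwritten words is not specified in this abstract, so it cannot be reproduced here. As a point of reference only, the sketch below shows the classic Levenshtein edit distance on text strings, which is the standard notion that a generalized distance would extend, together with a simple keyword test in the text domain. The helper `matches_keyword` and its `max_distance` threshold are hypothetical and are not taken from the thesis.

```python
def edit_distance(source: str, target: str) -> int:
    """Classic Levenshtein distance with unit insert/delete/substitute costs."""
    # previous[j] holds the distance between the processed prefix of `source`
    # and the first j characters of `target`.
    previous = list(range(len(target) + 1))
    for i, source_char in enumerate(source, start=1):
        current = [i]
        for j, target_char in enumerate(target, start=1):
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            substitution = previous[j - 1] + (source_char != target_char)
            current.append(min(deletion, insertion, substitution))
        previous = current
    return previous[-1]


def matches_keyword(hypothesis: str, keyword: str, max_distance: int = 1) -> bool:
    """Hypothetical decision rule: accept a word hypothesis if it is close enough."""
    return edit_distance(hypothesis.lower(), keyword.lower()) <= max_distance


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(matches_keyword("cheqve", "cheque"))
```

A threshold of this kind trades missed keywords against false alarms; the thesis makes the decision with a two-level classifier rather than a fixed cutoff.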

    Programming With Jack (Fourth Edition)

    This manual describes the implementation of Jack™, with emphasis on how to extend it and modify it. The principal purpose of this manual is to describe what functions in the Jack libraries are available to be used in writing new features for Jack. The manual also gives an overview of how Jack works, for those interested in modifying its current behavior. This manual assumes that you already know how to use Jack, and are familiar with its basic terminology.

    A STUDY ON GENERAL ASSEMBLY LINE BALANCING MODELING METHODS AND TECHNIQUES

    The borders of the assembly line balancing problem, as classically drawn, are as clear as any other operations research topic in production planning, with well-defined sets of assumptions, parameters, and objective functions. In application, however, these borders are frequently transgressed. Many of these deviations are internal to the assembly line balancing problem itself, arising from any of a wide array of physical or technological features in modern assembly lines. Other issues are founded in the tight coupling of assembly line balancing with external production planning and management problems, as assembly lines are at the intersection of multiple related problems in job sequencing, part flow logistics, worker safety, and quality. The field of General Assembly Line Balancing is devoted to studying the class of adapted and extended solution techniques necessary in order to model these applied line balancing problems. In this dissertation a complex line balancing problem is presented based on the real production environment of our industrial partner, featuring several extensions for task-to-task relationships, station characteristics limiting assignment, and parallel worker zoning interactions. A constructive heuristic is developed along with two improvement heuristics, as well as an integer programming model for the same problem. An experiment is conducted testing each of these new solution methods upon a battery of testbed problems, measuring solution quality, runtime, and achievement of feasibility. Additionally, a new method for measuring a secondary horizontal line balancing objective is established, based on the options-mix paradigm rather than the customary model-mix paradigm

    Large-scale image collection cleansing, summarization and exploration

    A perennially interesting topic in the research field of large-scale image collection organization is how to effectively and efficiently conduct the tasks of image cleansing, summarization and exploration. The primary objective of such an image organization system is to enhance the user's exploration experience through redundancy removal and summarization operations on large-scale image collections. An ideal system should discover and utilize the visual correlation among the images, reduce the redundancy in large-scale image collections, organize and visualize the structure of the collection, and facilitate exploration and knowledge discovery. In this dissertation, a novel system is developed for exploiting and navigating large-scale image collections. Our system consists of the following key components: (a) junk image filtering by incorporating bilingual search results; (b) near-duplicate image detection by using a coarse-to-fine framework; (c) concept network generation and visualization; (d) image collection summarization via dictionary learning for sparse representation; and (e) a multimedia practice of graffiti image retrieval and exploration. For junk image filtering, bilingual image search results, which are obtained for the same keyword-based query, are integrated to automatically identify the clusters of junk images and the clusters of relevant images. Within the relevant image clusters, the results are further refined by removing duplicates under a coarse-to-fine structure. The duplicate pairs are detected with both a global feature (partition-based color histogram) and local features (CPAM and SIFT Bag-of-Words model). The duplicates are detected and removed from the data collection to facilitate further exploration and visual correlation analysis. After junk image filtering and duplicate removal, the visual concepts are further organized and visualized by the proposed concept network.
An automatic algorithm is developed to generate such a visual concept network, which characterizes the visual correlation between image concept pairs. Multiple kernels are combined and a kernel canonical correlation analysis algorithm is used to characterize the diverse visual similarity contexts between the image concepts. The FishEye visualization technique is implemented to facilitate the navigation of image concepts through our image concept network. To better assist the exploration of large-scale data collections, we design an efficient summarization algorithm to extract representative exemplars. For this collection summarization task, a sparse dictionary (a small set of the most representative images) is learned to represent all the images in the given set; i.e., the sparse dictionary is treated as the summary of the given image set. The simulated annealing algorithm is adopted to learn such a sparse dictionary (image summary) by minimizing an explicit optimization function. In order to handle large-scale image collections, we have evaluated both the accuracy of the proposed algorithms and their computational efficiency. For each of the above tasks, we have conducted experiments on multiple publicly available image collections, such as ImageNet, NUS-WIDE, LabelMe, etc. We have observed very promising results compared to existing frameworks. The computational performance is also satisfactory for large-scale image collection applications. The original intention in designing such a large-scale image collection exploration and organization system is to better serve the tasks of information retrieval and knowledge discovery. For this purpose, we apply the proposed system to a graffiti retrieval and exploration application and receive positive feedback.
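The abstract does not give the optimization function used for summarization, so the sketch below swaps in a much simpler stand-in: it uses simulated annealing to pick `k` exemplar items that minimize the summed squared distance from every item to its nearest exemplar. The cost function, the linear cooling schedule, and all names and parameters are assumptions chosen for illustration; this is not the dissertation's dictionary-learning objective.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def summary_cost(features: List[Vector], chosen: List[int]) -> float:
    """Sum over all items of the squared distance to the nearest exemplar."""
    return sum(
        min(squared_distance(item, features[index]) for index in chosen)
        for item in features
    )


def anneal_summary(
    features: List[Vector], k: int, steps: int = 2000,
    start_temperature: float = 1.0, seed: int = 0,
) -> Tuple[List[int], float]:
    """Select k exemplar indices by simulated annealing (illustrative only)."""
    rng = random.Random(seed)
    count = len(features)
    current = rng.sample(range(count), k)
    current_cost = summary_cost(features, current)
    best, best_cost = list(current), current_cost
    for step in range(steps):
        # Linear cooling; the small constant avoids division by zero.
        temperature = start_temperature * (1.0 - step / steps) + 1e-9
        outside = [i for i in range(count) if i not in current]
        if not outside:
            break
        # Propose swapping one exemplar for an item not in the summary.
        candidate = list(current)
        candidate[rng.randrange(k)] = rng.choice(outside)
        candidate_cost = summary_cost(features, candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
    return sorted(best), best_cost


if __name__ == "__main__":
    toy_features = [(0, 0), (0.1, 0), (0, 0.1), (10, 10), (10.1, 10), (10, 10.1)]
    print(anneal_summary(toy_features, k=2))
```

On the toy data in the usage example, a good summary takes one exemplar from each of the two clusters; real image features would be high-dimensional descriptors rather than 2D points.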