
    Partitioning of the Degradation Space for OCR Training

    Generally speaking, optical character recognition algorithms tend to perform better when presented with homogeneous data. This paper studies a method designed to increase the homogeneity of training data, based on an understanding of the types of degradations that occur during the printing and scanning process and how these degradations affect the homogeneity of the data. While it has been shown that dividing the degradation space by edge spread improves recognition accuracy over dividing it by threshold or point spread function width alone, the challenge lies in deciding how many partitions to make and at what values of edge spread the divisions should fall. Clustering across different types of character features, fonts, sizes, resolutions, and noise levels shows that edge spread is indeed a strong indicator of the homogeneity of character data clusters.
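
    The partition-then-train idea can be made concrete with a small sketch. The sketch below is illustrative only (the Sample fields, the Gaussian naive Bayes classifier, and the boundary handling are assumptions, not the paper's implementation): training samples are binned by their measured edge spread at a chosen set of boundary values, and an independent classifier is fit on each bin.

        from dataclasses import dataclass
        import numpy as np
        from sklearn.naive_bayes import GaussianNB

        @dataclass
        class Sample:
            features: np.ndarray   # feature vector extracted from a character image
            edge_spread: float     # estimated edge-spread degradation parameter
            label: str             # ground-truth character class

        def train_partitioned(samples, boundaries):
            """Fit one classifier per edge-spread partition of the training data."""
            edges = [-np.inf, *sorted(boundaries), np.inf]
            models = []
            for lo, hi in zip(edges[:-1], edges[1:]):
                part = [s for s in samples if lo <= s.edge_spread < hi]
                if not part:
                    continue
                X = np.stack([s.features for s in part])
                y = [s.label for s in part]
                models.append(((lo, hi), GaussianNB().fit(X, y)))
            return models

        def classify(models, sample):
            """Route a test sample to the classifier of the partition it falls into."""
            for (lo, hi), clf in models:
                if lo <= sample.edge_spread < hi:
                    return clf.predict(sample.features.reshape(1, -1))[0]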

    Text Degradations and OCR Training

    Printing and scanning of text documents introduces degradations to the characters that can be modeled. Interestingly, certain combinations of the parameters that govern the degradations introduced by the printing and scanning process affect characters in such a way that the degraded characters have a similar appearance, while other degradations leave the characters with an appearance that is very different. It is well known that, generally speaking, a test set that more closely matches a training set will be recognized with higher accuracy than one that matches the training set less well. Likewise, classifiers tend to perform better on data sets that have lower variance. This paper explores an analytical method that uses a formal printer/scanner degradation model to identify the similarity between groups of degraded characters. This similarity is shown to improve the recognition accuracy of a classifier through model-directed choice of training-set data.
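
    A common form of such a printer/scanner degradation model is convolution of the ideal bilevel character with a point spread function, addition of sensor noise, and global thresholding; the sketch below assumes that form with a Gaussian PSF, and the parameter names are illustrative rather than the paper's.

        import numpy as np
        from scipy.ndimage import gaussian_filter

        def degrade(ideal, psf_width, threshold, noise_sigma, rng=None):
            """ideal: 2-D array in [0, 1] with 1 = ink. Returns a degraded bilevel image."""
            rng = np.random.default_rng() if rng is None else rng
            blurred = gaussian_filter(ideal.astype(float), sigma=psf_width)   # PSF blur
            noisy = blurred + rng.normal(0.0, noise_sigma, size=ideal.shape)  # sensor noise
            return (noisy >= threshold).astype(np.uint8)                      # binarization

        # Nearby (psf_width, threshold) settings yield similar-looking characters,
        # which is what makes model-directed grouping of training data possible.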

    Human Image Preference and Document Degradation Models

    Because most degraded documents are created by people, the preferences individuals have regarding degraded documents matter: those preferences may determine whether or not the documents they create are suitable for machine processing. The goal of this study was to find relationships between preference and several parameters of a scanner degradation model. It was found that the difference in binarization threshold and the difference in edge displacement caused by the degradation both had strong linear relationships to preference, while the width of the point spread function did not show such a relationship. These relationships were counterintuitive, because degraded characters with thicker stroke widths than the original were preferred to those whose stroke widths were closer to the original character.

    WordSup: Exploiting Word Annotations for Character based Text Detection

    Text in imagery is usually organized as a hierarchy of visual elements, i.e., characters, words, text lines, and text blocks. Among these elements, the character is the most basic one across scripts and notations such as Western, Chinese, and Japanese writing and mathematical expressions. It is therefore natural and convenient to build a common text detection engine on top of character detectors. However, training character detectors requires a vast number of location-annotated characters, which are expensive to obtain; the existing real text datasets are mostly annotated at the word or line level. To remedy this dilemma, we propose a weakly supervised framework that can utilize word annotations, either as tight quadrangles or as looser bounding boxes, for character detector training. When applied to scene text detection, we are thus able to train a robust character detector by exploiting word annotations in rich, large-scale real scene text datasets, e.g. ICDAR15 and COCO-Text. The character detector plays a key role in the pipeline of our text detection engine, which achieves state-of-the-art performance on several challenging scene text detection benchmarks. We also demonstrate the flexibility of our pipeline in various scenarios, including deformed text detection and math expression recognition. Comment: 2017 International Conference on Computer Vision.
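
    The weak-supervision idea can be illustrated with a rough sketch (this is not WordSup's actual training procedure; the helper names and the center-in-box rule are assumptions): character detections whose centers fall inside an annotated word region are kept as pseudo-labels for the next round of character-detector training, while detections outside every word region are discarded.

        def center(box):
            x1, y1, x2, y2 = box
            return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

        def inside(point, box):
            x, y = point
            x1, y1, x2, y2 = box
            return x1 <= x <= x2 and y1 <= y <= y2

        def pseudo_label_characters(char_detections, word_boxes):
            """char_detections: list of (box, score); word_boxes: axis-aligned word
            annotations. Returns character boxes usable as training pseudo-labels."""
            kept = []
            for box, score in char_detections:
                if any(inside(center(box), w) for w in word_boxes):
                    kept.append(box)
            return kept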

    Predictive maintenance: a novel framework for a data-driven, semi-supervised, and partially online prognostic health management application in industries

    Prognostic Health Management (PHM) is a predictive maintenance strategy that is based on Condition Monitoring (CM) data and aims to predict the future states of machinery. The existing literature addresses PHM at two levels: methodological and applicative. From the methodological point of view, there are many publications and standards on PHM system design. From the applicative point of view, many papers address the improvement of techniques adopted for individual PHM tasks without covering the whole process. In these cases, most applications rely on a large amount of historical data to train models for diagnostic and prognostic purposes. Very often, industries are not able to obtain such data, so the most common approaches, based on batch and offline analysis, cannot be applied. In this paper, we present a novel framework and architecture that support the initial application of PHM from the machinery producers' perspective. The proposed framework is based on an edge-cloud infrastructure that performs streaming analysis at the edge to reduce the quantity of data stored in permanent memory, to report the health status of the machinery at any point in time, and to discover novel and anomalous behaviors. Collecting the data from multiple machines on a cloud server allows more accurate diagnostic and prognostic models to be trained on a larger amount of data, and their results then serve to predict the health status in real time at the edge. The resulting PHM system would allow industries to monitor and supervise a network of machines deployed in different locations, bringing benefits to both machinery producers and users. After a brief literature review of signal processing, feature extraction, diagnostics, and prognostics, including incremental and semi-supervised approaches for anomaly and novelty detection on data streams, a case study is presented. It was conducted on data collected from a test rig and shows the potential of the proposed framework in terms of its ability to detect changes in operating conditions and abrupt faults, and of the storage memory saved. The main outcome of our work, and its major novel aspect, is the design of a framework for a PHM system based on specific requirements that originate directly from the industrial field, together with indications on which techniques can be adopted to achieve these goals.
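
    As a minimal illustration of the kind of incremental, online analysis that can run at the edge (not the framework's actual algorithms; the z-score rule and thresholds are assumptions), the sketch below maintains running statistics of a condition-monitoring signal with Welford's algorithm and flags samples that deviate strongly, so that only summaries and alarms need to reach the cloud.

        import math

        class StreamingAnomalyDetector:
            def __init__(self, z_threshold=4.0, warmup=100):
                self.n = 0
                self.mean = 0.0
                self.m2 = 0.0            # sum of squared deviations (Welford)
                self.z_threshold = z_threshold
                self.warmup = warmup     # samples to observe before flagging

            def update(self, x):
                """Consume one sample; return True if it looks anomalous."""
                self.n += 1
                delta = x - self.mean
                self.mean += delta / self.n
                self.m2 += delta * (x - self.mean)
                if self.n <= self.warmup:
                    return False
                std = math.sqrt(self.m2 / (self.n - 1))
                return std > 0 and abs(x - self.mean) / std > self.z_threshold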

    A Multiple-Expert Binarization Framework for Multispectral Images

    In this work, a multiple-expert binarization framework for multispectral images is proposed. The framework is based on a constrained subspace selection, limited to the spectral bands, combined with state-of-the-art gray-level binarization methods, and it uses a binarization wrapper to enhance the performance of the gray-level binarization. Nonlinear preprocessing of the individual spectral bands is used to enhance the textual information. An evolutionary optimizer is used to obtain the optimal and some suboptimal 3-band subspaces, from which an ensemble of experts is then formed. The framework is applied to a ground-truth multispectral dataset with promising results. In addition, a generalization of the cross-validation approach is developed that not only evaluates the generalizability of the framework but also provides a practical instance of the selected experts that could then be applied to unseen inputs, despite the small size of the given ground-truth dataset. Comment: 12 pages, 8 figures, 6 tables. Presented at ICDAR'1
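
    A minimal sketch of the ensemble idea follows (illustrative only: the 3-band subspaces are given explicitly rather than found by the evolutionary optimizer, and plain Otsu thresholding stands in for the state-of-the-art gray-level binarizers and the nonlinear preprocessing): each expert binarizes a gray-level image formed from one 3-band subspace, and the experts vote per pixel.

        import numpy as np

        def otsu_threshold(gray):
            """Classic Otsu threshold on an 8-bit gray image (maximizes between-class variance)."""
            hist = np.bincount(gray.ravel(), minlength=256).astype(float)
            prob = hist / hist.sum()
            best_t, best_var = 0, -1.0
            for t in range(1, 256):
                w0, w1 = prob[:t].sum(), prob[t:].sum()
                if w0 == 0 or w1 == 0:
                    continue
                mu0 = (np.arange(t) * prob[:t]).sum() / w0
                mu1 = (np.arange(t, 256) * prob[t:]).sum() / w1
                var = w0 * w1 * (mu0 - mu1) ** 2
                if var > best_var:
                    best_var, best_t = var, t
            return best_t

        def ensemble_binarize(msi, band_subsets):
            """msi: (H, W, B) multispectral image, uint8. band_subsets: list of 3-band index triples."""
            votes = []
            for bands in band_subsets:
                gray = msi[:, :, bands].mean(axis=2).astype(np.uint8)
                votes.append((gray < otsu_threshold(gray)).astype(np.uint8))  # ink = 1
            return (np.mean(votes, axis=0) >= 0.5).astype(np.uint8)           # majority vote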

    Degradation Specific OCR

    Optical Character Recognition (OCR) is the mechanical or electronic translation of scanned images of handwritten, typewritten, or printed text into machine-encoded text. OCR has many applications, such as making a text document that exists in physical form editable, or enabling computer searching of a text that was initially available only in print. OCR engines are widely used to digitize text documents so that they can be stored digitally for remote access, mainly through websites. This makes these invaluable resources instantly available, regardless of the geographical location of the end user. Large numbers of OCR misclassification errors can occur when an OCR engine is used to digitize a degraded document. The degradation may have various causes, including aging of the paper, incompletely printed characters, and blots of ink on the original document. In this thesis, the degradation introduced by scanning text documents was considered. To improve OCR performance, it is vital to train the classifier on a large training set that contains a significant number of data points similar to the degraded real-life characters. In this thesis, characters with varying degrees of blurring and binarization thresholds were generated and used to calculate edge-spread degradation parameters. These parameters were then used to divide the training data set of the OCR engine into more homogeneous sets, and the classification accuracy obtained by training on these smaller sets was analyzed. The training data set consisted of 100,000 data points of 300 DPI, 12 point Sans Serif font lowercase characters 'c' and 'e'. These characters were generated with random values of threshold and blur width, with random Gaussian noise added. To group similar degraded characters together, clustering was performed using the ISODATA clustering algorithm. Two edge-spread parameters, DC (calculated on isolated edges) and MDC (calculated on edges in close proximity, accounting for interference effects), were estimated to fit the cluster boundaries. These values were then used to divide the training data, and a Bayesian classifier was used for recognition. It was verified that MDC is slightly better than DC as a division parameter. A choice of either 2 or 3 partitions was found to be best for dataset division. An experimental way to estimate the best boundary at which to divide the data set was determined, and tests were conducted that verified it. Both crisp and fuzzy approaches for classifier training and testing were implemented and various combinations were tried, with crisp training and fuzzy testing being the best approach, giving a 98.08% classification rate for the data set divided into 2 partitions and a 98.93% classification rate for the data set divided into 3 partitions, compared with 94.08% for the data set with no divisions.
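
    The edge-shift effect that the edge-spread parameters capture can be illustrated numerically; the sketch below is conceptual only (the exact DC and MDC definitions follow the thesis and are not reproduced here): a 1-D step edge is blurred with a Gaussian PSF of width w, thresholded at theta, and the resulting displacement of the edge is measured.

        import numpy as np
        from scipy.ndimage import gaussian_filter1d

        def edge_displacement(psf_width, threshold, length=2001, spacing=0.01):
            x = (np.arange(length) - length // 2) * spacing
            step = (x >= 0.0).astype(float)                      # ideal edge at x = 0, ink for x >= 0
            blurred = gaussian_filter1d(step, sigma=psf_width / spacing)
            above = np.nonzero(blurred >= threshold)[0]
            return x[above[0]]                                   # where the thresholded edge now sits

        # For a threshold of 0.5 the edge stays near 0; lower thresholds push it
        # outward (thicker strokes), higher thresholds pull it inward (thinner strokes).
        print(edge_displacement(psf_width=1.0, threshold=0.3))   # negative: stroke grows
        print(edge_displacement(psf_width=1.0, threshold=0.7))   # positive: stroke shrinks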

    Character Recognition

    Character recognition is one of the pattern recognition technologies most widely used in practical applications. This book presents recent advances relevant to character recognition, from technical topics such as image processing, feature extraction, and classification to new applications including human-computer interfaces. The goal of this book is to provide a reference source for academic research and for professionals working in the character recognition field.