
    Adaptive restoration of text images containing touching and broken characters

    For document processing systems, automated data entry is generally performed by optical character recognition (OCR) systems. For such data entry to be practical, the OCR system must be reliable. However, distortions in document images cause character recognition errors, thereby reducing the accuracy of OCR systems. In document images, most OCR errors are caused by broken and touching characters. This thesis presents an adaptive system to restore text images distorted by touching and broken characters. The adaptive system uses the distorted text image and the output from an OCR system to generate a training character image. Using the training image and the distorted image, the system trains an adaptive restoration filter and then uses the trained filter to restore the distorted text image. To demonstrate the performance of this technique, it was applied to several distorted images containing touching or broken characters. The results show that this technique can improve both the pixel accuracy and the OCR accuracy of distorted text images containing touching or broken characters.

    Use of neural networks to predict OCR accuracy

    Use of Neural Networks to Predict OCR Accuracy investigates issues in developing an artificial neural network (ANN) based system for predicting OCR accuracy from the image of a page. This work extends the work of Blando and Gonzalez in the following ways: enlarging the training data, proposing new features, comparing different ANN architectures, and introducing a cross-validation learning algorithm. The following experiments were performed: a comparison of 14-dimension and 7-dimension feature metrics, a comparison of an ANN trained with and without cross-validation, a comparison of different neural network architectures, a comparison of the prediction capability of a neural network and linear regression, and a comparison of the prediction capability of a neural network using 14-dimension feature metrics and linear regression using reject markers. The results show that a neural network can outperform linear regression if properly trained, and that the new feature metrics provide improved predictive ability.

    Character-based Automated Human Perception Quality Assessment In Document Images

    Large degradations in document images impede their readability and deteriorate the performance of automated document processing systems. Document image quality (IQ) metrics have been defined through optical character recognition (OCR) accuracy. Such metrics, however, do not always correlate with human perception of IQ. When enhancing document images with the goal of improving readability, e.g., in historical documents where OCR performance is low and/or where it is necessary to preserve the original context, it is important to understand human perception of quality. The goal of this paper is to design a system that enables the learning and estimation of human perception of document IQ. Such a metric can be used to compare existing document enhancement methods and guide automated document enhancement. Moreover, the proposed methodology is designed as a general framework that can be applied in a wide range of applications. © 2012 IEEE

    Historical Document Enhancement Using LUT Classification

    The fast evolution of scanning and computing technologies in recent years has led to the creation of large collections of scanned historical documents. It is almost always the case that these scanned documents suffer from some form of degradation. Large degradations make documents hard to read and substantially deteriorate the performance of automated document processing systems. Enhancement of degraded document images is normally performed assuming global degradation models. When the degradation is large, global degradation models do not perform well. In contrast, we propose to learn local degradation models and use them in enhancing degraded document images. Using a semi-automated enhancement system, we have labeled a subset of the Frieder diaries collection (The diaries of Rabbi Dr. Avraham Abba Frieder. http://ir.iit.edu/collections/). This labeled subset was then used to train classifiers based on lookup tables in conjunction with the approximated nearest neighbor algorithm. The resulting algorithm is highly efficient and effective. Experimental evaluation results are provided using the same collection. © Springer-Verlag 2009
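    The idea of a lookup-table classifier learned from labeled local windows can be illustrated with a minimal sketch. This is not the authors' implementation: the 3x3 binary window, the majority-vote training rule, and the function names `window_codes`, `train_lut`, and `apply_lut` are all assumptions made for illustration. In particular, where the abstract mentions an approximated nearest neighbor algorithm, this sketch simply falls back to the input pixel for window patterns never seen in training.

```python
import numpy as np

def window_codes(img, k=3):
    """Encode each pixel's k x k binary neighborhood as one integer."""
    pad = k // 2
    padded = np.pad(img.astype(np.int64), pad, mode="constant")
    h, w = img.shape
    codes = np.zeros((h, w), dtype=np.int64)
    for dy in range(k):
        for dx in range(k):
            # Shift in one bit per window position, row by row.
            codes = (codes << 1) | padded[dy:dy + h, dx:dx + w]
    return codes

def train_lut(degraded, clean, k=3):
    """Learn a pattern -> output-pixel table by majority vote.

    Entries stay at -1 for patterns that never occur in training.
    """
    codes = window_codes(degraded, k).ravel()
    n = 1 << (k * k)
    ones = np.bincount(codes, weights=clean.ravel(), minlength=n)
    total = np.bincount(codes, minlength=n)
    lut = np.full(n, -1, dtype=np.int8)
    seen = total > 0
    lut[seen] = (2 * ones[seen] > total[seen]).astype(np.int8)
    return lut

def apply_lut(degraded, lut, k=3):
    """Enhance a binary image; unseen patterns keep the input pixel
    (an assumed stand-in for a nearest-neighbor lookup)."""
    out = lut[window_codes(degraded, k)]
    return np.where(out < 0, degraded, out).astype(np.uint8)
```

    Because every pixel is handled by one table lookup, the cost of enhancement is linear in the number of pixels, which fits the abstract's emphasis on efficiency. A larger window would need a sparse table (for example a dictionary) rather than a dense array.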

    Preprocessing Techniques in Character Recognition
