    The NoisyOffice Database: A Corpus To Train Supervised Machine Learning Filters For Image Processing

    This paper presents the "NoisyOffice" database. It consists of images of printed text documents with noise mainly caused by uncleanliness in a generic office, such as coffee stains and footprints on documents, or folded and wrinkled sheets with degraded printed text. This corpus is intended for training and evaluating supervised learning methods for cleaning, binarization and enhancement of noisy grayscale images of text documents. As an example, several image enhancement and binarization experiments using deep learning techniques are presented. Double-resolution images are also provided for testing super-resolution methods. The corpus is freely available at the UCI Machine Learning Repository.

    F-Measure as the error function to train neural networks

    Imbalanced datasets pose serious problems in machine learning. For many tasks characterized by imbalanced data, the F-Measure seems more appropriate than the Mean Square Error or other error measures. This paper studies the use of the F-Measure as the training criterion for neural networks by integrating it into the error-backpropagation algorithm. This novel training criterion has been validated empirically on a real task whose quality is typically evaluated with the F-Measure. The task consists of cleaning and enhancing ancient document images, which is performed in this work by means of neural filters.

    This work has been partially supported by MICINN project HITITA (TIN2010-18958) and by the FPI-MICINN (BES-2011-046167) scholarship from Ministerio de Ciencia e Innovación, Gobierno de España.

    Pastor Pellicer, J.; Zamora Martínez, FJ.; España Boquera, S.; Castro-Bleda, MJ. (2013). F-Measure as the error function to train neural networks. In Advances in Computational Intelligence. Springer Verlag (Germany). 376-384. https://doi.org/10.1007/978-3-642-38679-4_37
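
The idea of using the F-Measure as a training criterion can be illustrated with a small sketch. The abstract does not give the formulation, so the code below assumes one common "soft" relaxation in which continuous network outputs in [0, 1] replace hard decisions when counting true positives; it is not necessarily the formulation used in the paper. The function names `soft_f_measure` and `soft_f_measure_grad` are hypothetical, chosen only for this illustration.

```python
# A minimal sketch of a differentiable (soft) F-measure, ASSUMING the
# relaxation  F = (1 + b^2) * TP / (b^2 * sum(t) + sum(o)),
# where TP = sum(o_i * t_i), o_i in [0, 1] are network outputs and
# t_i in {0, 1} are targets. This is an illustrative assumption, not
# the paper's confirmed definition.

def soft_f_measure(outputs, targets, beta=1.0):
    """Soft F-beta score computed from continuous outputs."""
    b2 = beta * beta
    tp = sum(o * t for o, t in zip(outputs, targets))  # soft true positives
    denom = b2 * sum(targets) + sum(outputs)
    if denom == 0.0:
        return 0.0  # no positives predicted or present
    return (1.0 + b2) * tp / denom


def soft_f_measure_grad(outputs, targets, beta=1.0):
    """Gradient of the soft F-beta score with respect to each output.

    By the quotient rule, dF/do_i = (1 + b^2) * (t_i * D - TP) / D^2,
    with D the denominator used in soft_f_measure.
    """
    b2 = beta * beta
    tp = sum(o * t for o, t in zip(outputs, targets))
    denom = b2 * sum(targets) + sum(outputs)
    if denom == 0.0:
        return [0.0] * len(outputs)
    return [(1.0 + b2) * (t * denom - tp) / (denom * denom) for t in targets]
```

In a backpropagation setting, the negated gradient (or the gradient of `1 - F`) would take the place of the usual Mean Square Error derivative at the output layer. Because the score is computed over a whole batch rather than per sample, each output's gradient depends on all other outputs in that batch.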