38 research outputs found
A selectional auto-encoder approach for document image binarization
Binarization plays a key role in the automatic information retrieval from document images. This process is usually performed in the first stages of document analysis systems, and serves as a basis for subsequent steps. Hence it has to be robust in order to allow the full analysis workflow to be successful. Several methods for document image binarization have been proposed so far, most of which are based on hand-crafted image processing strategies. Recently, Convolutional Neural Networks have shown an amazing performance in many disparate duties related to computer vision. In this paper we discuss the use of convolutional auto-encoders devoted to learning an end-to-end map from an input image to its selectional output, in which activations indicate the likelihood of pixels to be either foreground or background. Once trained, documents can therefore be binarized by parsing them through the model and applying a global threshold. This approach has proven to outperform existing binarization strategies in a number of document types.This work was partially supported by the Social Sciences and Humanities Research Council of Canada, the Spanish Ministerio de Ciencia, Innovaci贸n y Universidades through Juan de la Cierva - Formaci贸n grant (Ref. FJCI-2016-27873), and the Universidad de Alicante through grant GRE-16-04
Binarization strategy using multiple convolutional autoencoder network for old Sundanese manuscript images
Computer Systems, Imagery and Medi
CT-Net:Cascade T-shape deep fusion networks for document binarization
Document binarization is a key step in most document analysis tasks. However, historical-document images usually suffer from various degradations, making this a very challenging processing stage. The performance of document image binarization has improved dramatically in recent years by the use of Convolutional Neural Networks (CNNs). In this paper, a dual-task, T-shaped neural network is proposed that has the main task of binarization and an auxiliary task of image enhancement. The neural network for enhancement learns the degradations in document images and the specific CNN-kernel features can be adapted towards the binarization task in the training process. In addition, the enhancement image can be considered as an improved version of the input image, which can be fed into the network for fine-tuning, making it possible to design a chained-cascade network (CT-Net). Experimental results on document binarization competition datasets (DIBCO datasets) and MCS dataset show that our proposed method outperforms competing state-of-the-art methods in most cases
DeepOtsu: Document Enhancement and Binarization using Iterative Deep Learning
This paper presents a novel iterative deep learning framework and apply it
for document enhancement and binarization. Unlike the traditional methods which
predict the binary label of each pixel on the input image, we train the neural
network to learn the degradations in document images and produce the uniform
images of the degraded input images, which allows the network to refine the
output iteratively. Two different iterative methods have been studied in this
paper: recurrent refinement (RR) which uses the same trained neural network in
each iteration for document enhancement and stacked refinement (SR) which uses
a stack of different neural networks for iterative output refinement. Given the
learned uniform and enhanced image, the binarization map can be easy to obtain
by a global or local threshold. The experimental results on several public
benchmark data sets show that our proposed methods provide a new clean version
of the degraded image which is suitable for visualization and promising results
of binarization using the global Otsu's threshold based on the enhanced images
learned iteratively by the neural network.Comment: Accepted by Pattern Recognitio
Staff-line removal with selectional auto-encoders
Staff-line removal is an important preprocessing stage as regards most Optical Music Recognition systems. The common procedures employed to carry out this task involve image processing techniques. In contrast to these traditional methods, which are based on hand-engineered transformations, the problem can also be approached from a machine learning point of view if representative examples of the task are provided. We propose doing this through the use of a new approach involving auto-encoders, which select the appropriate features of an input feature set (Selectional Auto-Encoders). Within the context of the problem at hand, the model is trained to select those pixels of a given image that belong to a musical symbol, thus removing the lines of the staves. Our results show that the proposed technique is quite competitive and significantly outperforms the other state-of-art strategies considered, particularly when dealing with grayscale input images.This work was partially supported by the Spanish Ministerio de Educaci贸n, Cultura y Deporte through a FPU fellowship (AP2012- 0939) and the Spanish Ministerio de Econom铆a y Competitividad through Project TIMuL (No. TIN2013-48152-C2-1-R, supported by UE FEDER funds)