Text Line Segmentation of Historical Documents: a Survey
There is a huge amount of historical documents in libraries and in various
National Archives that have not been exploited electronically. Although
automatic reading of complete pages remains, in most cases, a long-term
objective, tasks such as word spotting, text/image alignment, authentication
and extraction of specific fields are in use today. For all these tasks, a
major step is document segmentation into text lines. Because of the low quality
and the complexity of these documents (background noise, artifacts due to
aging, interfering lines), automatic text line segmentation remains an open
research field. The objective of this paper is to present a survey of existing
methods, developed during the last decade, and dedicated to documents of
historical interest.
Comment: 25 pages, submitted version, to appear in International Journal on
Document Analysis and Recognition. Online version available at
http://www.springerlink.com/content/k2813176280456k3
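A common baseline for text line segmentation, and one that breaks down on the noisy, skewed and interfering lines described above, is the horizontal projection profile: count the ink pixels in each row and treat runs of non-empty rows as text lines. The sketch below is that generic baseline, not a method taken from the survey. It assumes a binarized page given as rows of 0/1 values (1 = ink) with roughly horizontal lines; the function name `segment_lines` and the `min_ink` parameter are hypothetical.

```python
from typing import List, Tuple

def segment_lines(image: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Split a binarized page into text-line bands.

    image: rows of 0/1 pixels, 1 = ink (assumed input format).
    Returns (start_row, end_row) pairs with end_row exclusive.
    """
    # Horizontal projection profile: ink pixels per row.
    profile = [sum(row) for row in image]
    lines: List[Tuple[int, int]] = []
    start = None
    for y, count in enumerate(profile):
        if count >= min_ink and start is None:
            start = y                  # entering a text band
        elif count < min_ink and start is not None:
            lines.append((start, y))   # leaving a text band
            start = None
    if start is not None:              # band touches the bottom edge
        lines.append((start, len(profile)))
    return lines

page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
]
print(segment_lines(page))  # [(1, 3), (5, 6)]
```

Raising `min_ink` suppresses rows that contain only a few noise pixels, but touching or interfering lines still merge into a single band. That is the kind of failure that motivates the more elaborate methods developed for historical documents.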
A Comprehensive Study of ImageNet Pre-Training for Historical Document Image Analysis
Automatic analysis of scanned historical documents comprises a wide range of
image analysis tasks, which are often challenging for machine learning due to a
lack of human-annotated learning samples. With the advent of deep neural
networks, a promising way to cope with the lack of training data is to
pre-train models on images from a different domain and then fine-tune them on
historical documents. In the current research, a typical example of such
cross-domain transfer learning is the use of neural networks that have been
pre-trained on the ImageNet database for object recognition. It remains a
mostly open question whether or not this pre-training helps to analyse
historical documents, which have fundamentally different image properties when
compared with ImageNet. In this paper, we present a comprehensive empirical
survey on the effect of ImageNet pre-training for diverse historical document
analysis tasks, including character recognition, style classification,
manuscript dating, semantic segmentation, and content-based retrieval. While we
obtain mixed results for semantic segmentation at pixel-level, we observe a
clear trend across different network architectures that ImageNet pre-training
has a positive effect on classification as well as content-based retrieval.
Enhancement of Image Resolution by Binarization
Image segmentation is one of the principal approaches in image processing.
Choosing the most appropriate binarization algorithm for each case has proved
to be an interesting problem in itself. In this paper, we present a comparative
study of various binarization algorithms and propose methodologies for
validating binarization algorithms. We also develop two novel algorithms that
determine threshold values for the pixels of a grayscale image. Performance is
estimated on test images using evaluation metrics for the binarization of
textual and synthetic images. We achieve better image resolution by using
binarization with optimum thresholding techniques.
Comment: 5 pages, 8 figures
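The abstract does not describe its two novel threshold-selection algorithms, so as a point of reference here is a sketch of Otsu's method, a standard global thresholding technique (swapped in for illustration; it is not the authors' algorithm). It picks the gray level that maximizes the between-class variance of the two pixel classes. The input is assumed to be a flat list of 8-bit gray values, and the names `otsu_threshold` and `binarize` are hypothetical.

```python
from typing import List

def otsu_threshold(pixels: List[int]) -> int:
    """Return t in 0..255 maximizing between-class variance, where class 0 is
    pixels <= t and class 1 is pixels > t. Returns 0 for a single-level image."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    w0 = 0      # number of pixels with value <= t
    sum0 = 0    # sum of those pixel values
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue            # one class is empty; threshold undefined here
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance up to a constant factor 1/total**2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:  # strict '>' keeps the first maximum
            best_var, best_t = var_between, t
    return best_t

def binarize(pixels: List[int], t: int) -> List[int]:
    # 1 for pixels brighter than the threshold, 0 otherwise.
    return [1 if p > t else 0 for p in pixels]

gray = [10] * 50 + [200] * 50
t = otsu_threshold(gray)
print(t)  # 10
```

Otsu's method is a single global threshold; unevenly lit or stained pages are usually better served by local (adaptive) thresholds, which is one reason comparative studies of binarization algorithms are useful.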
Extraction of Projection Profile, Run-Histogram and Entropy Features Straight from Run-Length Compressed Text-Documents
Document image analysis, like any digital image analysis, requires the
identification and extraction of suitable features. These are generally
extracted from uncompressed images, although in practice images are usually
supplied in compressed form for transmission and storage efficiency.
This means the compressed image must first be decompressed, which demands
additional computing resources. This limitation motivates research into
extracting features directly from the compressed image. In this research, we
propose to extract essential features such as the projection profile,
run-histogram and entropy for text document analysis directly from run-length
compressed text documents. Experiments show that these features can be
extracted directly from the compressed image without a decompression stage,
which reduces computing time. The feature values so extracted are exactly
identical to those extracted from uncompressed images.
Comment: Published by IEEE in Proceedings of ACPR-2013. arXiv admin note: text
overlap with arXiv:1403.778
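A minimal sketch of why such features need no decompression, assuming a hypothetical row encoding (not necessarily the paper's): each row is a list of run lengths alternating white, black, white, ..., always starting with a white run that may have length 0, and every row's runs sum to the page width. Black runs then sit at odd indices, so the horizontal profile is a sum over runs, the vertical profile follows from a difference array over run boundaries, and the black-run histogram is a count of run lengths. All function names are hypothetical; `decode` exists only to check the results against the uncompressed image.

```python
from collections import Counter
from typing import Dict, List

RLERow = List[int]  # assumed: white, black, white, ... run lengths

def horizontal_profile(rle: List[RLERow]) -> List[int]:
    # Black runs are at odd indices, so a row's ink count is their sum.
    return [sum(row[1::2]) for row in rle]

def vertical_profile(rle: List[RLERow], width: int) -> List[int]:
    # Difference array: +1 where a black run starts, -1 just past its end.
    diff = [0] * (width + 1)
    for row in rle:
        x = 0
        for i, run in enumerate(row):
            if i % 2 == 1 and run > 0:
                diff[x] += 1
                diff[x + run] -= 1
            x += run
    profile, acc = [], 0
    for c in range(width):
        acc += diff[c]          # prefix sum = black pixels in column c
        profile.append(acc)
    return profile

def black_run_histogram(rle: List[RLERow]) -> Dict[int, int]:
    # How often each black-run length occurs on the page.
    return dict(Counter(run for row in rle for run in row[1::2] if run > 0))

def decode(rle: List[RLERow]) -> List[List[int]]:
    # Expand runs to 0/1 pixels (0 = white at even index, 1 = black at odd).
    return [[i % 2 for i, run in enumerate(row) for _ in range(run)] for row in rle]

page = [[1, 2, 1], [0, 1, 2, 1], [4]]  # rows 0110, 1001, 0000
print(horizontal_profile(page))        # [2, 2, 0]
print(vertical_profile(page, 4))       # [1, 1, 1, 1]
print(black_run_histogram(page))       # {2: 1, 1: 2}
```

The cost of each feature grows with the number of runs rather than the number of pixels, which is where the time saving comes from on sparse text pages. The abstract does not define its entropy feature, so it is left out of this sketch.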
Image Enhancement with Statistical Estimation
Contrast enhancement is an important area of research in image analysis.
Over the past decade, researchers have worked in this domain to develop
efficient and adequate algorithms. The proposed method enhances the contrast
of an image using a binarization method with the help of Maximum Likelihood
Estimation (MLE). The paper aims to enhance the contrast of bimodal and
multi-modal images. The proposed methodology collects mathematical information
retrieved from the image. We use a binarization method that generates the
desired histogram by separating image nodes, and then generate the enhanced
image using histogram specification. The proposed method showed an improvement
in image contrast enhancement compared with the other images.
Comment: 9 pages, 6 figures; ISSN: 0975-5578 (Online); 0975-5934 (Print)
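The abstract does not say how the MLE-based binarization builds the desired histogram, so the sketch below covers only the final step it names, histogram specification, and takes the target histogram as an input. Each source gray level is mapped to the smallest target level whose cumulative distribution reaches the source level's cumulative distribution. It assumes 8-bit gray values in a flat list and a target histogram with at least one nonzero bin; `specify_histogram` is a hypothetical name.

```python
from typing import List

LEVELS = 256

def cdf(hist: List[int]) -> List[float]:
    # Normalized cumulative histogram (assumes sum(hist) > 0).
    total = sum(hist)
    out, acc = [], 0
    for h in hist:
        acc += h
        out.append(acc / total)
    return out

def specify_histogram(pixels: List[int], target_hist: List[int]) -> List[int]:
    """Remap 8-bit pixels so their histogram approximates target_hist."""
    src_hist = [0] * LEVELS
    for p in pixels:
        src_hist[p] += 1
    src_cdf, tgt_cdf = cdf(src_hist), cdf(target_hist)
    mapping, j = [], 0
    for g in range(LEVELS):
        # src_cdf is non-decreasing, so j only ever moves forward.
        while j < LEVELS - 1 and tgt_cdf[j] < src_cdf[g]:
            j += 1
        mapping.append(j)
    return [mapping[p] for p in pixels]

# A low-contrast bimodal image (levels 100 and 110) stretched toward an
# illustrative target with equal mass at levels 20 and 230.
low_contrast = [100] * 50 + [110] * 50
target = [0] * LEVELS
target[20] = target[230] = 1
out = specify_histogram(low_contrast, target)
print(out[0], out[-1])  # 20 230
```

In the paper's setting, the target would come from the binarization and MLE stage; here it is hand-written purely to show the mapping.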
Page layout analysis and classification in complex scanned documents
Page layout analysis has been extensively studied since the 1980s, particularly after computers began to be used for document storage and databases. For efficient storage and retrieval from a database, a paper document must be converted into an electronic version. Document image analysis algorithms and methodologies segment a scanned document into different regions such as text, image or line regions. As a novel contribution to page layout analysis and classification, the algorithm presented here is developed for both RGB and gray-scale scanned documents, without requiring any specific document type or scanning technique. In this thesis, a page classification algorithm is proposed that mainly applies the wavelet transform, a Markov random field (MRF) and the Hough transform to segment text, photo and strong edge/line regions in both color and gray-scale scanned documents. The algorithm handles both simple and complex page layout structures and contents (text only vs. a book cover that includes text, lines and/or photos). The methodology consists of five modules. In the first module, pre-processing, image enhancement techniques such as image scaling, filtering, color space conversion or gamma correction are applied to reduce computation time and enhance the scanned document. The classification techniques are applied to the input image at one-fourth resolution in the CIE L*a*b* color space. In the second module, text detection, wavelet analysis generates a text-region candidate map, which is refined by a Run Length Encoding (RLE) technique for verification. The third module, photo detection, first performs block-wise segmentation based on a basis vector projection technique. Then an MRF with a maximum a-posteriori (MAP) optimization framework is used to generate the photo map.
Next, in the fourth module, the Hough transform is applied to locate lines. Edge detection, edge linking and line-segment fitting are also used in this module to detect strong edges. Once these three classification maps are obtained, the last module generates the final page layout map using K-Means: features are extracted to classify the intersection regions, which are then merged into one classification map with K-Means clustering. The proposed technique was tested on several hundred images and its performance validated using a confusion matrix (CM). It achieves an average classification accuracy of 85% for text, photo and background regions on a variety of scanned documents such as articles, magazines, business cards, dictionaries and newsletters. More importantly, it performs independently of the scanning process and of the input document type (RGB or gray-scale), with comparable classification quality.
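The fourth module relies on the Hough transform to locate lines. A minimal sketch of the textbook voting scheme is shown below; it is not the thesis implementation. Each edge pixel (x, y) votes for every line in normal form, rho = x·cos(theta) + y·sin(theta), and the most-voted (theta, rho) cells are returned. The 1-degree and 1-pixel bin sizes, the point-list input, and the name `hough_lines` are assumptions, and there is no non-maximum suppression, so nearby cells of a strong line can also score well.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

def hough_lines(points: Iterable[Tuple[int, int]],
                top_k: int = 1) -> List[Tuple[int, int, int]]:
    """Return the top_k (theta_degrees, rho, votes) cells of the accumulator."""
    # Precompute cos/sin for theta = 0..179 degrees (assumed 1-degree bins).
    trig = [(deg, math.cos(math.radians(deg)), math.sin(math.radians(deg)))
            for deg in range(180)]
    acc: Counter = Counter()
    for x, y in points:
        for deg, c, s in trig:
            rho = round(x * c + y * s)   # signed distance, 1-pixel bins
            acc[(deg, rho)] += 1
    return [(deg, rho, votes) for (deg, rho), votes in acc.most_common(top_k)]

# A horizontal rule at y = 5 spanning x = 0..99, e.g. a separator line.
rule = [(x, 5) for x in range(100)]
print(hough_lines(rule))  # [(90, 5, 100)]
```

In this parameterization, horizontal lines peak near theta = 90 degrees and vertical lines near theta = 0, which makes it easy to separate, for example, table rules from column separators once peaks are found.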