Extraction of Projection Profile, Run-Histogram and Entropy Features Straight from Run-Length Compressed Text-Documents
Document Image Analysis, like any Digital Image Analysis, requires the
identification and extraction of proper features. These features are generally
extracted from uncompressed images, although in practice images are made
available in compressed form for reasons such as transmission and storage
efficiency. This implies that the compressed image must first be decompressed,
which demands additional computing resources. This limitation motivates research
into extracting features directly from the compressed image. In this research,
we propose to extract essential features such as the projection profile,
run-histogram and entropy for text document analysis directly from run-length
compressed text-documents. The experiments show that these features can be
extracted directly from the compressed image without a decompression stage,
which reduces the computing time. The feature values so extracted are identical
to those extracted from uncompressed images.
Comment: Published by IEEE in Proceedings of ACPR-2013. arXiv admin note: text
overlap with arXiv:1403.778
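The following is a minimal Python sketch of how the three features named above can be read off run lengths without rebuilding the pixel grid. It assumes a hypothetical per-row layout in which run lengths alternate white, black, white, ..., always starting with a white run (zero-length when the row begins with a black pixel); the paper's actual encoding and algorithms may differ. The entropy shown is plain Shannon entropy over black/white pixel counts, an assumption rather than the paper's definition, and the `decode` helper exists only to check that the compressed-domain results match the uncompressed ones. All function names are illustrative.

```python
import math
from collections import Counter

# Hypothetical RLE layout (an assumption for illustration): each row is a list of
# run lengths alternating white, black, white, ..., always starting with a white
# run, which is 0 when the row begins with a black pixel.

def horizontal_profile(rle_rows):
    """Black-pixel count per row: the sum of the odd-indexed (black) runs."""
    return [sum(row[1::2]) for row in rle_rows]

def vertical_profile(rle_rows, width):
    """Black-pixel count per column, accumulated from run boundaries."""
    diff = [0] * (width + 1)           # difference array over columns
    for row in rle_rows:
        col = 0
        for i, run in enumerate(row):
            if i % 2 == 1:             # black run covers columns [col, col + run)
                diff[col] += 1
                diff[col + run] -= 1
            col += run
    profile, acc = [], 0
    for c in range(width):             # prefix sum turns boundaries into counts
        acc += diff[c]
        profile.append(acc)
    return profile

def run_histogram(rle_rows, colour=1):
    """Histogram of run lengths for one colour (1 = black, 0 = white)."""
    start = 1 if colour == 1 else 0
    return Counter(run for row in rle_rows for run in row[start::2] if run > 0)

def pixel_entropy(rle_rows):
    """Shannon entropy in bits of the black/white pixel distribution (assumed measure)."""
    black = sum(sum(row[1::2]) for row in rle_rows)
    total = sum(sum(row) for row in rle_rows)
    h = 0.0
    for count in (black, total - black):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h

def decode(rle_rows):
    """Rebuild the 0/1 pixel grid; used here only to check the results."""
    return [[i % 2 for i, run in enumerate(row) for _ in range(run)]
            for row in rle_rows]

# 3 x 6 example image, stored only as runs.
rle = [[1, 2, 2, 1], [0, 2, 4], [6]]
print(horizontal_profile(rle))    # -> [3, 2, 0]
print(vertical_profile(rle, 6))   # -> [1, 2, 1, 0, 0, 1]
print(run_histogram(rle))         # -> Counter({2: 2, 1: 1})
```

Each black run updates the difference array twice regardless of its length, so the column profile costs time proportional to the number of runs plus the image width rather than to the number of pixels.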
Entropy Computation of Document Images in Run-Length Compressed Domain
Compression of documents, images, audio and video has traditionally been
practiced to increase the efficiency of data storage and transfer. However, in
order to process them or carry out any analytical computation, decompression
has become an unavoidable prerequisite. In this research work, we attempt to
compute the entropy, an important document analytic, directly from the
compressed documents. We use the Conventional Entropy Quantifier (CEQ) and
Spatial Entropy Quantifiers (SEQ) for entropy computations [1]. The entropies
obtained are useful in applications such as establishing equivalence, word
spotting and document retrieval. Experiments have been performed on all the
data sets of [1] at character, word and line levels, taking documents in the
run-length compressed domain. The algorithms developed are computationally
and space efficient, and the results obtained agree 100% with those reported
in [1].
Comment: Published in IEEE Proceedings 2014 Fifth International Conference on
Signals and Image Processin
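A minimal Python sketch of line-level entropy computed in the run-length domain follows. The CEQ and SEQ definitions of [1] are not reproduced here; as a stand-in, the sketch computes the Shannon entropy of the black/white pixel distribution of each text line. It assumes that text lines are separated by blank rows and uses a hypothetical row layout (run lengths alternating white/black, starting with a possibly zero-length white run). Names such as `text_lines` and `line_entropies` are illustrative only.

```python
import math

# Hypothetical RLE layout (an assumption): each row alternates white/black run
# lengths, starting with a white run that may have length 0.

def text_lines(rle_rows):
    """Group consecutive rows containing at least one black run into text lines.

    Returns (start_row, end_row_exclusive) pairs; no pixel is decoded."""
    lines, start = [], None
    for r, row in enumerate(rle_rows):
        has_ink = any(run > 0 for run in row[1::2])
        if has_ink and start is None:
            start = r
        elif not has_ink and start is not None:
            lines.append((start, r))
            start = None
    if start is not None:              # a line touching the bottom edge
        lines.append((start, len(rle_rows)))
    return lines

def binary_entropy(black, total):
    """Shannon entropy in bits of a two-symbol (black/white) distribution."""
    h = 0.0
    for count in (black, total - black):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h

def line_entropies(rle_rows):
    """Stand-in entropy of each text line, computed from run lengths alone."""
    result = []
    for start, end in text_lines(rle_rows):
        block = rle_rows[start:end]
        black = sum(sum(row[1::2]) for row in block)   # odd runs are black
        total = sum(sum(row) for row in block)
        result.append(binary_entropy(black, total))
    return result

# Six 6-pixel rows: two text lines separated by blank rows.
rle = [[6], [1, 2, 2, 1], [0, 2, 4], [6], [3, 3], [6]]
print(text_lines(rle))         # -> [(1, 3), (4, 5)]
print(line_entropies(rle)[1])  # -> 1.0 (3 black pixels out of 6)
```

Word- and character-level variants would follow the same pattern with a column-wise split inside each line; that extension is not shown and would depend on the segmentation rules of [1].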