3 research outputs found
A Codebook Generation Algorithm for Document Image Compression
Pattern-matching-based document-compression systems (e.g. for faxing) rely on
finding a small set of patterns that can be used to represent all of the ink in
the document. Finding an optimal set of patterns is NP-hard; previous
compression schemes have resorted to heuristics. This paper describes an
extension of the cross-entropy approach, used previously for measuring pattern
similarity, to this problem. This approach reduces the problem to a k-medians
problem, for which the paper gives a new algorithm with a provably good
performance guarantee. In comparison to previous heuristics (First Fit, with
and without generalized Lloyd's/k-means postprocessing steps), the new
algorithm generates a better codebook, resulting in an overall improvement in
compression performance of almost 17%
A Codebook Generation Algorithm for Document Image Compression
Pattern-matching based document compression systems rely on finding