4,136 research outputs found
Pattern Spotting and Image Retrieval in Historical Documents using Deep Hashing
This paper presents a deep learning approach for image retrieval and pattern
spotting in digital collections of historical documents. First, a region
proposal algorithm detects object candidates in the document page images. Next,
deep learning models are used for feature extraction, considering two distinct
variants, which provide either real-valued or binary code representations.
Finally, candidate images are ranked by computing the feature similarity with a
given input query. A robust experimental protocol evaluates the proposed
approach considering each representation scheme (real-valued and binary code)
on the DocExplore image database. The experimental results show that the
proposed deep models compare favorably to the state-of-the-art image retrieval
approaches for images of historical documents, outperforming other deep models
by 2.56 percentage points using the same techniques for pattern spotting.
Besides, the proposed approach also reduces the search time by up to 200x and
the storage cost up to 6,000x when compared to related works based on
real-valued representations.Comment: 7 page
Word matching using single closed contours for indexing handwritten historical documents
Effective indexing is crucial for providing convenient access to scanned versions of large collections of historically valuable handwritten manuscripts. Since traditional handwriting recognizers based on optical character recognition (OCR) do not perform well on historical documents, recently a holistic word recognition approach has gained in popularity as an attractive and more straightforward solution (Lavrenko et al. in proc. document Image Analysis for Libraries (DIAL’04), pp. 278–287, 2004). Such techniques attempt to recognize words based on scalar and profile-based features extracted from whole word images. In this paper, we propose a new approach to holistic word recognition for historical handwritten manuscripts based on matching word contours instead of whole images or word profiles. The new method consists of robust extraction of closed word contours and the application of an elastic contour matching technique proposed originally for general shapes (Adamek and O’Connor in IEEE Trans Circuits Syst Video Technol 5:2004). We demonstrate that multiscale contour-based descriptors can effectively capture intrinsic word features avoiding any segmentation of words into smaller subunits. Our experiments show a recognition accuracy of 83%, which considerably exceeds the performance of other systems reported in the literature
- …