Indexing and retrieval are particularly challenging for the error-prone output of handwriting recognition (HR) systems. This paper proposes an approach to indexing and retrieving degraded documents with very low recognition rates. We present a modified version of the popular vector space model used in information retrieval (IR). Our model incorporates the top-n candidates from an HR system into the computation of the term frequency (tf) and the inverse document frequency (idf). Standardized IR tests show that the proposed approach outperforms retrieval on the ordinary HR transcription in terms of mean average precision (MAP) and R-Precision.
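The abstract does not specify how the top-n candidates enter tf and idf, so the following is only a minimal Python sketch of one plausible scheme, not the authors' method. It assumes each word position carries the HR system's top-n `(word, score)` pairs with non-negative scores. The soft tf adds each candidate's normalized score. The df counts a document if a term appears among the candidates at any position, with idf = log(N / df). Documents are then ranked by cosine similarity, as in the standard vector space model. All function names (`expected_tf`, `candidate_idf`, `rank`, etc.) and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical representation: a document is a list of word positions, and each
# position holds the HR system's top-n candidates as (word, score) pairs.
Position = list[tuple[str, float]]
Document = list[Position]


def expected_tf(doc: Document) -> dict[str, float]:
    """Soft tf (assumed scheme): each position contributes every candidate's
    score normalized so that the weights at that position sum to 1."""
    tf: dict[str, float] = defaultdict(float)
    for candidates in doc:
        total = sum(score for _, score in candidates)
        if total <= 0:
            continue  # skip positions with no usable scores
        for word, score in candidates:
            tf[word] += score / total
    return dict(tf)


def candidate_idf(docs: list[Document]) -> dict[str, float]:
    """idf = log(N / df), where df counts the documents in which the term
    occurs among the top-n candidates at any position (assumed scheme)."""
    df: dict[str, int] = defaultdict(int)
    for doc in docs:
        for word in {w for candidates in doc for w, _ in candidates}:
            df[word] += 1
    return {w: math.log(len(docs) / d) for w, d in df.items()}


def tfidf_vector(doc: Document, idf: dict[str, float]) -> dict[str, float]:
    """Weight each soft term frequency by the term's idf."""
    return {w: f * idf.get(w, 0.0) for w, f in expected_tf(doc).items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_terms: list[str], docs: list[Document]) -> list[tuple[int, float]]:
    """Return (document index, score) pairs sorted by descending similarity."""
    idf = candidate_idf(docs)
    query = {t: c * idf.get(t, 0.0) for t, c in Counter(query_terms).items()}
    scores = [(i, cosine(query, tfidf_vector(d, idf))) for i, d in enumerate(docs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy collection with made-up recognition candidates and scores.
docs: list[Document] = [
    [[("retrieval", 0.6), ("retrieved", 0.3), ("reviewal", 0.1)],
     [("index", 0.9), ("inlex", 0.1)]],
    [[("handwriting", 0.5), ("handwritten", 0.5)]],
    [[("recognition", 0.7), ("retrieval", 0.2), ("region", 0.1)]],
]
ranking = rank(["retrieval"], docs)
print(ranking)
```

Under this assumed weighting, document 2 is still retrieved for "retrieval" even though that word is only its second-ranked candidate. Plain top-1 indexing would miss it entirely. Document 0, where "retrieval" is the top candidate, ranks first.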