
A rotation invariant page layout descriptor for document classification and retrieval
2009 10th International Conference on Document Analysis and Recognition

By Albert Gordo and Ernest Valveny

Abstract

Document classification usually requires structural features, such as the physical layout, to obtain good accuracy rates on complex documents. This paper introduces a layout descriptor and a distance measure based on cyclic Dynamic Time Warping (DTW) that can be computed in O(n²). The descriptor is translation invariant and can easily be modified to be scale and rotation invariant. Experiments with this descriptor and its rotation invariant modification are performed on the Girona Archives database and compared against another common layout distance, the Minimum Weight Edge Cover (MWEC). The experiments show that these methods outperform the MWEC in both accuracy and speed, particularly on rotated documents.
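To illustrate the kind of distance the abstract refers to, here is a minimal sketch of cyclic DTW, assuming the layout descriptor is a sequence of scalar values (for example, bins around the page) and that rotating the page corresponds to a cyclic shift of that sequence. Both of these are assumptions for illustration, since the abstract does not define the descriptor. The functions `dtw` and `cyclic_dtw` are hypothetical names, and the absolute-difference local cost is our own choice. This naive version tries every cyclic shift, so it costs O(n³) for two length-n sequences. It does not reproduce the O(n²) computation claimed in the paper.

```python
import math
from typing import List, Sequence


def dtw(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW with |x - y| as local cost; O(len(a) * len(b))."""
    n, m = len(a), len(b)
    # D[i][j] = cost of best warping path aligning a[:i] with b[:j]
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def cyclic_dtw(a: Sequence[float], b: Sequence[float]) -> float:
    """Naive cyclic DTW (hypothetical helper): the minimum DTW distance
    between a and every cyclic shift of b. Assumes a rotation of the page
    shows up as a cyclic shift of its descriptor sequence."""
    if not a or not b:
        raise ValueError("sequences must be non-empty")
    best = math.inf
    for k in range(len(b)):
        shifted: List[float] = list(b[k:]) + list(b[:k])
        best = min(best, dtw(a, shifted))
    return best


if __name__ == "__main__":
    a = [0, 1, 2, 3, 0, 0]
    b = [2, 3, 0, 0, 0, 1]  # a cyclically shifted by two positions
    print(dtw(a, b))         # plain DTW is sensitive to the shift
    print(cyclic_dtw(a, b))  # cyclic DTW recovers the alignment: 0.0
```

Because `b` is just `a` shifted cyclically, plain DTW reports a positive distance while the cyclic variant finds the shift that makes the two sequences identical. This is the property that would make a cyclic descriptor distance insensitive to rotation, under the assumption stated above.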

Year: 2012
OAI identifier: oai:CiteSeerX.psu:10.1.1.212.1138
Provided by: CiteSeerX
The full text may be found at the following locations:
  • http://citeseerx.ist.psu.edu/v...
  • http://www.cvc.uab.es/icdar200...