2 research outputs found

    Document image retrieval based on density distribution feature and key block feature

    Document image retrieval is an important part of many document image processing systems, such as paperless office systems and digital libraries. Its task is to help users find the most similar document images in a document image database. To support retrieval across document images of different resolutions and formats that contain mixed characters from multiple languages, this paper proposes a new retrieval method based on document image density distribution features and key block features. First, the density distribution and key block features of a document image are defined and extracted based on the document's print-core. Second, candidate document images are obtained using the density distribution features. Third, to improve the reliability of the retrieval results, a confirmation procedure using key block features is applied to those candidates. Experimental results on a large-scale document image database containing 10385 document images show that the proposed method is efficient and robust, retrieving different kinds of document images in real time.
    http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000232022600204&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=8e1609b174ce4e31116a60747a720701
    Computer Science, Artificial Intelligence; Computer Science, Information Systems; CPCI-S(ISTP)
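The two-stage retrieval described in the abstract above (candidate selection by density distribution, then confirmation by key blocks) can be illustrated with a minimal sketch. The abstract does not give the paper's actual feature definitions, so everything here is an assumption made for illustration: the density feature is taken to be the fraction of ink pixels in each cell of a fixed grid, candidates are ranked by L1 distance, and `confirm` is a hypothetical placeholder for the key block check. The names `density_feature`, `l1_distance`, and `retrieve` are not from the paper.

```python
from typing import Callable, Dict, List, Optional, Sequence

# Assumed representation: a binary image as rows of 0/1 values (1 = ink).
Image = Sequence[Sequence[int]]


def density_feature(img: Image, rows: int = 4, cols: int = 4) -> List[float]:
    """Fraction of ink pixels in each cell of a rows x cols grid.

    Using fractions rather than raw counts keeps the feature comparable
    between scans of the same page at different resolutions.
    """
    height, width = len(img), len(img[0])
    feature = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            area = (y1 - y0) * (x1 - x0)
            ink = sum(img[y][x] for y in range(y0, y1) for x in range(x0, x1))
            feature.append(ink / area if area else 0.0)
    return feature


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def retrieve(
    query: Image,
    database: Dict[str, Image],
    top_k: int = 5,
    confirm: Optional[Callable[[Image, Image], bool]] = None,
) -> List[str]:
    """Stage 1: rank by density distance. Stage 2: optional confirmation."""
    query_feature = density_feature(query)
    ranked = sorted(
        database,
        key=lambda name: l1_distance(query_feature, density_feature(database[name])),
    )
    candidates = ranked[:top_k]
    if confirm is not None:
        # Hypothetical stand-in for the key block confirmation step.
        candidates = [n for n in candidates if confirm(query, database[n])]
    return candidates
```

In a real system the database features would be computed once and indexed rather than recomputed per query; the sketch recomputes them only to stay short.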

    Document image analysis and recognition: a survey

    This paper analyzes the problems of document image recognition and the existing solutions. Document recognition algorithms have been studied for a long time, yet the topic remains relevant and research continues, as evidenced by the large number of associated publications and reviews. However, most of these works and reviews are devoted to individual recognition tasks. This review considers the entire set of methods, approaches, and algorithms necessary for document recognition. A preliminary systematization allowed us to distinguish groups of methods for extracting information from documents of different types: single-page and multi-page, with text and handwritten content, with a fixed template and a flexible structure, and digitized in different ways: scanning, photographing, and video recording. We consider methods of document recognition and analysis applied to a wide range of tasks: identification and verification of identity, due diligence, machine learning algorithms, questionnaires, and audits. The groups of methods necessary for recognizing a single page image are examined: classical computer vision algorithms (keypoints, local feature descriptors, Fast Hough Transforms, image binarization) and modern neural network models for document boundary detection, document classification, document structure analysis (localization of text blocks and tables), extraction and recognition of details, and post-processing of recognition results. The review describes publicly available experimental data packages for training and testing recognition algorithms. Methods for optimizing the performance of document image analysis and recognition are also described.
    The reported study was funded by RFBR, project number 20-17-50177. The authors thank Sc. D. Vladimir L. Arlazarov (FRC CSC RAS), Pavel Bezmaternykh (FRC CSC RAS), Elena Limonova (FRC CSC RAS), Ph. D. Dmitry Polevoy (FRC CSC RAS), Daniil Tropin (LLC “Smart Engines Service”), Yuliya Chernysheva (LLC “Smart Engines Service”), and Yuliya Shemyakina (LLC “Smart Engines Service”) for valuable comments and suggestions.