7 research outputs found

    MIDV-2020: A Comprehensive Benchmark Dataset for Identity Document Analysis

    Identity document recognition is an important sub-field of document analysis, which deals with the tasks of robust document detection, type identification, and text field recognition, as well as identity fraud prevention and document authenticity validation given photos, scans, or video frames of an identity document capture. A significant amount of research has been published on this topic in recent years; however, a chief difficulty for such research is the scarcity of datasets, due to the subject matter being protected by security requirements. The few available datasets of identity documents lack diversity of document types, capturing conditions, or variability of document field values. In addition, the published datasets were typically designed only for a subset of document recognition problems, not for complex identity document analysis. In this paper, we present the MIDV-2020 dataset, which consists of 1000 video clips, 2000 scanned images, and 1000 photos of 1000 unique mock identity documents, each with unique text field values and unique artificially generated faces, with rich annotation. Baselines are provided for the presented benchmark dataset for such tasks as document location and identification, text field recognition, and face detection. With 72409 annotated images in total, the proposed dataset is, as of the date of publication, the largest publicly available identity document dataset with variable artificially generated data, and we believe that it will prove invaluable for the advancement of the field of document analysis and recognition. The dataset is available for download at ftp://smartengines.com/midv-2020 and http://l3i-share.univ-lr.fr

    Weighted combination of per-frame recognition results for text recognition in a video stream

    The range of uses of automated document recognition has widened and, as a result, recognition techniques that do not require specialized equipment have become more relevant. Among such techniques, document recognition using mobile devices is of interest. However, it is not always possible to ensure controlled capturing conditions and, consequently, high quality of input images. Unlike specialized scanners, mobile cameras allow a video stream to be used as input, thus providing several images of the recognized object, captured with varying characteristics. In this case, the problem of combining the information from multiple input frames arises. In this paper, we propose a weighting model for the process of combining the per-frame recognition results, two approaches to the weighted combination of the text recognition results, and two weighting criteria. The effectiveness of the proposed approaches is tested using datasets of identity documents captured with a mobile device camera in different conditions, including perspective distortion of the document image and low lighting conditions. The experimental results show that the weighted combination can improve the quality of the text recognition result in the video stream, and that the per-character weighting method with input image focus estimation as a base criterion achieves the best results on the datasets analyzed. This work is partially supported by the Russian Foundation for Basic Research (projects 17-29-03236 and 18-07-01387)
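The idea of a per-character weighted combination can be illustrated with a minimal sketch. It is not the method from the paper: the function name `combine_weighted`, the input layout, and the sample scores are all hypothetical, and the frames are assumed to be already aligned to the same length, with a single focus-style weight applied to every character of a frame.

```python
from collections import defaultdict


def combine_weighted(frame_results, weights):
    """Combine aligned per-frame character scores by weighted voting.

    frame_results: list of frames; each frame is a list (one entry per
        character position) of dicts mapping candidate character -> score.
    weights: one non-negative weight per frame (assumed here to come
        from some focus estimate; the estimator itself is not shown).
    """
    if len(frame_results) != len(weights):
        raise ValueError("one weight per frame is required")
    if not frame_results:
        return ""
    length = len(frame_results[0])
    if any(len(frame) != length for frame in frame_results):
        raise ValueError("frames must be aligned to the same length")

    combined = []
    for pos in range(length):
        totals = defaultdict(float)
        # Accumulate weight * score for every candidate at this position.
        for frame, weight in zip(frame_results, weights):
            for char, score in frame[pos].items():
                totals[char] += weight * score
        # Keep the candidate with the highest weighted total.
        combined.append(max(totals, key=totals.get))
    return "".join(combined)


# Hypothetical two-frame example: a sharp frame and a blurred frame.
frames = [
    [{"A": 0.8, "4": 0.2}, {"B": 0.6, "8": 0.4}],  # sharp frame
    [{"A": 0.1, "4": 0.9}, {"B": 0.2, "8": 0.8}],  # blurred frame
]
focus_weighted = combine_weighted(frames, [0.9, 0.2])
unweighted = combine_weighted(frames, [1.0, 1.0])
```

With these made-up scores, equal weights let the confident but blurred frame dominate, while the focus-style weights favor the sharp frame's reading.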

    MIDV-2020: a comprehensive benchmark dataset for identity document analysis

    Identity document recognition is an important sub-field of document analysis, which deals with the tasks of robust document detection, type identification, and text field recognition, as well as identity fraud prevention and document authenticity validation given photos, scans, or video frames of an identity document capture. A significant amount of research has been published on this topic in recent years; however, a chief difficulty for such research is the scarcity of datasets, due to the subject matter being protected by security requirements. The few available datasets of identity documents lack diversity of document types, capturing conditions, or variability of document field values. In this paper, we present the MIDV-2020 dataset, which consists of 1000 video clips, 2000 scanned images, and 1000 photos of 1000 unique mock identity documents, each with unique text field values and unique artificially generated faces, with rich annotation. The dataset contains 72409 annotated images in total, making it the largest publicly available identity document dataset as of the date of publication. We describe the structure of the dataset, its content and annotations, and present baseline experimental results to serve as a basis for future research. For the task of document location and identification, content-independent, feature-based, and semantic segmentation-based methods were evaluated. For the task of document text field recognition, the Tesseract system was evaluated on field and character levels with grouping by field alphabets and document types. For the task of face detection, the performance of a method based on Multi-Task Cascaded Convolutional Neural Networks was evaluated separately for different types of image input modes. The baseline evaluations show that the existing methods of identity document analysis have a lot of room for improvement given modern challenges. We believe that the proposed dataset will prove invaluable for the advancement of the field of document analysis and recognition. This work is partially supported by the Russian Foundation for Basic Research (projects 19-29-09066 and 19-29-09092). All source images for the MIDV-2020 dataset were obtained from Wikimedia Commons. Author attributions for each source image are listed in the original MIDV-500 description table (ftp://smartengines.com/midv-500/documents.pdf). Face images by Generated Photos (https://generated.photos)

    Towards a unified framework for identity documents analysis and recognition

    Identity document recognition goes far beyond classical optical character recognition problems. Automated ID document recognition systems are tasked not only with the extraction of editable and transferable data but also with performing identity validation and preventing fraud, with an increasingly high cost of error. A significant amount of research is directed at the creation of ID analysis systems with a specific focus on a subset of document types or a particular mode of image acquisition; however, one of the challenges of the modern world is an increasing demand for identity document recognition from a wide variety of image sources, such as scans, photos, or video frames, as well as in a variety of virtually uncontrolled capturing conditions. In this paper, we describe the scope and context of the identity document analysis and recognition problem and its challenges; analyze the existing works on implementing ID document recognition systems; and set the task of constructing a unified framework for identity document recognition, which would be applicable to different types of image sources and capturing conditions, as well as scalable enough to support a large number of identity document types. The aim of the presented framework is to serve as a basis for developing new methods and algorithms for ID document recognition, as well as for the far more demanding challenges of identity document forensics, fully automated personal authentication, and fraud prevention. This work was partially supported by the Russian Foundation for Basic Research (Projects No. 18-29-03085 and 19-29-09055)

    Document image analysis and recognition: a survey

    This paper analyzes the problems of document image recognition and the existing solutions. Document recognition algorithms have been studied for quite a long time, yet the topic remains relevant and research continues, as evidenced by a large number of associated publications and reviews. However, most of these works and reviews are devoted to individual recognition tasks. In this review, the entire set of methods, approaches, and algorithms necessary for document recognition is considered. A preliminary systematization allowed us to distinguish groups of methods for extracting information from documents of different types: single-page and multi-page, with text and handwritten contents, with a fixed template and flexible structure, and digitized in different ways: scanning, photographing, video recording. Here, we consider methods of document recognition and analysis applied to a wide range of tasks: identification and verification of identity, due diligence, machine learning algorithms, questionnaires, and audits. The groups of methods necessary for the recognition of a single page image are examined: the classical computer vision algorithms, i.e., keypoints, local feature descriptors, Fast Hough Transforms, image binarization, and modern neural network models for document boundary detection, document classification, document structure analysis, i.e., text blocks and tables localization, extraction and recognition of the details, post-processing of recognition results. The review provides a description of publicly available experimental data packages for training and testing recognition algorithms. Methods for optimizing the performance of document image analysis and recognition methods are described. The reported study was funded by RFBR, project number 20-17-50177. The authors thank Sc. D. Vladimir L. Arlazarov (FRC CSC RAS), Pavel Bezmaternykh (FRC CSC RAS), Elena Limonova (FRC CSC RAS), Ph. D. Dmitry Polevoy (FRC CSC RAS), Daniil Tropin (LLC “Smart Engines Service”), Yuliya Chernysheva (LLC “Smart Engines Service”), Yuliya Shemyakina (LLC “Smart Engines Service”) for valuable comments and suggestions