19,000 research outputs found
Complex Document Classification and Localization Application on Identity Document Images
International audienceThis paper studies the problem of document image classification. More specifically, we address the classification of documents composed of few textual information and complex background (such as identity documents). Unlike most existing systems, the proposed approach simultaneously locates the document and recognizes its class. The latter is defined by the document nature (passport, ID, etc.), emission country, version, and the visible side (main or back). This task is very challenging due to unconstrained capturing conditions, sparse textual information, and varying components that are irrelevant to the classification, e.g. photo, names, address, etc. First, a base of document models is created from reference images. We show that training images are not necessary and only one reference image is enough to create a document model. Then, the query image is matched against all models in the base. Unknown documents are rejected using an estimated quality based on the extracted document. The matching process is optimized to guarantee an execution time independent from the number of document models. Once the document model is found, a more accurate matching is performed to locate the document and facilitate information extraction. Our system is evaluated on several datasets with up to 3042 real documents (representing 64 classes) achieving an accuracy of 96.6%
Deep Adaptive Learning for Writer Identification based on Single Handwritten Word Images
There are two types of information in each handwritten word image: explicit
information which can be easily read or derived directly, such as lexical
content or word length, and implicit attributes such as the author's identity.
Whether features learned by a neural network for one task can be used for
another task remains an open question. In this paper, we present a deep
adaptive learning method for writer identification based on single-word images
using multi-task learning. An auxiliary task is added to the training process
to enforce the emergence of reusable features. Our proposed method transfers
the benefits of the learned features of a convolutional neural network from an
auxiliary task such as explicit content recognition to the main task of writer
identification in a single procedure. Specifically, we propose a new adaptive
convolutional layer to exploit the learned deep features. A multi-task neural
network with one or several adaptive convolutional layers is trained
end-to-end, to exploit robust generic features for a specific main task, i.e.,
writer identification. Three auxiliary tasks, corresponding to three explicit
attributes of handwritten word images (lexical content, word length and
character attributes), are evaluated. Experimental results on two benchmark
datasets show that the proposed deep adaptive learning method can improve the
performance of writer identification based on single-word images, compared to
non-adaptive and simple linear-adaptive approaches.Comment: Under view of Pattern Recognitio
MIDV-2020: A Comprehensive Benchmark Dataset for Identity Document Analysis
Identity documents recognition is an important sub-field of document
analysis, which deals with tasks of robust document detection, type
identification, text fields recognition, as well as identity fraud prevention
and document authenticity validation given photos, scans, or video frames of an
identity document capture. Significant amount of research has been published on
this topic in recent years, however a chief difficulty for such research is
scarcity of datasets, due to the subject matter being protected by security
requirements. A few datasets of identity documents which are available lack
diversity of document types, capturing conditions, or variability of document
field values. In addition, the published datasets were typically designed only
for a subset of document recognition problems, not for a complex identity
document analysis. In this paper, we present a dataset MIDV-2020 which consists
of 1000 video clips, 2000 scanned images, and 1000 photos of 1000 unique mock
identity documents, each with unique text field values and unique artificially
generated faces, with rich annotation. For the presented benchmark dataset
baselines are provided for such tasks as document location and identification,
text fields recognition, and face detection. With 72409 annotated images in
total, to the date of publication the proposed dataset is the largest publicly
available identity documents dataset with variable artificially generated data,
and we believe that it will prove invaluable for advancement of the field of
document analysis and recognition. The dataset is available for download at
ftp://smartengines.com/midv-2020 and http://l3i-share.univ-lr.fr
A deep matrix factorization method for learning attribute representations
Semi-Non-negative Matrix Factorization is a technique that learns a
low-dimensional representation of a dataset that lends itself to a clustering
interpretation. It is possible that the mapping between this new representation
and our original data matrix contains rather complex hierarchical information
with implicit lower-level hidden attributes, that classical one level
clustering methodologies can not interpret. In this work we propose a novel
model, Deep Semi-NMF, that is able to learn such hidden representations that
allow themselves to an interpretation of clustering according to different,
unknown attributes of a given dataset. We also present a semi-supervised
version of the algorithm, named Deep WSF, that allows the use of (partial)
prior information for each of the known attributes of a dataset, that allows
the model to be used on datasets with mixed attribute knowledge. Finally, we
show that our models are able to learn low-dimensional representations that are
better suited for clustering, but also classification, outperforming
Semi-Non-negative Matrix Factorization, but also other state-of-the-art
methodologies variants.Comment: Submitted to TPAMI (16-Mar-2015
An Automatic Reader of Identity Documents
Identity documents automatic reading and verification is an appealing
technology for nowadays service industry, since this task is still mostly
performed manually, leading to waste of economic and time resources. In this
paper the prototype of a novel automatic reading system of identity documents
is presented. The system has been thought to extract data of the main Italian
identity documents from photographs of acceptable quality, like those usually
required to online subscribers of various services. The document is first
localized inside the photo, and then classified; finally, text recognition is
executed. A synthetic dataset has been used, both for neural networks training,
and for performance evaluation of the system. The synthetic dataset avoided
privacy issues linked to the use of real photos of real documents, which will
be used, instead, for future developments of the system.Comment: 6 pages, 9 figure
- …