A Comprehensive Study of ImageNet Pre-Training for Historical Document Image Analysis
Automatic analysis of scanned historical documents comprises a wide range of
image analysis tasks, which are often challenging for machine learning due to a
lack of human-annotated learning samples. With the advent of deep neural
networks, a promising way to cope with the lack of training data is to
pre-train models on images from a different domain and then fine-tune them on
historical documents. In current research, a typical example of such
cross-domain transfer learning is the use of neural networks that have been
pre-trained on the ImageNet database for object recognition. It remains a
mostly open question whether or not this pre-training helps to analyse
historical documents, which have fundamentally different image properties when
compared with ImageNet. In this paper, we present a comprehensive empirical
survey on the effect of ImageNet pre-training for diverse historical document
analysis tasks, including character recognition, style classification,
manuscript dating, semantic segmentation, and content-based retrieval. While we
obtain mixed results for semantic segmentation at pixel-level, we observe a
clear trend across different network architectures that ImageNet pre-training
has a positive effect on classification as well as content-based retrieval.
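The transfer-learning recipe the survey evaluates, pre-train on a source domain, then fine-tune on scarce target data, can be sketched in miniature. The code below is a hedged, framework-free stand-in: the "pre-trained" extractor is a fixed random projection (not an actual ImageNet network), and only a small logistic-regression head is trained on a tiny synthetic target set, mirroring the freeze-backbone/fine-tune-head protocol.

```python
import math
import random

random.seed(0)

DIM_IN, DIM_FEAT = 8, 4  # input size, frozen-feature size

# Stand-in for a pre-trained backbone: a fixed linear map (never updated).
W_FROZEN = [[random.gauss(0, 1) for _ in range(DIM_IN)] for _ in range(DIM_FEAT)]

def extract(x):
    """Frozen 'pre-trained' feature extractor: fixed linear map + ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W_FROZEN]

def train_head(samples, labels, lr=0.1, epochs=200):
    """Fine-tune only the classification head on frozen features (SGD, log-loss)."""
    w, b = [0.0] * DIM_FEAT, 0.0
    feats = [extract(x) for x in samples]
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the logistic loss w.r.t. z
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    f = extract(x)
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0

# Tiny synthetic 'scarce' target-domain training set.
X = [[1.0] * DIM_IN, [0.9] * DIM_IN, [-1.0] * DIM_IN, [-0.8] * DIM_IN]
y = [1, 1, 0, 0]
w, b = train_head(X, y)
```

The point of the sketch is the division of labour: the backbone's weights never change, so the few target-domain samples only have to fit the small head.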
Closing the performance gap between Siamese networks for dissimilarity image classification and convolutional neural networks
In this paper, we examine two strategies for boosting the performance of ensembles of Siamese neural networks (SNNs) for image classification using two loss functions (Triplet and Binary Cross Entropy) and two methods for building the dissimilarity spaces (FULLY and DEEPER). With FULLY, the distance between a pattern and a prototype is calculated by comparing two images using the fully connected layer of the Siamese network. With DEEPER, each pattern is described using a deeper layer combined with dimensionality reduction. The basic design of the SNNs takes advantage of supervised k-means clustering for building the dissimilarity spaces that train a set of support vector machines, which are then combined by sum rule for a final decision. The robustness and versatility of this approach are demonstrated on several cross-domain image data sets, including a portrait data set, two bioimage data sets, and two animal vocalization data sets. Results show that the strategies employed in this work to increase the performance of dissimilarity image classification using SNNs are closing the gap with standalone CNNs. Moreover, when our best system is combined with an ensemble of CNNs, the resulting performance is superior to that of the CNN ensemble alone, demonstrating that our new strategy extracts additional information.
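The core "dissimilarity space" construction described above can be illustrated with a toy sketch: each pattern is re-described as its vector of distances to a set of prototypes, and classification then happens in that new space. Everything below is an illustrative simplification, prototypes are class means standing in for the paper's supervised k-means, and a nearest-mean rule stands in for the SVM ensemble.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def to_dissimilarity(x, prototypes):
    """Map a raw feature vector to its vector of distances to every prototype."""
    return [dist(x, p) for p in prototypes]

def class_mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

# Toy raw features for two classes.
class_a = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2]]
class_b = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9]]

# One prototype per class (class mean, a cheap stand-in for k-means centroids).
prototypes = [class_mean(class_a), class_mean(class_b)]

# "Train": class means computed *in the dissimilarity space*.
mean_a = class_mean([to_dissimilarity(x, prototypes) for x in class_a])
mean_b = class_mean([to_dissimilarity(x, prototypes) for x in class_b])

def classify(x):
    """Nearest class mean in dissimilarity space (stand-in for the SVMs)."""
    d = to_dissimilarity(x, prototypes)
    return "A" if dist(d, mean_a) <= dist(d, mean_b) else "B"
```

The design choice the abstract highlights, training classifiers on distance vectors rather than raw features, is what the `to_dissimilarity` step captures.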
Leveraging Expert Models for Training Deep Neural Networks in Scarce Data Domains: Application to Offline Handwritten Signature Verification
This paper introduces a novel approach to leverage the knowledge of existing
expert models for training new Convolutional Neural Networks, on domains where
task-specific data are limited or unavailable. The presented scheme is applied
in offline handwritten signature verification (OffSV) which, akin to other
biometric applications, suffers from inherent data limitations due to
regulatory restrictions. The proposed Student-Teacher (S-T) configuration
utilizes feature-based knowledge distillation (FKD), combining graph-based
similarity for local activations with global similarity measures to supervise
student's training, using only handwritten text data. Remarkably, the models
trained using this technique exhibit comparable, if not superior, performance
to the teacher model across three popular signature datasets. More importantly,
these results are attained without employing any signatures during the feature
extraction training process. This study demonstrates the efficacy of leveraging
existing expert models to overcome data scarcity challenges in OffSV and
potentially other related domains.
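The supervision signal described above, matching teacher features directly and matching the relational structure among activations, can be sketched as a loss function. This is an illustrative analogue, not the paper's exact formulation: an L2 feature-matching term plus a pairwise-similarity ("graph") term computed over a batch.

```python
import math

def l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine(a, b):
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def distillation_loss(teacher_feats, student_feats, alpha=1.0, beta=1.0):
    """Global feature-matching term + pairwise-similarity (graph-like) term."""
    n = len(teacher_feats)
    # (i) global term: each student feature should match the teacher's.
    global_term = sum(l2(t, s) for t, s in zip(teacher_feats, student_feats)) / n
    # (ii) graph term: the student's batch similarity structure should
    # match the teacher's pairwise cosine similarities.
    graph_term = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            graph_term += (cosine(teacher_feats[i], teacher_feats[j])
                           - cosine(student_feats[i], student_feats[j])) ** 2
    return alpha * global_term + beta * graph_term

# A student that reproduces the teacher's features exactly incurs zero loss.
t = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
```

Note that the loss needs only feature activations, never class labels or signatures, which is what allows the S-T scheme to train on handwritten text alone.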
GR-RNN: Global-Context Residual Recurrent Neural Networks for Writer Identification
This paper presents an end-to-end neural network system to identify writers
through handwritten word images, which jointly integrates global-context
information and a sequence of local fragment-based features. The global-context
information is extracted from the tail of the neural network by a global
average pooling step. The sequence of local fragment-based features is
extracted from a low-level deep feature map which contains subtle information
about the handwriting style. The spatial relationship between the sequence of
fragments is modeled by the recurrent neural network (RNN) to strengthen the
discriminative ability of the local fragment features. We leverage the
complementary information between the global-context and local fragments,
resulting in the proposed global-context residual recurrent neural network
(GR-RNN) method. The proposed method is evaluated on four public data sets and
experimental results demonstrate that it can provide state-of-the-art
performance. In addition, the neural networks trained on gray-scale images
provide better results than neural networks trained on binarized and contour
images, indicating that texture information plays an important role for writer
identification.
The source code will be available at
\url{https://github.com/shengfly/writer-identification}.
Comment: To appear in Pattern Recognition.
Re-ranking for Writer Identification and Writer Retrieval
Automatic writer identification is a common problem in document analysis.
State-of-the-art methods typically focus on the feature extraction step with
traditional or deep-learning-based techniques. In retrieval problems,
re-ranking is a commonly used technique to improve the results. Re-ranking
refines an initial ranking result by using the knowledge contained in the
ranked result, e.g., by exploiting nearest-neighbor relations. To the best of
our knowledge, re-ranking has not been used for writer
identification/retrieval. A possible reason might be that publicly available
benchmark datasets contain only a few samples per writer, which makes
re-ranking less promising. We show that a re-ranking step based on k-reciprocal nearest
neighbor relationships is advantageous for writer identification, even if only
a few samples per writer are available. We use these reciprocal relationships
in two ways: encode them into new vectors, as originally proposed, or integrate
them in terms of query-expansion. We show that both techniques outperform the
baseline results in terms of mAP on three writer identification datasets.
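The k-reciprocal idea at the heart of this abstract can be sketched compactly: two items are k-reciprocal neighbors when each appears in the other's k-nearest list, and the re-ranking score rewards overlap between such sets. The code below is a simplified toy version (Jaccard overlap of k-reciprocal sets over a small distance matrix), not the full encoding or query-expansion variants the paper compares.

```python
def knn(dist_row, k):
    """Indices of the k nearest items given one row of a distance matrix."""
    return set(sorted(range(len(dist_row)), key=lambda j: dist_row[j])[:k])

def k_reciprocal(dist, i, k):
    """j is k-reciprocal to i if each is in the other's k-nearest set."""
    return {j for j in knn(dist[i], k) if i in knn(dist[j], k)}

def rerank(dist, query, k):
    """Re-score gallery items by Jaccard overlap of k-reciprocal sets."""
    rq = k_reciprocal(dist, query, k)
    scores = []
    for g in range(len(dist)):
        if g == query:
            continue
        rg = k_reciprocal(dist, g, k)
        jaccard = len(rq & rg) / len(rq | rg) if rq | rg else 0.0
        scores.append((1.0 - jaccard, g))  # smaller = more similar
    return [g for _, g in sorted(scores)]

# Toy symmetric distances: items 0, 1, 2 cluster together; item 3 is far away.
D = [
    [0.0, 1.0, 1.2, 9.0],
    [1.0, 0.0, 1.1, 9.5],
    [1.2, 1.1, 0.0, 9.8],
    [9.0, 9.5, 9.8, 0.0],
]
order = rerank(D, query=0, k=3)
```

Because the score depends on shared reciprocal neighborhoods rather than raw distances alone, items that mutually confirm each other (1 and 2 here) rise above a merely close-but-unreciprocated outlier, which is exactly why the technique still helps with few samples per writer.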