Unsupervised Adaptation for Synthetic-to-Real Handwritten Word Recognition
Handwritten Text Recognition (HTR) is still a challenging problem because it
must deal with two important difficulties: the variability among writing
styles, and the scarcity of labelled data. To alleviate such problems,
synthetic data generation and data augmentation are typically used to train HTR
systems. However, training with such data produces encouraging but still
inaccurate transcriptions on real words. In this paper, we propose an
unsupervised writer adaptation approach that is able to automatically adjust a
generic handwritten word recognizer, fully trained with synthetic fonts,
towards a new incoming writer. We have experimentally validated our proposal
using five different datasets, covering several challenges: (i) the document
source: modern and historic samples, which may involve paper degradation
problems; (ii) different handwriting styles: single and multiple writer
collections; and (iii) language, which involves different character
combinations. Across these challenging collections, we show that our system is
able to maintain its performance; thus, it provides a practical and generic
approach to deal with new document collections without requiring any expensive
and tedious manual annotation step.
Comment: Accepted to WACV 202
Decoupled Attention Network for Text Recognition
Text recognition has attracted considerable research interest because of its
various applications. The cutting-edge text recognition methods are based on
attention mechanisms. However, most attention methods suffer from serious
alignment problems because of their recurrent alignment operation, in which the
alignment relies on historical decoding results. To remedy this issue, we
propose a decoupled attention network (DAN), which decouples the alignment
operation from using historical decoding results. DAN is an effective, flexible
and robust end-to-end text recognizer, which consists of three components: 1) a
feature encoder that extracts visual features from the input image; 2) a
convolutional alignment module that performs the alignment operation based on
visual features from the encoder; and 3) a decoupled text decoder that makes
the final prediction by jointly using the feature map and attention maps.
Experimental results show that DAN achieves state-of-the-art performance on
multiple text recognition tasks, including offline handwritten text recognition
and regular/irregular scene text recognition.
Comment: 9 pages, 8 figures, 6 tables, accepted by AAAI-202
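As a rough illustration of the third component, the sketch below shows one plausible way a decoder could jointly use a feature map and a set of attention maps: each attention map weights the feature map to give one context vector per decoding step. The weighted-sum form, the function name `context_vectors`, and the list-based tensor layout are assumptions made for illustration; the abstract does not specify them. Because the attention maps here depend only on visual features, every step's context vector can be computed without reference to previously decoded characters.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # [channels][height][width]
AttentionMap = List[List[float]]      # [height][width], one map per decoding step


def context_vectors(features: FeatureMap,
                    attention: List[AttentionMap]) -> List[List[float]]:
    """Return one context vector (length = channels) per attention map.

    Hypothetical helper: each context value is the attention-weighted sum
    of one feature channel over all spatial positions.
    """
    contexts = []
    for amap in attention:
        vec = []
        for channel in features:
            total = 0.0
            for feat_row, attn_row in zip(channel, amap):
                for f, a in zip(feat_row, attn_row):
                    total += f * a
            vec.append(total)
        contexts.append(vec)
    return contexts


# Toy input: 2 channels, a 1x2 spatial grid, 2 decoding steps.
features = [[[1.0, 2.0]], [[3.0, 4.0]]]
attention = [[[1.0, 0.0]], [[0.5, 0.5]]]
contexts = context_vectors(features, attention)
```

In a full recognizer, each context vector would then be passed to a classifier or recurrent decoder to predict the character at that step.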
Ensemble learning using multi-objective optimisation for Arabic handwritten words
Arabic handwriting recognition is a dynamic and stimulating field of study within
pattern recognition. Such recognition plays a significant part in today's global
environment. It is a widespread and computationally costly task because of cursive
writing, the massive number of words, and variation in writing style. According to the
literature, existing features lack data-supportive techniques and geometric feature
construction. Most ensemble learning approaches assume a linear
combination, which is not valid because of differences in data types. Moreover, existing
approaches to classifier generation do not support decision-making for selecting the
most suitable classifier, which requires multi-objective optimisation to handle
these differences in data types. In this thesis, a new type of handwriting feature is
proposed that uses Segments Interpolation (SI) to find the best-fitting line in each
window, together with a model for finding the best operating-point window size for SI
features. A Multi-Objective Ensemble Oriented (MOEO) approach is formulated to control
the classifier topology and to provide feedback support for changing the classifiers'
topology and weights, based on an extension of the Non-dominated Sorting Genetic
Algorithm (NSGA-II). The extension is designated Random Subset based Parents Selection
(RSPS-NSGA-II) and is used to handle neurons and accuracy. Evaluation metrics are
drawn from two perspectives: classification and multi-objective optimisation. The
experimental design is based on two subsets of the IFN/ENIT database: the first
consists of 10 classes (C10) and the second of 22 classes (C22).
The features were tested with Support Vector Machine (SVM) and Extreme Learning
Machine (ELM). Results improved owing to the SI feature. SI gives a significant
result with SVM, reaching 88.53% on C22. RSPS for C10 at k=2 achieved 91% accuracy
with fewer neurons than NSGA-II, and for C22 at k=10 accuracy increased to
81%, compared with 78% for NSGA-II. Future work may consider introducing more features
to the system, applying them to other languages, and integrating the system with
sequence learning for greater accuracy.
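The idea of finding a best-fitting line in each window can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's SI implementation: it assumes a 1-D signal (for example, a contour profile), non-overlapping windows of fixed size, and an ordinary least-squares fit. The names `fit_line` and `window_line_features` are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_line(ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line through the points (0, ys[0]), (1, ys[1]), ...

    Returns (slope, intercept). Needs at least two points.
    """
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def window_line_features(signal: Sequence[float],
                         window: int) -> List[Tuple[float, float]]:
    """Fit one line per non-overlapping window (assumed layout).

    A trailing partial window is dropped.
    """
    features = []
    for start in range(0, len(signal) - window + 1, window):
        features.append(fit_line(signal[start:start + window]))
    return features


# Toy signal: a rising segment followed by a flat segment.
features = window_line_features([0, 1, 2, 3, 5, 5, 5, 5], window=4)
```

Under these assumptions, the window size controls how finely the stroke shape is approximated, which is why the choice of an operating-point window size matters for such features.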