Unconstrained Scene Text and Video Text Recognition for Arabic Script
Building robust recognizers for Arabic has always been challenging. We
demonstrate the effectiveness of an end-to-end trainable CNN-RNN hybrid
architecture in recognizing Arabic text in videos and natural scenes. We
outperform the previous state of the art on two publicly available video text
datasets - ALIF and ACTIV. For the scene text recognition task, we introduce a
new Arabic scene text dataset and establish baseline results. For scripts like
Arabic, a major challenge in developing robust recognizers is the lack of large
quantities of annotated data. We overcome this by synthesising millions of Arabic
text images from a large vocabulary of Arabic words and phrases. Our
implementation is built on top of the model introduced in [37], which has
proven quite effective for English scene text recognition. The model follows a
segmentation-free, sequence to sequence transcription approach. The network
transcribes a sequence of convolutional features from the input image to a
sequence of target labels. This does away with the need for segmenting the input
image into constituent characters/glyphs, which is often difficult for Arabic
script. Further, the ability of RNNs to model contextual dependencies yields
superior recognition results.
Comment: 5 page
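The segmentation-free transcription described in this abstract can be illustrated with a small decoding sketch. The abstract does not say how per-frame outputs are turned into labels, so the CTC-style rule used here (pick the best label per frame, merge repeats, drop blanks) is an assumption, as are the names `greedy_ctc_decode` and `BLANK` and the toy two-letter alphabet.

```python
from typing import Sequence

BLANK = 0  # assumed index of the "no character" label


def greedy_ctc_decode(frame_scores: Sequence[Sequence[float]], alphabet: str) -> str:
    """Greedy CTC-style decoding of per-frame label scores.

    frame_scores[t][k] is the score of label k at frame t; label 0 is the
    blank and label k > 0 maps to alphabet[k - 1].
    """
    # Best-scoring label index for every frame (one frame per feature column).
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]

    decoded = []
    previous = BLANK
    for label in best:
        # Emit a character only when the label is not blank and differs from
        # the previous frame, so repeated frames collapse into one character.
        if label != BLANK and label != previous:
            decoded.append(alphabet[label - 1])
        previous = label
    return "".join(decoded)


# Five frames over the toy alphabet "ab": a, a, blank, a, b.
frames = [
    [0.1, 0.8, 0.1],
    [0.1, 0.8, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.7, 0.2],
    [0.1, 0.2, 0.7],
]
decoded_text = greedy_ctc_decode(frames, "ab")
```

No character boundaries are ever computed: the blank frame is what separates the two occurrences of `a`, which is why this style of decoding suits cursive scripts.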
Unsupervised Adaptation for Synthetic-to-Real Handwritten Word Recognition
Handwritten Text Recognition (HTR) is still a challenging problem because it
must deal with two important difficulties: the variability among writing
styles, and the scarcity of labelled data. To alleviate such problems,
synthetic data generation and data augmentation are typically used to train HTR
systems. However, training with such data produces encouraging but still
inaccurate transcriptions in real words. In this paper, we propose an
unsupervised writer adaptation approach that is able to automatically adjust a
generic handwritten word recognizer, fully trained with synthetic fonts,
towards a new incoming writer. We have experimentally validated our proposal
using five different datasets, covering several challenges: (i) the document
source: modern and historic samples, which may involve paper degradation
problems; (ii) different handwriting styles: single and multiple writer
collections; and (iii) language, which involves different character
combinations. Across these challenging collections, we show that our system is
able to maintain its performance; thus, it provides a practical and generic
approach to deal with new document collections without requiring any expensive
and tedious manual annotation step.
Comment: Accepted to WACV 202
Towards robust real-world historical handwriting recognition
In this thesis, we build a bridge from the past to the future by using artificial-intelligence methods for text recognition in a historical Dutch collection of the Natuurkundige Commissie that explored Indonesia (1820-1850). In spite of the successes of systems like 'ChatGPT', reading historical handwriting is still quite challenging for AI. Whereas GPT-like methods work on digital texts, historical manuscripts are only available as extremely diverse collections of (pixel) images. Despite their strong results, current deep learning (DL) methods are very data-greedy and time-consuming, depend heavily on human experts from the humanities for labeling, and require machine-learning experts to design the models. Ideally, the use of deep learning methods should require minimal human effort, have an algorithm observe the evolution of the training process, and avoid inefficient use of the already sparse amount of labeled data. We present several approaches towards dealing with these problems, aiming to improve the robustness of current methods and to improve the autonomy in training. We applied our novel word and line text recognition approaches to nine data sets differing in time period, language, and difficulty: three locally collected historical Latin-based data sets from Naturalis, Leiden; four public Latin-based benchmark data sets for comparability with other approaches; and two Arabic data sets. Using ensemble voting of just five neural networks, a level of accuracy was achieved which required hundreds of neural networks in earlier studies. Moreover, we increased the speed of evaluation of each training epoch without the need for labeled data.
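The ensemble voting mentioned in this abstract can be sketched as plain majority voting over word hypotheses. The abstract does not specify the voting scheme, so plurality voting with ties broken in favour of the earliest model is an assumption, and `majority_vote`, `ensemble_transcribe`, and the sample predictions are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[str]) -> str:
    """Return the most frequent hypothesis; ties go to the earliest model."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    top = max(counts.values())
    # Scan in model order so tie-breaking is deterministic.
    for hypothesis in predictions:
        if counts[hypothesis] == top:
            return hypothesis
    raise AssertionError("unreachable: some hypothesis always has the top count")


def ensemble_transcribe(per_model: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-model word predictions, one vote per model per word image."""
    return [majority_vote(votes) for votes in zip(*per_model)]


# Five hypothetical recognizers transcribing the same three word images.
model_outputs = [
    ["van", "Java", "1826"],
    ["van", "Jaua", "1826"],
    ["von", "Java", "1826"],
    ["van", "Java", "1820"],
    ["van", "Jave", "1826"],
]
voted = ensemble_transcribe(model_outputs)
```

Each recognizer makes a different mistake, so the vote recovers the reading that most of them agree on for every word.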
A sequential handwriting recognition model based on a dynamically configurable CRNN
Handwriting recognition refers to recognizing a handwritten input that includes character(s) or digit(s) based on an image. Because most applications of handwriting recognition in real life contain sequential text in various languages, there is a need to develop a dynamic handwriting recognition system. Inspired by the neuroevolutionary technique, this paper proposes a Dynamically Configurable Convolutional Recurrent Neural Network (DC-CRNN) for the handwriting recognition sequence modeling task. The proposed DC-CRNN is based on the Salp Swarm Optimization Algorithm (SSA), which generates the optimal structure and hyperparameters for Convolutional Recurrent Neural Networks (CRNNs). In addition, we investigate two types of encoding techniques used to translate the output of optimization to a CRNN recognizer. Finally, we propose a novel hybridized SSA with Late Acceptance Hill-Climbing (LAHC) to improve the exploitation process. We conducted our experiments on two well-known datasets, IAM and IFN/ENIT, which include both the Arabic and English languages. The experimental results show that LAHC significantly improves the SSA search process. As a result, the proposed DC-CRNN outperforms the handcrafted CRNN methods.
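As a rough illustration of the Late Acceptance Hill-Climbing component named in this abstract, the sketch below implements the generic LAHC acceptance rule on a toy one-dimensional problem. It is not the paper's SSA hybrid: the function `lahc`, the quadratic cost standing in for a CRNN validation error, and all parameter values are assumptions made for the example.

```python
import random
from typing import Callable, Tuple


def lahc(
    initial: int,
    cost: Callable[[int], float],
    neighbour: Callable[[int, random.Random], int],
    history_len: int = 20,
    iterations: int = 2000,
    seed: int = 0,
) -> Tuple[int, float]:
    """Generic Late Acceptance Hill-Climbing (minimisation).

    A candidate is accepted if it is no worse than the current solution OR
    no worse than the cost recorded `history_len` iterations earlier.
    """
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    history = [current_cost] * history_len  # circular list of past costs

    for i in range(iterations):
        candidate = neighbour(current, rng)
        candidate_cost = cost(candidate)
        slot = i % history_len
        if candidate_cost <= current_cost or candidate_cost <= history[slot]:
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        # Record the (possibly updated) current cost for later comparisons.
        history[slot] = current_cost
    return best, best_cost


def toy_cost(x: int) -> float:
    # Hypothetical stand-in for a validation error with its minimum at x = 3.
    return float((x - 3) ** 2)


def step(x: int, rng: random.Random) -> int:
    # Move one unit left or right.
    return x + rng.choice((-1, 1))


best_x, best_cost = lahc(initial=50, cost=toy_cost, neighbour=step)
```

The delayed comparison lets the search accept some worsening moves early on while still tightening over time, which is the property that makes LAHC a candidate for refining solutions produced by a population-based optimizer.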
Novel Deep Convolutional Neural Network-Based Contextual Recognition of Arabic Handwritten Scripts
Offline Arabic Handwriting Recognition (OAHR) has recently become instrumental in the areas of pattern recognition and image processing due to its application in several fields, such as office automation and document processing. However, OAHR continues to face several challenges, including the high variability of the Arabic script and its intrinsic characteristics such as cursiveness, ligatures, and diacritics, the unlimited variation in human handwriting, and the lack of large public databases. In this paper, we introduce a novel context-aware model based on deep neural networks to address the challenges of recognizing offline handwritten Arabic text, including isolated digits, characters, and words. Specifically, we propose a supervised Convolutional Neural Network (CNN) model that contextually extracts optimal features and employs batch normalization and dropout regularization to prevent overfitting and further enhance its generalization performance when compared to conventional deep learning models. We employed numerous deep stacked-convolutional layers to design the proposed Deep CNN (DCNN) architecture. The proposed model was extensively evaluated, and it was observed to achieve excellent classification accuracy when compared to the existing state-of-the-art OAHR approaches on a diverse set of six benchmark databases, including MADBase (Digits), CMATERDB (Digits), HACDB (Characters), SUST-ALT (Digits), SUST-ALT (Characters), and SUST-ALT (Names). Further comparative experiments were conducted on the respective databases using the pre-trained VGGNet-19 and Mobile-Net models; additionally, generalization experiments on a database in another language (i.e., MNIST English Digits) were conducted, which showed the superiority of the proposed DCNN model.
Embedding and learning with signatures
Sequential and temporal data arise in many fields of research, such as quantitative finance, medicine, or computer vision. The present article is concerned with a novel approach for sequential learning, called the signature method and rooted in rough path theory. Its basic principle is to represent multidimensional paths by a graded feature set of their iterated integrals, called the signature. This approach relies critically on an embedding principle, which consists in representing discretely sampled data as paths, i.e., functions from [0,1] to R^d. After a survey of machine learning methodologies for signatures, we investigate the influence of embeddings on prediction accuracy with an in-depth study of three recent and challenging datasets. We show that a specific embedding, called lead-lag, is systematically better, whatever the dataset or algorithm used. Moreover, we emphasize through an empirical study that computing signatures over the whole path domain does not lead to a loss of local information. We conclude that, with a good embedding, the signature combined with a simple algorithm achieves results competitive with state-of-the-art, domain-specific approaches.
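To make the embedding and signature ideas in this abstract concrete, here is a minimal pure-Python sketch that builds a lead-lag path from a scalar series and computes its signature truncated at depth 2, treating the path as piecewise linear. Lead-lag conventions differ between papers; the one used here (the lead coordinate moves first, then the lag catches up) is an assumption, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def lead_lag(xs: Sequence[float]) -> List[Point]:
    """Embed a scalar series as a 2-D path of (lead, lag) coordinates."""
    path: List[Point] = [(xs[0], xs[0])]
    for prev, nxt in zip(xs, xs[1:]):
        path.append((nxt, prev))  # lead jumps to the new value
        path.append((nxt, nxt))   # lag catches up
    return path


def signature_depth2(path: Sequence[Point]) -> Tuple[List[float], List[List[float]]]:
    """Depth-2 signature of a piecewise-linear path.

    level1[i] is the total increment of coordinate i.
    level2[i][j] is the iterated integral of coordinate i against coordinate j.
    """
    d = len(path[0])
    level1 = [0.0] * d
    level2 = [[0.0] * d for _ in range(d)]
    for a, b in zip(path, path[1:]):
        dx = [b[k] - a[k] for k in range(d)]
        # Chen's identity for appending one linear segment:
        # S2 += S1 (x) dx + (dx (x) dx) / 2, using S1 before this segment.
        for i in range(d):
            for j in range(d):
                level2[i][j] += level1[i] * dx[j] + 0.5 * dx[i] * dx[j]
        for k in range(d):
            level1[k] += dx[k]
    return level1, level2


series = [1.0, 3.0, 2.0]
embedded = lead_lag(series)
s1, s2 = signature_depth2(embedded)
```

The first level only records the net change of each coordinate, while the off-diagonal second-level terms depend on the order in which lead and lag move, which is the extra information the embedding exposes to a downstream learner.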