Applying Data Augmentation to Handwritten Arabic Numeral Recognition Using Deep Learning Neural Networks
Handwritten character recognition has long been a benchmark problem in
pattern recognition and artificial intelligence, and it remains a challenging
research topic. Owing to its enormous range of applications, much work has
been done in this field, focusing on different languages. Arabic, being a
diversified language, offers a broad scope for research with potential
challenges. This paper proposes a convolutional neural network model for
recognizing handwritten Arabic numerals, where the dataset is subjected to
various augmentations in order to add the robustness needed for a deep
learning approach. The proposed method employs dropout regularization to
counter overfitting, and a suitable change is introduced in the activation
function to overcome the vanishing gradient problem. With these modifications,
the proposed system achieves an accuracy of 99.4%, outperforming all previous
work on the dataset.
Comment: 5 pages, 6 figures, 3 tables
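The three ingredients the abstract names (augmentation, dropout, and a modified activation) can each be sketched in isolation. The functions below are illustrative assumptions, not the paper's code: a pixel-shift augmentation, the leaky-ReLU variant often used against vanishing gradients, and inverted dropout are common choices for each role.

```python
import numpy as np

rng = np.random.default_rng(0)

def shift_image(img, dx, dy):
    """Shift a 2-D image by (dx, dy) pixels, padding with zeros
    (one common form of data augmentation for digit images)."""
    out = np.zeros_like(img)
    h, w = img.shape
    out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        img[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return out

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU keeps a small gradient for negative inputs,
    easing the vanishing-gradient problem plain sigmoids suffer from."""
    return np.where(x > 0, x, alpha * x)

def dropout(x, rate, training=True):
    """Inverted dropout: zero a random fraction of activations during
    training and rescale the survivors so the expected sum is unchanged."""
    if not training:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)
```

At inference time `dropout` is an identity, so the same forward pass serves both phases.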
Does color modalities affect handwriting recognition? An empirical study on Persian handwritings using convolutional neural networks
Most handwriting recognition methods in the literature are developed and
evaluated on black-and-white (BW) image databases. In this paper we address a
fundamental question in document recognition: using Convolutional Neural
Networks (CNNs) as an eye simulator, we investigate whether the color
modalities of handwritten digits and words affect their recognition accuracy
or speed. To the best of our knowledge, this question has not been answered so
far due to the lack of handwritten databases that contain all three color
modalities of handwriting. To answer it, we selected 13,330 isolated digits
and 62,500 words from a novel Persian handwritten database, which has three
different color modalities and is unique in terms of size and variety. Our
selected datasets are divided into training, validation, and testing sets.
Similar conventional CNN models are then trained on the training samples.
While the experimental results on the testing set show that a CNN trained on
BW digit and word images performs better than on the other two color
modalities, in general there are no significant differences in network
accuracy across the color modalities. Comparisons of training times in the
three color modalities also show that recognizing handwritten digits and words
in BW images with a CNN is much more efficient.
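The three color modalities can be related by standard conversions. The sketch below is an illustration, not the paper's preprocessing: it derives grayscale and BW versions from an RGB image, one plausible way a tri-modal database could be normalized for comparison.

```python
import numpy as np

def to_grayscale(rgb):
    """RGB -> grayscale using ITU-R BT.601 luminance weights."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def to_bw(gray, threshold=0.5):
    """Grayscale -> black-and-white via a global threshold."""
    return (gray >= threshold).astype(np.float64)

# A BW input carries one channel instead of three, so the first
# convolutional layer has a third of the input weights -- one plausible
# source of the training-time advantage the study reports for BW images.
```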
Homogeneous vector capsules and their application to sufficient and complete data
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University London.
Capsules (vector-valued neurons) have recently become a more active area of research
in neural networks. However, existing formulations have several drawbacks,
including the large number of trainable parameters they require and their
reliance on routing mechanisms between layers of capsules.
The primary aim of this project is to demonstrate the benefits of a new formulation
of capsules called Homogeneous Vector Capsules (HVCs) that overcome these
drawbacks.
Using HVCs, new state-of-the-art accuracies for the MNIST dataset are established
for multiple individual models as well as multiple ensembles.
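As a rough illustration of why a capsule formulation without routing can need far fewer parameters, the sketch below connects capsule layers with element-wise (Hadamard) products against learned weight vectors instead of full weight matrices. This is a simplified reading with hypothetical shapes and names, not the thesis's implementation of HVCs.

```python
import numpy as np

rng = np.random.default_rng(2)

def hvc_layer(caps_in, W):
    """One capsule layer in the no-routing, vector-weight spirit: each
    output capsule j is the sum over input capsules i of an element-wise
    product with a learned weight vector W[i, j] -- d weights per capsule
    pair instead of the d*d of a full transformation matrix, and no
    routing iterations between layers.
    caps_in: (n_in, d) capsule vectors; W: (n_in, n_out, d)."""
    # out[j, k] = sum_i caps_in[i, k] * W[i, j, k]
    return np.einsum('ik,ijk->jk', caps_in, W)

n_in, n_out, d = 8, 4, 16
caps = rng.standard_normal((n_in, d))
W = rng.standard_normal((n_in, n_out, d))
out = hvc_layer(caps, W)  # (4, 16): one vector per output capsule
```

Under these assumed shapes the layer holds `n_in * n_out * d` weights, versus `n_in * n_out * d * d` for matrix-valued capsule connections.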
This work additionally presents a dataset consisting of high-resolution images of
13 micro-PCBs captured in various rotations and perspectives relative to the camera,
with each sample labeled for PCB type, rotation category, and perspective categories.
Experiments performed and elucidated in this work examine classification
accuracy on rotations and perspectives that were not trained on, as well as
the ability to artificially generate missing rotations and perspectives during
training. These experiments show that using HVCs is superior to using fully
connected layers.
This work also shows that certain training samples are more informative of class
membership than others. These samples can be identified prior to training by analyzing
their positions in a reduced dimensional space relative to the class centroids in that
space. A definition and calculation for both class density and dataset completeness,
based on the distribution of data in the reduced dimensional space, are also put forth.
Experimentation using the dataset completeness calculation shows that datasets
that meet a certain completeness threshold can be trained on a subset of the total
dataset, based on each class's density, while maintaining or improving validation
accuracy.
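One way to make the centroid-based idea concrete is to score each sample by how close it lies to its own class centroid relative to the nearest other centroid in the reduced space. The scoring below is a hypothetical sketch of that idea; the names and the exact margin formula are assumptions, not the thesis's definitions.

```python
import numpy as np

def class_centroids(X, y):
    """Centroid of each class in the (reduced) feature space."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def margin_scores(X, y, centroids):
    """Distance to own centroid minus distance to the nearest other
    centroid. Strongly negative scores mark 'easy' samples deep inside
    their class; scores near zero mark boundary samples, which under the
    informativeness idea above carry more information about class
    membership."""
    scores = np.empty(len(X))
    for i, (x, c) in enumerate(zip(X, y)):
        d_own = np.linalg.norm(x - centroids[c])
        d_other = min(np.linalg.norm(x - centroids[k])
                      for k in centroids if k != c)
        scores[i] = d_own - d_other
    return scores
```

Ranking a class by such a score would give one way to pick the per-class training subset the completeness experiments describe.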
Semantic radical consistency and character transparency effects in Chinese: an ERP study
BACKGROUND: This event-related potential (ERP) study aims to investigate the representation and temporal dynamics of Chinese orthography-to-semantics mappings by simultaneously manipulating character transparency and semantic radical consistency. Character components, referred to as radicals, make up the building blocks used dur...
Learning Identifiable Representations: Independent Influences and Multiple Views
Intelligent systems, whether biological or artificial, perceive unstructured information from the world around them: deep neural networks designed for object recognition receive collections of pixels as inputs; living beings capture visual stimuli through photoreceptors that convert incoming light into electrical signals. Sophisticated signal processing is required to extract meaningful features (e.g., the position, dimension, and colour of objects in an image) from these inputs: this motivates the field of representation learning. But what features should be deemed meaningful, and how to learn them?
We will approach these questions based on two metaphors. The first one is the cocktail-party problem, where a number of conversations happen in parallel in a room, and the task is to recover (or separate) the voices of the individual speakers from recorded mixtures—also termed blind source separation. The second one is what we call the independent-listeners problem: given two listeners in front of some loudspeakers, the question is whether, when processing what they hear, they will make the same information explicit, identifying similar constitutive elements. The notion of identifiability is crucial when studying these problems, as it specifies suitable technical assumptions under which representations are uniquely determined, up to tolerable ambiguities like latent source reordering. A key result of this theory is that, when the mixing is nonlinear, the model is provably non-identifiable. A first question is, therefore, under what additional assumptions (ideally as mild as possible) the problem becomes identifiable; a second one is, what algorithms can be used to estimate the model.
The contributions presented in this thesis address these questions and revolve around two main principles. The first principle is to learn representations in which the latent components influence the observations independently. Here the term “independently” is used in a non-statistical sense, which can be loosely thought of as the absence of fine-tuning between distinct elements of a generative process. The second principle is that representations can be learned from paired observations or views, where mixtures of the same latent variables are observed and they (or a subset thereof) are perturbed in one of the views, also termed the multi-view setting. I will present work characterizing these two problem settings, studying their identifiability, and proposing suitable estimation algorithms. Moreover, I will discuss how the success of popular representation learning methods may be explained in terms of the principles above and describe an application of the second principle to the statistical analysis of group studies in neuroimaging.
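The cocktail-party metaphor can be made concrete with a tiny linear blind-source-separation demo. Everything below is illustrative only (the uniform sources, the mixing matrix, and the brute-force rotation search over a kurtosis-based non-Gaussianity proxy are assumptions for the demo, not methods from the thesis): after whitening, a linear mixture of independent sources is determined up to an orthogonal transform, which we resolve by maximizing non-Gaussianity.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two independent non-Gaussian sources (the "voices").
n = 5000
S = rng.uniform(-1, 1, size=(2, n))

# Unknown linear mixing (the "room"): X = A @ S.
A = np.array([[1.0, 0.6], [0.4, 1.0]])
X = A @ S

# Whiten the mixtures (zero mean, identity covariance).
Xc = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(Xc @ Xc.T / n)
Z = E @ np.diag(d ** -0.5) @ E.T @ Xc

# After whitening, the sources differ only by a rotation. Uniform sources
# have strongly negative excess kurtosis, so we pick the rotation whose
# outputs are jointly most non-Gaussian (sum of |excess kurtosis| maximal).
def non_gaussianity(Y):
    k = (Y ** 4).mean(axis=1) - 3 * (Y ** 2).mean(axis=1) ** 2
    return np.abs(k).sum()

def rot(t):
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

best = max(np.linspace(0, np.pi / 2, 180),
           key=lambda t: non_gaussianity(rot(t) @ Z))
S_hat = rot(best) @ Z  # recovered sources, up to order, sign, and scale
```

The leftover order/sign/scale ambiguity is exactly the kind of tolerable indeterminacy the identifiability theory in the thesis allows; with a nonlinear mixing in place of `A`, no such finite search could succeed without further assumptions.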