Applying Data Augmentation to Handwritten Arabic Numeral Recognition Using Deep Learning Neural Networks
Handwritten character recognition has long been a benchmark problem in
pattern recognition and artificial intelligence, and it remains a challenging
research topic. Owing to its enormous range of applications, much work has
been done in this field, focusing on different languages. Arabic, being a
diversified language, offers a broad scope for research with potential
challenges. This paper proposes a convolutional neural network model for
recognizing handwritten Arabic numerals, where the dataset is subjected to
various augmentations in order to add the robustness needed for a deep
learning approach. The proposed method employs dropout regularization to
counter overfitting, and a suitable change is introduced in the activation
function to overcome the vanishing gradient problem. With these modifications,
the proposed system achieves an accuracy of 99.4%, outperforming all previous
work on the dataset.
Comment: 5 pages, 6 figures, 3 tables
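The three ingredients the abstract names (augmentation, dropout, and a modified activation) can each be sketched in isolation. The functions below are illustrative assumptions, not the paper's code: a pixel-shift augmentation, the leaky-ReLU variant often used against vanishing gradients, and inverted dropout are common choices for each role.

```python
import numpy as np

rng = np.random.default_rng(0)

def shift_image(img, dx, dy):
    """Shift a 2-D image by (dx, dy) pixels, padding with zeros
    (one common form of data augmentation for digit images)."""
    out = np.zeros_like(img)
    h, w = img.shape
    out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        img[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return out

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU keeps a small gradient for negative inputs,
    easing the vanishing-gradient problem plain sigmoids suffer from."""
    return np.where(x > 0, x, alpha * x)

def dropout(x, rate, training=True):
    """Inverted dropout: zero a random fraction of activations during
    training and rescale the survivors so the expected sum is unchanged."""
    if not training:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)
```

At inference time `dropout` is an identity, so the same forward pass serves both phases.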
Does color modalities affect handwriting recognition? An empirical study on Persian handwritings using convolutional neural networks
Most handwriting recognition methods in the literature are developed and
evaluated on black-and-white (BW) image databases. In this paper we address a
fundamental question in document recognition: using Convolutional Neural
Networks (CNNs) as an eye simulator, we investigate whether the color
modalities of handwritten digits and words affect their recognition accuracy
or speed. To the best of our knowledge, this question has not been answered so
far due to the lack of handwritten databases that contain all three color
modalities of handwriting. To answer it, we selected 13,330 isolated digits
and 62,500 words from a novel Persian handwritten database, which has three
different color modalities and is unique in terms of size and variety. Our
selected datasets are divided into training, validation, and testing sets.
Similar conventional CNN models are then trained on the training samples.
While the experimental results on the testing set show that a CNN trained on
BW digit and word images performs better than on the other two color
modalities, in general there are no significant differences in network
accuracy across the color modalities. Comparisons of training times in the
three color modalities also show that recognizing handwritten digits and words
in BW images with a CNN is much more efficient.
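The three color modalities can be related by standard conversions. The sketch below is an illustration, not the paper's preprocessing: it derives grayscale and BW versions from an RGB image, one plausible way a tri-modal database could be normalized for comparison.

```python
import numpy as np

def to_grayscale(rgb):
    """RGB -> grayscale using ITU-R BT.601 luminance weights."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def to_bw(gray, threshold=0.5):
    """Grayscale -> black-and-white via a global threshold."""
    return (gray >= threshold).astype(np.float64)

# A BW input carries one channel instead of three, so the first
# convolutional layer has a third of the input weights -- one plausible
# source of the training-time advantage the study reports for BW images.
```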
Homogeneous vector capsules and their application to sufficient and complete data
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University London.
Capsules (vector-valued neurons) have recently become a more active area of research
in neural networks. However, existing formulations have several drawbacks,
including the large number of trainable parameters they require and their
reliance on routing mechanisms between layers of capsules.
The primary aim of this project is to demonstrate the benefits of a new formulation
of capsules called Homogeneous Vector Capsules (HVCs) that overcome these
drawbacks.
Using HVCs, new state-of-the-art accuracies for the MNIST dataset are established
for multiple individual models as well as multiple ensembles.
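As a rough illustration of why a capsule formulation without routing can need far fewer parameters, the sketch below connects capsule layers with element-wise (Hadamard) products against learned weight vectors instead of full weight matrices. This is a simplified reading with hypothetical shapes and names, not the thesis's implementation of HVCs.

```python
import numpy as np

rng = np.random.default_rng(2)

def hvc_layer(caps_in, W):
    """One capsule layer in the no-routing, vector-weight spirit: each
    output capsule j is the sum over input capsules i of an element-wise
    product with a learned weight vector W[i, j] -- d weights per capsule
    pair instead of the d*d of a full transformation matrix, and no
    routing iterations between layers.
    caps_in: (n_in, d) capsule vectors; W: (n_in, n_out, d)."""
    # out[j, k] = sum_i caps_in[i, k] * W[i, j, k]
    return np.einsum('ik,ijk->jk', caps_in, W)

n_in, n_out, d = 8, 4, 16
caps = rng.standard_normal((n_in, d))
W = rng.standard_normal((n_in, n_out, d))
out = hvc_layer(caps, W)  # (4, 16): one vector per output capsule
```

Under these assumed shapes the layer holds `n_in * n_out * d` weights, versus `n_in * n_out * d * d` for matrix-valued capsule connections.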
This work additionally presents a dataset consisting of high-resolution images of
13 micro-PCBs captured in various rotations and perspectives relative to the camera,
with each sample labeled for PCB type, rotation category, and perspective categories.
Experiments performed and elucidated in this work examine classification
accuracy on rotations and perspectives that were not trained on, as well as
the ability to artificially generate missing rotations and perspectives during
training. These experiments show that using HVCs is superior to using fully
connected layers.
This work also shows that certain training samples are more informative of class
membership than others. These samples can be identified prior to training by analyzing
their positions in a reduced dimensional space relative to the class centroids in that
space. A definition and calculation for both class density and dataset completeness,
based on the distribution of data in the reduced dimensional space, are also put forth.
Experimentation using the dataset completeness calculation shows that datasets
that meet a certain completeness threshold can be trained on a subset of the total
dataset, based on each class's density, while maintaining or improving validation
accuracy.
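One way to make the centroid-based idea concrete is to score each sample by how close it lies to its own class centroid relative to the nearest other centroid in the reduced space. The scoring below is a hypothetical sketch of that idea; the names and the exact margin formula are assumptions, not the thesis's definitions.

```python
import numpy as np

def class_centroids(X, y):
    """Centroid of each class in the (reduced) feature space."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def margin_scores(X, y, centroids):
    """Distance to own centroid minus distance to the nearest other
    centroid. Strongly negative scores mark 'easy' samples deep inside
    their class; scores near zero mark boundary samples, which under the
    informativeness idea above carry more information about class
    membership."""
    scores = np.empty(len(X))
    for i, (x, c) in enumerate(zip(X, y)):
        d_own = np.linalg.norm(x - centroids[c])
        d_other = min(np.linalg.norm(x - centroids[k])
                      for k in centroids if k != c)
        scores[i] = d_own - d_other
    return scores
```

Ranking a class by such a score would give one way to pick the per-class training subset the completeness experiments describe.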
Semantic radical consistency and character transparency effects in Chinese: an ERP study
BACKGROUND: This event-related potential (ERP) study aims to investigate the representation and temporal dynamics of Chinese orthography-to-semantics mappings by simultaneously manipulating character transparency and semantic radical consistency. Character components, referred to as radicals, make up the building blocks used dur...
Learning Identifiable Representations: Independent Influences and Multiple Views
Intelligent systems, whether biological or artificial, perceive unstructured information from the world around them: deep neural networks designed for object recognition receive collections of pixels as inputs; living beings capture visual stimuli through photoreceptors that convert incoming light into electrical signals. Sophisticated signal processing is required to extract meaningful features (e.g., the position, dimension, and colour of objects in an image) from these inputs: this motivates the field of representation learning. But what features should be deemed meaningful, and how to learn them?
We will approach these questions based on two metaphors. The first one is the cocktail-party problem, where a number of conversations happen in parallel in a room, and the task is to recover (or separate) the voices of the individual speakers from recorded mixtures—also termed blind source separation. The second one is what we call the independent-listeners problem: given two listeners in front of some loudspeakers, the question is whether, when processing what they hear, they will make the same information explicit, identifying similar constitutive elements. The notion of identifiability is crucial when studying these problems, as it specifies suitable technical assumptions under which representations are uniquely determined, up to tolerable ambiguities like latent source reordering. A key result of this theory is that, when the mixing is nonlinear, the model is provably non-identifiable. A first question is, therefore, under what additional assumptions (ideally as mild as possible) the problem becomes identifiable; a second one is, what algorithms can be used to estimate the model.
The contributions presented in this thesis address these questions and revolve around two main principles. The first principle is to learn representations in which the latent components influence the observations independently. Here the term “independently” is used in a non-statistical sense, which can be loosely thought of as the absence of fine-tuning between distinct elements of a generative process. The second principle is that representations can be learned from paired observations or views, where mixtures of the same latent variables are observed and they (or a subset thereof) are perturbed in one of the views, also termed the multi-view setting. I will present work characterizing these two problem settings, studying their identifiability, and proposing suitable estimation algorithms. Moreover, I will discuss how the success of popular representation learning methods may be explained in terms of the principles above and describe an application of the second principle to the statistical analysis of group studies in neuroimaging.
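The cocktail-party metaphor can be made concrete with a tiny linear blind-source-separation demo. Everything below is illustrative only (the uniform sources, the mixing matrix, and the brute-force rotation search over a kurtosis-based non-Gaussianity proxy are assumptions for the demo, not methods from the thesis): after whitening, a linear mixture of independent sources is determined up to an orthogonal transform, which we resolve by maximizing non-Gaussianity.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two independent non-Gaussian sources (the "voices").
n = 5000
S = rng.uniform(-1, 1, size=(2, n))

# Unknown linear mixing (the "room"): X = A @ S.
A = np.array([[1.0, 0.6], [0.4, 1.0]])
X = A @ S

# Whiten the mixtures (zero mean, identity covariance).
Xc = X - X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(Xc @ Xc.T / n)
Z = E @ np.diag(d ** -0.5) @ E.T @ Xc

# After whitening, the sources differ only by a rotation. Uniform sources
# have strongly negative excess kurtosis, so we pick the rotation whose
# outputs are jointly most non-Gaussian (sum of |excess kurtosis| maximal).
def non_gaussianity(Y):
    k = (Y ** 4).mean(axis=1) - 3 * (Y ** 2).mean(axis=1) ** 2
    return np.abs(k).sum()

def rot(t):
    return np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])

best = max(np.linspace(0, np.pi / 2, 180),
           key=lambda t: non_gaussianity(rot(t) @ Z))
S_hat = rot(best) @ Z  # recovered sources, up to order, sign, and scale
```

The leftover order/sign/scale ambiguity is exactly the kind of tolerable indeterminacy the identifiability theory in the thesis allows; with a nonlinear mixing in place of `A`, no such finite search could succeed without further assumptions.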