Use of Neural Networks to Predict OCR Accuracy
Use of Neural Networks to Predict OCR Accuracy investigates issues in developing an artificial neural network (ANN)-based system for predicting OCR accuracy from the image of a page. This work extends the work of Blando and Gonzalez by enlarging the training data, proposing new features, comparing different ANN architectures, and introducing a cross-validation learning algorithm. The following experiments were performed: a comparison of 14-dimensional and 7-dimensional feature metrics; a comparison of an ANN trained with and without cross-validation; a comparison of different neural network architectures; a comparison of the prediction capability of neural networks and linear regression; and a comparison of the prediction capability of a neural network using 14-dimensional feature metrics against linear regression using reject markers. The results show that a neural network can outperform linear regression if properly trained, and that the new feature metrics provide improved predictive ability.
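The core comparison in this abstract, a cross-validated predictor of OCR accuracy from page-image features, can be sketched with the linear-regression baseline alone. Everything below is illustrative: the 14 feature metrics are stand-in random values (the paper's actual quality metrics are not specified here), and the target is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for 14 page-quality feature metrics; the real
# metrics (speckle, broken characters, etc.) are placeholders here.
n, d = 200, 14
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = X @ true_w + 0.1 * rng.normal(size=n)  # synthetic "OCR accuracy" target

def kfold_mse(X, y, k=5):
    """Mean squared error of a linear-regression predictor under
    k-fold cross-validation, the style of protocol used to compare
    predictors of OCR accuracy."""
    idx = np.arange(len(y))
    folds = np.array_split(idx, k)
    errs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        w, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        errs.append(np.mean((X[test] @ w - y[test]) ** 2))
    return float(np.mean(errs))

mse = kfold_mse(X, y)
```

The ANN predictor compared in the paper would replace the `lstsq` fit inside the loop; the cross-validation harness stays the same.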
Sequence Classification Restricted Boltzmann Machines With Gated Units
For the classification of sequential data, dynamic Bayesian networks and recurrent neural networks (RNNs) are the preferred models. The former can explicitly model the temporal dependences between variables, while the latter can learn representations. The recurrent temporal restricted Boltzmann machine (RTRBM) is a model that combines these two features. However, learning and inference in RTRBMs can be difficult because of the exponential nature of its gradient computations when maximizing log-likelihoods. In this article, we first address this intractability by optimizing a conditional rather than a joint probability distribution when performing sequence classification. This results in the "sequence classification restricted Boltzmann machine" (SCRBM). Second, we introduce gated SCRBMs (gSCRBMs), which use an information-processing gate, as an integration of SCRBMs with long short-term memory (LSTM) models. In the experiments reported in this article, we evaluate the proposed models on optical character recognition, chunking, and multiresident activity recognition in smart homes. The experimental results show that gSCRBMs achieve performance comparable to the state of the art in all three tasks, while requiring far fewer parameters than other recurrent networks with memory gates, in particular LSTMs and gated recurrent units (GRUs).
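The tractability gained by optimizing a conditional rather than a joint distribution can be illustrated with a (non-recurrent) discriminative RBM: p(y|x) has a closed form in terms of per-class free energies, so no intractable partition function over inputs is needed. The sizes and weights below are toy values, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny discriminative RBM with 6 visible units, 4 hidden units,
# and 3 classes; all parameters are illustrative.
n_vis, n_hid, n_cls = 6, 4, 3
W = rng.normal(scale=0.1, size=(n_hid, n_vis))   # visible -> hidden
U = rng.normal(scale=0.1, size=(n_hid, n_cls))   # class -> hidden
b = np.zeros(n_cls)                              # class biases
c = np.zeros(n_hid)                              # hidden biases

def softplus(z):
    return np.logaddexp(0.0, z)

def class_log_probs(x):
    """log p(y|x) via per-class negative free energies:
    -F(x, y) = b_y + sum_j softplus(c_j + W_j . x + U_jy)."""
    act = c[:, None] + (W @ x)[:, None] + U      # shape (n_hid, n_cls)
    neg_free = b + softplus(act).sum(axis=0)     # shape (n_cls,)
    return neg_free - np.logaddexp.reduce(neg_free)  # normalize in log space

x = rng.normal(size=n_vis)
logp = class_log_probs(x)
```

The SCRBM extends this idea to sequences (and gSCRBMs add an LSTM-style gate); the key point shown here is that the conditional normalizer runs over classes only, so it is cheap to compute exactly.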
Learning to Read by Spelling: Towards Unsupervised Text Recognition
This work presents a method for visual text recognition that uses no paired supervisory data. We formulate the text recognition task as one of aligning the conditional distribution of strings predicted from given text images with lexically valid strings sampled from target corpora. This enables fully automated, unsupervised learning from line-level text images and unpaired text-string samples alone, obviating the need for large aligned datasets. We present a detailed analysis of several aspects of the proposed method, namely: (1) the impact of the length of training sequences on convergence, (2) the relation between character frequencies and the order in which they are learnt, (3) the generalisation ability of our recognition network to inputs of arbitrary length, and (4) the impact of varying the text corpus on recognition accuracy. Finally, we demonstrate excellent text recognition accuracy on both synthetically generated text images and scanned images of real printed books, using no labelled training examples.
Applying Data Augmentation to Handwritten Arabic Numeral Recognition Using Deep Learning Neural Networks
Handwritten character recognition has long been a central research topic and benchmark problem in pattern recognition and artificial intelligence, and it remains a challenging one. Owing to its wide range of applications, much work in this field has focused on different languages. Arabic, being a diverse language, offers a huge scope for research with distinctive challenges. This paper proposes a convolutional neural network model for recognizing handwritten Arabic numerals, where the dataset is subjected to various augmentations in order to add the robustness needed for a deep learning approach. The proposed method uses dropout regularization to counter the problem of overfitting; moreover, a suitable change is introduced in the activation function to overcome the problem of vanishing gradients. With these modifications, the proposed system achieves an accuracy of 99.4%, outperforming all previous work on the dataset.
Comment: 5 pages, 6 figures, 3 tables
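The augmentation step described above can be sketched with one simple transform: random pixel translations of a digit image. This is a minimal sketch under assumed parameters (28x28 images, shifts of up to 2 pixels); the paper's other augmentations (e.g. rotation or zoom) are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img, max_shift=2):
    """Randomly translate a digit image by up to max_shift pixels,
    padding the exposed border with zeros -- one simple augmentation
    that adds robustness to a digit-recognition CNN."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    out = np.zeros_like(img)
    h, w = img.shape
    # Destination and source windows for a (dy, dx) shift.
    ys = slice(max(dy, 0), h + min(dy, 0))
    xs = slice(max(dx, 0), w + min(dx, 0))
    ys_src = slice(max(-dy, 0), h + min(-dy, 0))
    xs_src = slice(max(-dx, 0), w + min(-dx, 0))
    out[ys, xs] = img[ys_src, xs_src]
    return out

digit = np.zeros((28, 28))
digit[10:18, 10:18] = 1.0                      # toy stand-in for a numeral
batch = np.stack([augment(digit) for _ in range(8)])  # augmented copies
```

In training, each epoch would draw fresh shifted copies like `batch`, so the network never sees the numeral in exactly the same position twice.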
Learning feed-forward one-shot learners
One-shot learning is usually tackled with generative models or discriminative embeddings. Discriminative methods based on deep learning, which are very effective in other learning scenarios, are ill-suited for one-shot learning because they need large amounts of training data. In this paper, we propose a method to learn the parameters of a deep model in one shot. We construct the learner as a second deep network, called a learnet, which predicts the parameters of a pupil network from a single exemplar. In this manner we obtain an efficient feed-forward one-shot learner, trained end-to-end by minimizing a one-shot classification objective in a learning-to-learn formulation. To make the construction feasible, we propose a number of factorizations of the parameters of the pupil network. We demonstrate encouraging results by learning characters from single exemplars in Omniglot, and by tracking visual objects from a single initial exemplar in the Visual Object Tracking benchmark.
Comment: The first three authors contributed equally, and are listed in alphabetical order.
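The learnet-predicts-pupil-parameters idea can be sketched in its simplest form: a linear "learnet" maps a single exemplar z to the weight matrix of a one-layer "pupil" network that then scores a query x. The dimensions, the linear form, and the absence of factorization are all simplifications of this sketch; the paper uses deep networks and factorized parameter predictions.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out = 8, 3   # illustrative input / output dimensions

# Learnet parameters: a fixed map from an exemplar to the flattened
# pupil weight matrix. In the paper this map is itself a deep net
# trained end-to-end on a one-shot objective.
M = rng.normal(scale=0.1, size=(d_out * d_in, d_in))

def pupil_forward(x, z):
    """Feed-forward one-shot prediction: the pupil's weights are
    produced from the exemplar z, then applied to the query x."""
    W = (M @ z).reshape(d_out, d_in)   # pupil weights predicted from z
    return W @ x                       # pupil scores for the query

z = rng.normal(size=d_in)   # the single exemplar
x = rng.normal(size=d_in)   # a query to classify
scores = pupil_forward(x, z)
```

Note that no gradient step happens at "learning" time: adapting to the exemplar is a single forward pass through the learnet, which is what makes the learner feed-forward.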