902 research outputs found
A Scalable Handwritten Text Recognition System
Many studies on (Offline) Handwritten Text Recognition (HTR) systems have
focused on building state-of-the-art models for line recognition on small
corpora. However, adding HTR capability to a large scale multilingual OCR
system poses new challenges. This paper addresses three problems in building
such systems: data, efficiency, and integration. Firstly, one of the biggest
challenges is obtaining sufficient amounts of high quality training data. We
address the problem by using online handwriting data collected for a large
scale production online handwriting recognition system. We describe our image
data generation pipeline and study how online data can be used to build HTR
models. We show that the data improve the models significantly under the
condition where only a small number of real images is available, which is
usually the case for HTR models. It enables us to support a new script at
substantially lower cost. Secondly, we propose a line recognition model based
on neural networks without recurrent connections. The model achieves a
comparable accuracy with LSTM-based models while allowing for better
parallelism in training and inference. Finally, we present a simple way to
integrate HTR models into an OCR system. Together, these address the
challenges of bringing HTR capability into a large-scale OCR system.
Comment: ICDAR 201
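The central step of the data pipeline described above, rendering online handwriting into images usable as offline training data, can be sketched as follows. This is a bare-bones illustration with names of my own choosing: an online sample is a list of strokes (sequences of (x, y) points), and a training image is produced by rasterising those trajectories onto a blank canvas. Real pipelines would also vary stroke width, slant, and background.

```python
# Minimal sketch (not the paper's pipeline): rasterise online ink,
# i.e. lists of (x, y) trajectories, into a binary image.

def render_ink(strokes, width, height):
    """Rasterise online strokes into a binary image (list of rows)."""
    img = [[0] * width for _ in range(height)]
    for stroke in strokes:
        # Draw each consecutive point pair as a line segment.
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            steps = max(abs(x1 - x0), abs(y1 - y0), 1)
            for t in range(steps + 1):
                # Sample evenly spaced points along the segment.
                x = round(x0 + (x1 - x0) * t / steps)
                y = round(y0 + (y1 - y0) * t / steps)
                if 0 <= x < width and 0 <= y < height:
                    img[y][x] = 1
    return img

ink = [[(1, 1), (6, 1)], [(3, 0), (3, 3)]]  # two strokes: a small plus sign
img = render_ink(ink, width=8, height=4)
print(sum(map(sum, img)))  # -> 9 inked pixels
```

Images generated this way can then be mixed with the small pool of real scanned images when training the line recogniser.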
Recurrent Neural Network Method in Arabic Words Recognition System
The recognition of unconstrained handwriting continues to be a difficult task
for computers despite active research for several decades. This is because
handwritten text offers great challenges such as character and word
segmentation, character recognition, variation between handwriting styles,
different character size and no font constraints as well as the background
clarity. This paper primarily discusses online handwriting recognition methods
for Arabic words, which are widely used across the Middle East and North
Africa. Because the characters of an Arabic word are connected, segmenting an
Arabic word is very difficult. We introduce a recurrent neural network for
online handwritten Arabic word recognition. The key innovation is a recently
proposed recurrent neural network objective function
known as connectionist temporal classification. The system consists of an
advanced recurrent neural network with an output layer designed for sequence
labeling, partially combined with a probabilistic language model. Experimental
results show that the system achieves recognition rates of about 79% on
unconstrained Arabic words, significantly higher than the roughly 70% of a
previously developed hidden Markov model based recognition system.
Comment: 6 Pages, 5 Figures, Vol. 3, Issue 11, pages 43-4
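The connectionist temporal classification (CTC) objective mentioned above lets the network label unsegmented sequences by predicting, per frame, either a character or a special blank symbol. The decoding rule that turns a per-frame path into a transcription can be sketched as follows (a minimal illustration of the standard CTC collapse, not this paper's full system):

```python
# Minimal sketch of the CTC label-collapsing rule used at decoding time:
# merge consecutive repeated labels, then drop the blank symbol.

BLANK = "-"  # illustrative blank symbol; real systems use a reserved index

def ctc_collapse(frame_labels):
    """Collapse a per-frame CTC path into the output string."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != BLANK:
            out.append(lab)
        prev = lab
    return "".join(out)

# Greedy decoding: take the argmax label at each frame, then collapse.
print(ctc_collapse(list("--hh-e-ll-ll--oo-")))  # -> "hello"
```

Note how the blank between the two "ll" runs is what allows a genuine double letter to survive the merge step.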
A Computationally Efficient Pipeline Approach to Full Page Offline Handwritten Text Recognition
Offline handwriting recognition with deep neural networks is usually limited
to words or lines due to large computational costs. In this paper, a less
computationally expensive full page offline handwritten text recognition
framework is introduced. This framework includes a pipeline that locates
handwritten text with an object detection neural network and recognises the
text within the detected regions using features extracted with a multi-scale
convolutional neural network (CNN) fed into a bidirectional long short term
memory (LSTM) network. This framework achieves error rates comparable to
state-of-the-art frameworks while using less memory and time. The results in
this paper demonstrate the potential of this framework, and future work can
investigate production-ready, deployable handwritten text recognisers.
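The two-stage structure described above, detect text lines first, then recognise each detected region independently, can be sketched as follows. The detector and recogniser here are stand-in stubs of my own construction, not the paper's networks; the point is the pipeline shape, in which page-level cost is reduced to a handful of line-level passes.

```python
# Sketch (not the paper's implementation) of a detect-then-recognise
# full-page pipeline with stubbed-out detector and recogniser.

def detect_lines(page):
    """Stub detector: would run an object-detection network on the page."""
    return page["boxes"]  # each box is (top, left, height, width)

def recognise_line(page, box):
    """Stub recogniser: would run CNN features through a BLSTM + decoder."""
    return page["transcripts"][box]

def read_page(page):
    # Sort detected boxes top-to-bottom to recover reading order,
    # then recognise each cropped region independently.
    boxes = sorted(detect_lines(page), key=lambda b: b[0])
    return [recognise_line(page, b) for b in boxes]

page = {
    "boxes": [(120, 10, 30, 400), (20, 10, 30, 400)],
    "transcripts": {(20, 10, 30, 400): "Dear Sir,",
                    (120, 10, 30, 400): "thank you for your letter."},
}
print(read_page(page))  # -> ['Dear Sir,', 'thank you for your letter.']
```

Because each line is recognised independently, the recognition stage also parallelises trivially across detected regions.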
Generating Sequences With Recurrent Neural Networks
This paper shows how Long Short-term Memory recurrent neural networks can be
used to generate complex sequences with long-range structure, simply by
predicting one data point at a time. The approach is demonstrated for text
(where the data are discrete) and online handwriting (where the data are
real-valued). It is then extended to handwriting synthesis by allowing the
network to condition its predictions on a text sequence. The resulting system
is able to generate highly realistic cursive handwriting in a wide variety of
styles.
Comment: Thanks to Peng Liu and Sergey Zyrianov for various correction
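The one-data-point-at-a-time generation the abstract describes is an autoregressive sampling loop: the trained network outputs a distribution over the next symbol given the sequence so far, a sample is drawn, fed back as the next input, and the loop repeats. A minimal sketch, using a toy bigram table of my own construction in place of the trained LSTM:

```python
import random

# Sketch of autoregressive sampling; the bigram table is a toy stand-in
# for a trained network's per-step predictive distribution.
model = {
    "<s>": {"h": 1.0},
    "h":   {"i": 0.5, "a": 0.5},
    "i":   {"</s>": 1.0},
    "a":   {"t": 1.0},
    "t":   {"</s>": 1.0},
}

def generate(model, rng, max_len=10):
    seq, cur = [], "<s>"
    for _ in range(max_len):
        symbols, probs = zip(*model[cur].items())
        cur = rng.choices(symbols, weights=probs)[0]  # sample one step
        if cur == "</s>":
            break
        seq.append(cur)  # the sample becomes the next conditioning input
    return "".join(seq)

print(generate(model, random.Random(0)))  # "hi" or "hat", seed-dependent
```

Conditioning on a text sequence, as in the handwriting-synthesis extension, amounts to making each per-step distribution depend on the target text as well as the samples so far.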
Indic Handwritten Script Identification using Offline-Online Multimodal Deep Network
In this paper, we propose a novel approach to word-level Indic script
identification using only character-level data in the training stage. The
advantages of using character-level data for training are outlined in
section I. Our method uses a multimodal deep network which takes both offline
and online modality of the data as input in order to explore the information
from both the modalities jointly for script identification task. We take
handwritten data in either modality as input and the opposite modality is
generated through intermodality conversion. Thereafter, we feed this
offline-online modality pair to our network. Hence, along with the advantage of
utilizing information from both the modalities, it can work as a single
framework for both offline and online script identification simultaneously
which alleviates the need for designing two separate script identification
modules for individual modality. One more major contribution is that we propose
a novel conditional multimodal fusion scheme to combine the information from
offline and online modality which takes into account the real origin of the
data being fed to our network and thus it combines adaptively. An exhaustive
experiment has been done on a data set consisting of English and six Indic
scripts. Our proposed framework clearly outperforms frameworks based on
traditional classifiers with handcrafted features, as well as deep-learning-based
methods. Extensive experiments show that using only character-level training
data can achieve state-of-the-art performance similar to that obtained with
traditional word-level training in our framework.
Comment: Accepted in Information Fusion, Elsevie
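The conditional fusion idea, combining offline and online features while accounting for which modality is the real input and which was synthesised by intermodality conversion, can be illustrated with a simple gate. The fixed gate values below are illustrative assumptions of mine, not the paper's learned fusion scheme:

```python
# Illustrative sketch (not the paper's learned scheme): fuse two feature
# vectors with a mixing weight conditioned on the real input modality.

def conditional_fuse(f_offline, f_online, real_modality):
    # Weight the genuine modality more heavily than the converted one.
    g = 0.75 if real_modality == "offline" else 0.25
    return [g * a + (1 - g) * b for a, b in zip(f_offline, f_online)]

fused = conditional_fuse([1.0, 0.0], [0.0, 1.0], real_modality="offline")
print(fused)  # -> [0.75, 0.25]
```

In the actual framework the combination is learned, but the same conditioning principle applies: the network adapts its fusion to the true origin of the data.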
A Study on Effects of Implicit and Explicit Language Model Information for DBLSTM-CTC Based Handwriting Recognition
Deep Bidirectional Long Short-Term Memory (D-BLSTM) with a Connectionist
Temporal Classification (CTC) output layer has been established as one of the
state-of-the-art solutions for handwriting recognition. It is well known that
the DBLSTM trained by using a CTC objective function will learn both local
character image dependency for character modeling and long-range contextual
dependency for implicit language modeling. In this paper, we study the effects
of implicit and explicit language model information for DBLSTM-CTC based
handwriting recognition by comparing the performance with and without an
explicit language model in decoding. It is observed that even when one million
lines of training sentences are used to train the DBLSTM, an explicit language
model is still helpful. To deal with such a large-scale training
problem, a GPU-based training tool has been developed for CTC training of
DBLSTM by using a mini-batch based epochwise Back Propagation Through Time
(BPTT) algorithm.
Comment: Accepted by ICDAR-201
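One common way to use an explicit language model in decoding, consistent with the comparison described above, is to rescore each hypothesis as the optical log-probability plus a weighted LM log-probability (often called shallow fusion). A minimal sketch, where the weight and the toy LM are my own illustrative choices:

```python
import math

# Sketch of LM rescoring at decoding time: combined score is
# optical log-prob + lm_weight * LM log-prob (an illustrative
# shallow-fusion scheme, not necessarily the paper's exact setup).

def rescore(hypotheses, lm_prob, lm_weight=0.5):
    """hypotheses: list of (text, optical_logprob). Returns best text."""
    def score(h):
        text, opt_lp = h
        return opt_lp + lm_weight * math.log(lm_prob(text))
    return max(hypotheses, key=score)[0]

# Toy LM that prefers the real word over a visually similar confusion.
lm = {"the cat": 0.09, "the cot": 0.01}.get
hyps = [("the cot", math.log(0.40)),   # optically slightly preferred
        ("the cat", math.log(0.35))]
print(rescore(hyps, lm))  # -> "the cat"
```

Here the optical model alone would pick "the cot"; the explicit LM term flips the decision, which is exactly the kind of residual benefit the study measures even after large-scale DBLSTM training.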
Deep Learning: Our Miraculous Year 1990-1991
In 2020, we will celebrate that many of the basic ideas behind the deep
learning revolution were published three decades ago within fewer than 12
months in our "Annus Mirabilis" or "Miraculous Year" 1990-1991 at TU Munich.
Back then, few people were interested, but a quarter century later, neural
networks based on these ideas were on over 3 billion devices such as
smartphones, and used many billions of times per day, consuming a significant
fraction of the world's compute.
Comment: 37 pages, 188 references, based on work of 4 Oct 201
On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models
This paper addresses the general problem of reinforcement learning (RL) in
partially observable environments. In 2013, our large RL recurrent neural
networks (RNNs) learned from scratch to drive simulated cars from
high-dimensional video input. However, real brains are more powerful in many
ways. In particular, they learn a predictive model of their initially unknown
environment, and somehow use it for abstract (e.g., hierarchical) planning and
reasoning. Guided by algorithmic information theory, we describe RNN-based AIs
(RNNAIs) designed to do the same. Such an RNNAI can be trained on never-ending
sequences of tasks, some of them provided by the user, others invented by the
RNNAI itself in a curious, playful fashion, to improve its RNN-based world
model. Unlike our previous model-building RNN-based RL machines dating back to
1990, the RNNAI learns to actively query its model for abstract reasoning and
planning and decision making, essentially "learning to think." The basic ideas
of this report can be applied to many other cases where one RNN-like system
exploits the algorithmic information content of another. They are taken from a
grant proposal submitted in Fall 2014, and also explain concepts such as
"mirror neurons." Experimental results will be described in separate papers.Comment: 36 pages, 1 figure. arXiv admin note: substantial text overlap with
arXiv:1404.782
Building Fast and Compact Convolutional Neural Networks for Offline Handwritten Chinese Character Recognition
Like other problems in computer vision, offline handwritten Chinese character
recognition (HCCR) has achieved impressive results using convolutional neural
network (CNN)-based methods. However, larger and deeper networks are needed to
deliver state-of-the-art results in this domain. Such networks intuitively
appear to incur high computational cost, and require the storage of a large
number of parameters, which renders them unfeasible for deployment in portable
devices. To solve this problem, we propose a Global Supervised Low-rank
Expansion (GSLRE) method and an Adaptive Drop-weight (ADW) technique that
address speed and storage capacity. We design a nine-layer CNN for HCCR
covering 3,755 classes, and devise an algorithm that can reduce the
network's computational cost by nine times and compress the network to 1/18 of
the original size of the baseline model, with only a 0.21% drop in accuracy. In
tests, the proposed algorithm surpassed the best single-network performance
reported thus far in the literature while requiring only 2.3 MB for storage.
Furthermore, when integrated with our effective forward implementation, the
recognition of an offline character image took only 9.7 ms on a CPU. Compared
with the state-of-the-art CNN model for HCCR, our approach is approximately 30
times faster and 10 times more cost-efficient.
Comment: 15 pages, 7 figures, 5 table
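The low-rank expansion at the heart of GSLRE replaces a dense weight matrix W (d_out x d_in) with two thin factors A (d_out x r) and B (r x d_in), cutting parameters and multiply-adds from d_out*d_in to r*(d_out + d_in). A minimal sketch via truncated SVD; the rank and layer sizes below are hypothetical choices, and the paper additionally supervises the expansion globally during training rather than factorising after the fact:

```python
import numpy as np

# Sketch of low-rank factorisation of a layer's weight matrix
# (illustrative; GSLRE learns the expansion under global supervision).

def low_rank_factorise(W, r):
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :r] * s[:r]          # d_out x r, singular values folded in
    B = Vt[:r, :]                 # r x d_in
    return A, B

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 512))   # hypothetical dense layer
A, B = low_rank_factorise(W, r=64)
print(W.size, A.size + B.size)        # 131072 vs 49152: ~2.7x fewer params

x = rng.standard_normal(512)
y = A @ (B @ x)                       # the layer becomes two cheap matmuls
```

Pruning near-zero weights, as ADW does adaptively, then compresses the factors further; together the two effects account for the reported speed and storage gains.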
Writer-Aware CNN for Parsimonious HMM-Based Offline Handwritten Chinese Text Recognition
Recently, the hybrid convolutional neural network hidden Markov model
(CNN-HMM) has been introduced for offline handwritten Chinese text recognition
(HCTR) and has achieved state-of-the-art performance. However, modeling each of
the large vocabulary of Chinese characters with a uniform and fixed number of
hidden states requires high memory and computational costs and makes the tens
of thousands of HMM state classes confusing. Another key issue of CNN-HMM for
HCTR is the diversified writing style, which leads to model strain and a
significant performance decline for specific writers. To address these issues,
we propose a writer-aware CNN based on parsimonious HMM (WCNN-PHMM). First,
PHMM is designed using a data-driven state-tying algorithm to greatly reduce
the total number of HMM states, which not only yields a compact CNN by state
sharing of the same or similar radicals among different Chinese characters but
also improves the recognition accuracy due to the more accurate modeling of
tied states and the lower confusion among them. Second, WCNN integrates each
convolutional layer with one adaptive layer fed by a writer-dependent vector,
namely, the writer code, to factor out writer-specific variability and
improve recognition performance. The parameters of
writer-adaptive layers are jointly optimized with other network parameters in
the training stage, while a multiple-pass decoding strategy is adopted to learn
the writer code and generate recognition results. Validated on the ICDAR 2013
competition of CASIA-HWDB database, the more compact WCNN-PHMM of a 7360-class
vocabulary can achieve a relative character error rate (CER) reduction of 16.6%
over the conventional CNN-HMM without considering language modeling. By
adopting a powerful hybrid language model (N-gram language model and recurrent
neural network language model), the CER of WCNN-PHMM is reduced to 3.17%
- …