Offline Handwritten Chinese Text Recognition with Convolutional Neural Networks
Deep learning-based methods have come to dominate text recognition tasks
across different languages and scenarios. Offline handwritten Chinese text
recognition (HCTR) is among the most challenging of these tasks because it
involves thousands of character classes, widely varying writing styles, and a
complex data collection process. Recently, recurrence-free architectures for
text recognition have become competitive owing to their high parallelism and
comparable results. In this paper, we build our models using only
convolutional neural networks and use connectionist temporal classification
(CTC) as the loss function. To reduce overfitting, we apply dropout after each
max-pooling layer, with an extremely high rate on the last one before the
linear layer. The CASIA-HWDB database is selected to tune and evaluate the
proposed models. Using the existing text samples as templates, we randomly
select isolated character samples to synthesize additional text samples for
training.
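A minimal sketch of this synthesis step, assuming the isolated character
samples are grayscale arrays of equal height (the function name, the gap
width, and the data layout are illustrative, not taken from the paper):

    import random
    import numpy as np

    def synthesize_text_line(char_bank, text_template, gap_width=4):
        """Concatenate randomly chosen isolated character images into
        one synthetic text-line image matching an existing template.

        char_bank: dict label -> list of grayscale images (H x W uint8
                   arrays, all with the same height H).
        text_template: sequence of character labels from a real text sample.
        """
        # One random exemplar per character of the template.
        glyphs = [random.choice(char_bank[label]) for label in text_template]
        height = glyphs[0].shape[0]
        gap = np.full((height, gap_width), 255, dtype=np.uint8)  # white gap
        pieces = []
        for glyph in glyphs:
            pieces.extend([glyph, gap])
        image = np.concatenate(pieces[:-1], axis=1)  # drop the trailing gap
        return image, list(text_template)  # line image and its label sequence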
We finally achieve a 6.81% character error rate (CER) on the ICDAR 2013
competition set, which is the best published result without language model
correction.
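A minimal PyTorch sketch of the recognizer described in this abstract:
convolutions only, dropout after every max-pooling layer with a very high
rate on the last one, and CTC as the loss. The channel widths, dropout rates,
and class count are illustrative assumptions, not the paper's configuration:

    import torch
    import torch.nn as nn

    NUM_CLASSES = 7357  # illustrative: character set size plus the CTC blank

    class ConvCTCRecognizer(nn.Module):
        """Recurrence-free text-line recognizer: CNN features + CTC head."""
        def __init__(self, num_classes=NUM_CLASSES):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2), nn.Dropout(0.1),   # dropout after each max-pool
                nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2), nn.Dropout(0.2),
                nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2), nn.Dropout(0.9),   # very high rate before the linear layer
            )
            self.classifier = nn.Linear(256, num_classes)

        def forward(self, x):                 # x: (batch, 1, height, width)
            f = self.features(x)              # (batch, 256, h', w')
            f = f.mean(dim=2)                 # collapse height: (batch, 256, w')
            f = f.permute(2, 0, 1)            # time-major for CTC: (w', batch, 256)
            return self.classifier(f).log_softmax(dim=-1)

    criterion = nn.CTCLoss(blank=0, zero_infinity=True)  # expects (T, N, C) log-probs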
Writer-Aware CNN for Parsimonious HMM-Based Offline Handwritten Chinese Text Recognition
Recently, the hybrid convolutional neural network hidden Markov model
(CNN-HMM) has been introduced for offline handwritten Chinese text recognition
(HCTR) and has achieved state-of-the-art performance. However, modeling each
character of the large Chinese vocabulary with a uniform, fixed number of
hidden states incurs high memory and computational costs and leaves the tens
of thousands of HMM state classes easily confused. Another key issue of
CNN-HMM for HCTR is the diversity of writing styles, which strains the model
and causes a significant performance decline for specific writers. To address
these issues, we propose a writer-aware CNN based on a parsimonious HMM
(WCNN-PHMM). First, the PHMM is designed using a data-driven state-tying
algorithm to greatly reduce the total number of HMM states; this not only
yields a compact CNN through state sharing among the same or similar radicals
of different Chinese characters but also improves recognition accuracy owing
to the more accurate modeling of the tied states and the lower confusion among
them.
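A minimal sketch of data-driven state tying in this spirit: the original
per-character HMM states are clustered by the similarity of their emission
statistics, so states covering the same or similar radicals can share one tied
class. The clustering method and the state counts below are illustrative
assumptions (only the 7360-class vocabulary comes from the abstract):

    import numpy as np
    from sklearn.cluster import KMeans

    def tie_states(state_stats, num_tied_states):
        """Cluster original HMM states into tied state classes.

        state_stats: (num_states, feat_dim) array, e.g. the mean feature
                     vector of the frames aligned to each original state.
        Returns tying[i] = tied-state id assigned to original state i.
        """
        kmeans = KMeans(n_clusters=num_tied_states, n_init=10, random_state=0)
        return kmeans.fit_predict(state_stats)

    # E.g. collapse 7360 characters x 5 states each into 4000 tied classes,
    # so the CNN output layer shrinks from 36800 to 4000 nodes (assumed sizes).
    tying = tie_states(np.random.randn(7360 * 5, 128), num_tied_states=4000)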
Second, the WCNN integrates each convolutional layer with an adaptive layer
fed by a writer-dependent vector, the writer code, to factor out writer-specific
variability and improve recognition performance. The parameters of the
writer-adaptive layers are jointly optimized with the other network parameters
during training, while a multiple-pass decoding strategy is adopted to learn
the writer code and generate the recognition results.
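A minimal sketch of one writer-adaptive layer in this spirit: the writer code
is mapped to a channel-wise offset added to the convolutional activations. The
additive conditioning form and the dimensions are assumptions, not the paper's
exact design:

    import torch
    import torch.nn as nn

    class WriterAdaptiveConv(nn.Module):
        """Convolutional layer conditioned on a writer-dependent code."""
        def __init__(self, in_ch, out_ch, code_dim=64):
            super().__init__()
            self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1)
            # Adaptive layer: maps the writer code to one offset per channel.
            self.adapt = nn.Linear(code_dim, out_ch)

        def forward(self, x, writer_code):    # writer_code: (batch, code_dim)
            h = self.conv(x)                  # (batch, out_ch, H, W)
            offset = self.adapt(writer_code)  # (batch, out_ch)
            return torch.relu(h + offset[:, :, None, None])  # broadcast over H, W

    # At test time the code can start at zero, be refined by gradient steps
    # against a first-pass hypothesis, and feed a second decoding pass.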
Validated on the ICDAR 2013 competition set of the CASIA-HWDB database, the
more compact WCNN-PHMM with a 7360-class vocabulary achieves a relative
character error rate (CER) reduction of 16.6% over the conventional CNN-HMM
without language modeling. By adopting a powerful hybrid language model (an
n-gram language model combined with a recurrent neural network language
model), the CER of WCNN-PHMM is further reduced to 3.17%.
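One common way to combine the two language models is linear interpolation of
their per-character probabilities, sketched below; the interpolation weight is
an illustrative assumption and the paper's combination scheme may differ:

    import math

    def hybrid_lm_logprob(ngram_prob, rnn_prob, lam=0.5):
        """Linearly interpolate n-gram and RNN LM probabilities for the
        same next-character prediction; returns a log-probability."""
        return math.log(lam * ngram_prob + (1.0 - lam) * rnn_prob)

    def score_hypothesis(char_probs):
        """Sum interpolated log-probabilities over a decoded sequence;
        char_probs is a list of (ngram_prob, rnn_prob) pairs."""
        return sum(hybrid_lm_logprob(p, q) for p, q in char_probs)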
Joint Architecture and Knowledge Distillation in CNN for Chinese Text Recognition
The technique of distillation helps transform a cumbersome neural network into
a compact one so that the model can be deployed on resource-limited hardware
devices. The main advantages of distillation-based approaches are a simple
training process, support in most off-the-shelf deep learning frameworks, and
no special hardware requirements. In this paper, we propose a guideline for
distilling the architecture and knowledge of pre-trained standard CNNs
simultaneously. We first make a quantitative analysis of the baseline network,
covering the computational cost and storage overhead of its different
components. Then, according to the analysis, optional strategies are adopted
to compress the fully-connected layers. For vanilla convolutional layers, the
proposed parsimonious convolution (ParConv) block, consisting only of
depthwise separable convolution and pointwise convolution, is used as a
drop-in replacement without other adjustments such as the network's width or
depth.
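A minimal PyTorch sketch of a block in the spirit of ParConv: a depthwise
separable branch plus a pointwise (1x1) branch, shaped as a drop-in
replacement for a vanilla 3x3 convolution. How the branches are combined
(summation here) is an assumption, not the paper's exact design:

    import torch.nn as nn
    import torch.nn.functional as F

    class ParConv(nn.Module):
        """Parsimonious replacement for a vanilla 3x3 convolution."""
        def __init__(self, in_ch, out_ch, stride=1):
            super().__init__()
            self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride,
                                       padding=1, groups=in_ch)  # per-channel 3x3
            self.separable_pw = nn.Conv2d(in_ch, out_ch, 1)      # completes the separable conv
            self.pointwise = nn.Conv2d(in_ch, out_ch, 1, stride=stride)
            self.bn = nn.BatchNorm2d(out_ch)

        def forward(self, x):
            # Sum the depthwise-separable and pointwise branches (assumed rule).
            y = self.separable_pw(self.depthwise(x)) + self.pointwise(x)
            return F.relu(self.bn(y))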
Finally, knowledge distillation with multiple losses is adopted to improve the
performance of the compact CNN.
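A minimal sketch of distillation with multiple losses, here an assumed
combination of hard-label cross-entropy and temperature-scaled soft-target KL
divergence against the pre-trained teacher (the temperature and weighting are
illustrative; the paper's exact losses may differ):

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
        """Weighted sum of hard-label cross-entropy and soft-target KL."""
        hard = F.cross_entropy(student_logits, labels)
        soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                        F.softmax(teacher_logits / T, dim=-1),
                        reduction="batchmean") * (T * T)  # standard T^2 scaling
        return alpha * hard + (1.0 - alpha) * soft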
The proposed algorithm is first verified on offline handwritten Chinese text
recognition (HCTR), where the CNNs are characterized by tens of thousands of
output nodes and trained on hundreds of millions of samples. Compared with the
CNN in the state-of-the-art system, our joint architecture and knowledge
distillation reduces the computational cost by more than 10x and the model
size by more than 8x with negligible accuracy loss. Experiments on MNIST, one
of the most popular datasets, further demonstrate that the proposed approach
can also be successfully applied to mainstream backbone networks.