End to End Recognition System for Recognizing Offline Unconstrained Vietnamese Handwriting
Inspired by recent successes in neural machine translation and image caption
generation, we present an attention-based encoder-decoder (AED) model to
recognize Vietnamese handwritten text. The model consists of two parts: a
DenseNet that extracts invariant features, and a Long Short-Term Memory (LSTM)
decoder with an incorporated attention model that generates the output text.
The CNN output feeds the attention model, which links the two parts.
The input of the CNN part is a handwritten text image and the target of the
LSTM decoder is the corresponding text of the input image. Because all of its
parts are differentiable, the model is trained end-to-end to predict the text
from a given input image. In the experiment section, we evaluate the proposed
AED model on the VNOnDB-Word and VNOnDB-Line datasets to verify its efficiency.
The experimental results show that our model achieves a word error rate of
12.30% without using any language model. This result is competitive with the
handwriting recognition system provided by Google in the Vietnamese Online
Handwritten Text Recognition competition.
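The word error rate quoted above is conventionally the word-level edit distance between the recognized text and the ground truth, divided by the number of ground-truth words. The following is a minimal sketch of that standard definition; the function name `word_error_rate` is illustrative, and the exact evaluation protocol used on VNOnDB is not stated in the abstract.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the distance between the processed reference prefix
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution (b -> x) and one deletion (d) over four reference words.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Because the distance counts insertions as well as substitutions and deletions, the rate can exceed 1.0 when the hypothesis is much longer than the reference.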
Improving Attention-Based Handwritten Mathematical Expression Recognition with Scale Augmentation and Drop Attention
Handwritten mathematical expression recognition (HMER) is an important
research direction in handwriting recognition. The performance of HMER suffers
from the two-dimensional structure of mathematical expressions (MEs). To
address this issue, we propose a high-performance HMER model
with scale augmentation and drop attention. Specifically, to handle MEs whose
scale is unstable in both the horizontal and vertical directions, scale
augmentation improves the performance of the model on MEs of various scales. An
attention-based encoder-decoder network is used for extracting features and
generating predictions. In addition, drop attention is proposed to further
improve performance when the attention distribution of the decoder is not
precise. Compared with previous methods, our method achieves state-of-the-art
performance on two public datasets, CROHME 2014 and CROHME 2016.
Comment: Accepted to appear in ICFHR 202
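As a rough illustration of the scale augmentation idea, the sketch below resizes an image by a randomly drawn factor using nearest-neighbour sampling. The scale range, the `keep_aspect` switch, and the function name `scale_augment` are assumptions made for illustration; the abstract does not give the sampling range or say whether the aspect ratio is preserved.

```python
import random


def scale_augment(image, min_scale=0.7, max_scale=1.4, keep_aspect=True, rng=random):
    """Resize a 2-D image (list of equal-length rows) by a random scale factor.

    The default range and the keep_aspect option are hypothetical choices.
    """
    sy = rng.uniform(min_scale, max_scale)
    sx = sy if keep_aspect else rng.uniform(min_scale, max_scale)
    src_h, src_w = len(image), len(image[0])
    dst_h = max(1, round(src_h * sy))
    dst_w = max(1, round(src_w * sx))
    # Nearest-neighbour sampling: map every output pixel back to a source pixel.
    return [
        [image[y * src_h // dst_h][x * src_w // dst_w] for x in range(dst_w)]
        for y in range(dst_h)
    ]


# Fixing both bounds to 2.0 doubles the image deterministically.
print(scale_augment([[0, 1], [2, 3]], min_scale=2.0, max_scale=2.0))
```

In a training pipeline a fresh factor would be drawn for every sample, so the recognizer sees the same expression at many sizes.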
ConvMath: A Convolutional Sequence Network for Mathematical Expression Recognition
Despite recent advances in optical character recognition (OCR),
mathematical expressions remain very difficult to recognize due to their
two-dimensional graphical layout. In this paper, we propose a convolutional
sequence modeling network, ConvMath, which converts the mathematical expression
description in an image into a LaTeX sequence in an end-to-end way. The network
combines an image encoder for feature extraction and a convolutional decoder
for sequence generation. Unlike Long Short-Term Memory (LSTM) based
encoder-decoder models, ConvMath is entirely based on convolution, so it is
easy to parallelize. In addition, the network adopts a multi-layer
attention mechanism in the decoder, which allows the model to align output
symbols with source feature vectors automatically and alleviates the problem
of lacking coverage while training the model. The performance of ConvMath is
evaluated on an open dataset named IM2LATEX-100K, which contains 103556 samples. The
experimental results demonstrate that the proposed network achieves
state-of-the-art accuracy and much better efficiency than previous methods.
Comment: Accepted in ICPR202
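The alignment between an output symbol and the source feature vectors can be pictured with a single attention step. The sketch below uses plain dot-product scoring followed by a softmax, which is an assumption made for illustration; the abstract does not specify ConvMath's scoring function or how its multiple attention layers are combined, and the name `attend` is hypothetical.

```python
import math


def attend(query, features):
    """Return (weights, context) for one decoder step.

    query    -- decoder state vector for the symbol being generated
    features -- list of encoder feature vectors, one per image location
    """
    # Dot-product relevance score between the query and each location.
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    # Numerically stable softmax turns the scores into a distribution.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # The context vector is the attention-weighted average of the features.
    dim = len(features[0])
    context = [
        sum(w * feat[d] for w, feat in zip(weights, features))
        for d in range(dim)
    ]
    return weights, context


weights, context = attend([1.0, 0.0], [[10.0, 0.0], [0.0, 10.0]])
# Nearly all of the weight lands on the first location, which matches the query.
```

The weights form a soft alignment: each output symbol is associated with the image locations that receive the largest share of attention.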
Pattern Generation Strategies for Improving Recognition of Handwritten Mathematical Expressions
Recognition of Handwritten Mathematical Expressions (HMEs) is a challenging
problem because of the ambiguity and complexity of two-dimensional handwriting.
Moreover, the lack of large training data is a serious issue, especially for
academic recognition systems. In this paper, we propose pattern generation
strategies that generate shape and structural variations to improve the
performance of recognition systems based on a small training set. For data
generation, we use the public CROHME 2014 and 2016 databases of online
HMEs. The first strategy applies local and global distortions to generate shape
variations. The second strategy decomposes an online HME into sub-online HMEs
to get more structural variations. The hybrid strategy combines both these
strategies to maximize shape and structural variations. The generated online
HMEs are converted to images for offline HME recognition. We tested our
strategies in an end-to-end recognition system built from a recent deep
learning model that combines a Convolutional Neural Network with an attention-based
encoder-decoder. The results of experiments on the CROHME 2014 and 2016
databases demonstrate the superiority and effectiveness of our strategies: our
hybrid strategy achieved classification rates of 48.78% and 45.60%,
respectively, on these databases. These results are competitive with
others reported in the recent literature. Our generated datasets are openly
available to the research community and constitute a useful resource for
future HME recognition research.
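A global distortion of an online HME can be sketched as one random affine transform applied to every pen point of the expression. The distortion model (scale, horizontal shear, then rotation about the centroid), the parameter ranges, and the name `distort_strokes` are assumptions made for illustration; the abstract does not describe the distortion models actually used. A local distortion could apply the same function to the strokes of a single symbol.

```python
import math
import random


def distort_strokes(strokes, max_angle_deg=10.0, max_shear=0.2, max_scale=0.15, rng=random):
    """Apply one random affine distortion to a list of strokes.

    strokes -- list of strokes, each a list of (x, y) pen points.
    All parameter ranges are hypothetical defaults.
    """
    angle = math.radians(rng.uniform(-max_angle_deg, max_angle_deg))
    shear = rng.uniform(-max_shear, max_shear)
    sx = 1.0 + rng.uniform(-max_scale, max_scale)
    sy = 1.0 + rng.uniform(-max_scale, max_scale)

    # Transform about the centroid so the expression stays roughly in place.
    points = [p for stroke in strokes for p in stroke]
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    cos_a, sin_a = math.cos(angle), math.sin(angle)

    def transform(x, y):
        x, y = (x - cx) * sx, (y - cy) * sy  # anisotropic scaling
        x = x + shear * y                    # horizontal shear
        return (cx + cos_a * x - sin_a * y,  # rotation, then shift back
                cy + sin_a * x + cos_a * y)

    return [[transform(x, y) for x, y in stroke] for stroke in strokes]


strokes = [[(0.0, 0.0), (1.0, 0.0)], [(0.5, -0.5), (0.5, 0.5)]]
variant = distort_strokes(strokes, rng=random.Random(42))
```

Each call yields a new shape variant with the same stroke structure, which can then be rendered to an image for offline recognition.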