Drawing and Recognizing Chinese Characters with Recurrent Neural Network
Recent deep learning based approaches have achieved great success on
handwriting recognition. Chinese characters are among the most widely adopted
writing systems in the world. Previous research has mainly focused on
recognizing handwritten Chinese characters. However, recognition is only one
aspect of understanding a language; another challenging and interesting task
is to teach a machine to automatically write (pictographic) Chinese characters.
In this paper, we propose a framework by using the recurrent neural network
(RNN) as both a discriminative model for recognizing Chinese characters and a
generative model for drawing (generating) Chinese characters. To recognize
Chinese characters, previous methods usually adopt convolutional neural
network (CNN) models, which require transforming the online handwriting
trajectory into image-like representations. Instead, our RNN based approach is
an end-to-end system which directly deals with the sequential structure and
does not require any domain-specific knowledge. With the RNN system (combining
an LSTM and GRU), state-of-the-art performance can be achieved on the
ICDAR-2013 competition database. Furthermore, under the RNN framework, a
conditional generative model with character embedding is proposed for
automatically drawing recognizable Chinese characters. The generated characters
(in vector format) are human-readable and also can be recognized by the
discriminative RNN model with high accuracy. Experimental results verify the
effectiveness of using RNNs as both generative and discriminative models for
the tasks of drawing and recognizing Chinese characters.
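The discriminative side of the abstract above hinges on an RNN consuming the raw pen trajectory point by point, rather than a rendered image. Below is a minimal, self-contained sketch of that idea using a tiny GRU-style cell; the weights are random placeholders rather than the paper's trained model, and in a real system the final hidden state would feed a softmax classifier over character classes.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyGRUCell:
    """Minimal GRU cell over 3-d pen points (dx, dy, pen-down flag)."""
    def __init__(self, input_size, hidden_size):
        self.h = hidden_size
        # One weight matrix per gate, randomly initialised for illustration.
        def mat(rows, cols):
            return [[random.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]
        self.Wz, self.Uz = mat(hidden_size, input_size), mat(hidden_size, hidden_size)
        self.Wr, self.Ur = mat(hidden_size, input_size), mat(hidden_size, hidden_size)
        self.Wh, self.Uh = mat(hidden_size, input_size), mat(hidden_size, hidden_size)

    @staticmethod
    def mv(W, v):
        # matrix-vector product
        return [sum(w * x for w, x in zip(row, v)) for row in W]

    def step(self, x, h):
        # Standard GRU update: reset gate r, update gate z, candidate state.
        z = [sigmoid(a + b) for a, b in zip(self.mv(self.Wz, x), self.mv(self.Uz, h))]
        r = [sigmoid(a + b) for a, b in zip(self.mv(self.Wr, x), self.mv(self.Ur, h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(self.mv(self.Wh, x), self.mv(self.Uh, rh))]
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def run(self, trajectory):
        h = [0.0] * self.h
        for point in trajectory:      # consume the trajectory sequentially
            h = self.step(point, h)
        return h                      # final state summarises the trajectory

cell = TinyGRUCell(3, 8)
stroke = [(0.1, 0.0, 1.0), (0.1, 0.1, 1.0), (0.0, 0.2, 0.0)]
state = cell.run(stroke)              # 8-d summary of the whole stroke
```

No image-like representation is built at any point, which is the key contrast with CNN-based pipelines.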
Trajectory-based Radical Analysis Network for Online Handwritten Chinese Character Recognition
Recently, great progress has been made for online handwritten Chinese
character recognition due to the emergence of deep learning techniques.
However, previous research mostly treated each Chinese character as one class
without explicitly considering its inherent structure, namely the radical
components with complicated geometry. In this study, we propose a novel
trajectory-based radical analysis network (TRAN) that first identifies
radicals and analyzes the two-dimensional structures among them
simultaneously, then recognizes Chinese characters by generating captions for
them based on the analysis of their internal radicals. The proposed TRAN employs recurrent neural
networks (RNNs) as both an encoder and a decoder. The RNN encoder makes full
use of online information by directly transforming handwriting trajectory into
high-level features. The RNN decoder aims at generating the caption by
detecting radicals and spatial structures through an attention model. The
manner of treating a Chinese character as a two-dimensional composition of
radicals can reduce the size of vocabulary and enable TRAN to possess the
capability of recognizing unseen Chinese character classes, provided that the
corresponding radicals have been seen. Evaluated on the CASIA-OLHWDB database, the
proposed approach significantly outperforms the state-of-the-art
whole-character modeling approach with a relative character error rate (CER)
reduction of 10%. Meanwhile, for the case of recognizing 500 unseen Chinese
characters, TRAN achieves a character accuracy of about 60%, while the
traditional whole-character method cannot handle them at all.
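The vocabulary-reduction argument can be made concrete with toy symbols. The sketch below uses hypothetical structure tokens and radical ids rather than real radicals: a character caption is a structure token plus its component radicals, and an unseen character is decodable whenever every symbol in its caption appeared in training.

```python
# Hypothetical toy inventory: structure tokens and radical ids stand in for
# real radicals; the point is the vocabulary arithmetic, not the glyphs.
STRUCTURES = ["a", "d", "stl"]          # e.g. left-right, top-down, surround
RADICALS = [f"r{i}" for i in range(10)]

# A character caption = structure token followed by its component radicals.
train_chars = {
    "char1": ("a", "r0", "r1"),
    "char2": ("d", "r2", "r3"),
}
unseen_char = ("a", "r2", "r1")  # never seen as a whole character

caption_vocab = set(STRUCTURES) | set(RADICALS)   # 13 symbols in total
seen_symbols = set()
for cap in train_chars.values():
    seen_symbols.update(cap)

# The unseen character is decodable because all of its symbols were seen.
decodable = all(sym in seen_symbols for sym in unseen_char)
```

A caption vocabulary of a few hundred radicals and structures covers tens of thousands of whole-character classes, which is exactly why TRAN can handle unseen characters.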
Handwritten Chinese Font Generation with Collaborative Stroke Refinement
Automatic character generation is an appealing solution for new typeface
design, especially for Chinese typefaces, which include over 3,700 commonly
used characters. This task has two main pain points: (i) handwritten
characters usually have thin strokes that carry little information and complex
structures that are error-prone during deformation; (ii) thousands of
characters with various shapes need to be synthesized from a few manually
designed characters. To solve these issues, we propose a novel
convolutional-neural-network-based model with three main techniques:
collaborative stroke refinement, using collaborative training strategy to
recover missing or broken strokes; online zoom-augmentation, taking advantage
of the content-reuse phenomenon to reduce the size of the training set;
and adaptive pre-deformation, standardizing and aligning the characters. The
proposed model needs only 750 paired training samples; no pre-trained network,
extra dataset, or labels are needed. Experimental results show that the
proposed method significantly outperforms state-of-the-art methods under this
practical restriction on handwritten font synthesis.
Comment: 8 pages (excluding references)
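Of the three techniques, adaptive pre-deformation (standardizing and aligning characters) is the easiest to illustrate. Below is a rough, hypothetical stand-in: an affine rescaling of a glyph's points into a fixed box with a margin, preserving aspect ratio. The paper's actual deformation procedure may differ.

```python
def standardize(points, margin=0.1):
    """Rescale a glyph's points into [margin, 1-margin]^2, preserving
    aspect ratio -- a rough stand-in for pre-deformation alignment."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, y0 = min(xs), min(ys)
    w = max(xs) - x0 or 1.0          # guard against degenerate glyphs
    h = max(ys) - y0 or 1.0
    scale = (1.0 - 2 * margin) / max(w, h)
    return [((x - x0) * scale + margin,
             (y - y0) * scale + margin) for x, y in points]

glyph = [(10, 40), (30, 80), (50, 40)]   # invented raw coordinates
norm = standardize(glyph)                # all points now inside [0.1, 0.9]^2
```

Aligning every training character to a common frame like this is what lets the deformation network focus on shape rather than position and scale.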
End to End Recognition System for Recognizing Offline Unconstrained Vietnamese Handwriting
Inspired by recent successes in neural machine translation and image caption
generation, we present an attention based encoder decoder model (AED) to
recognize Vietnamese handwritten text. The model consists of two parts: a
DenseNet for extracting invariant features, and a Long Short-Term Memory
network (LSTM) with an incorporated attention model for generating the output
text (the LSTM decoder); the CNN part is connected to the attention model. The
input of the CNN part is a handwritten text image, and the target of the LSTM
decoder is the corresponding text of the input image. Our model is trained
end-to-end to predict the text from a given input image, since all of its
parts are differentiable. In the experiments, we evaluate our proposed
AED model on the VNOnDB-Word and VNOnDB-Line datasets to verify its efficiency.
The experimental results show that our model achieves a word error rate of
12.30% without using any language model. This result is competitive with the
handwriting recognition system provided by Google in the Vietnamese Online
Handwritten Text Recognition competition.
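The attention model connecting the DenseNet features to the LSTM decoder can be sketched in a few lines. This is generic dot-product attention over encoder feature vectors, not the paper's exact formulation (which may use additive attention with learned projections).

```python
import math

def softmax(xs):
    m = max(xs)                       # shift for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(query, encoder_states):
    """Dot-product attention: weight encoder feature vectors by their
    similarity to the decoder query, then return the weighted context."""
    scores = [sum(q * h for q, h in zip(query, state))
              for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(dim)]
    return context, weights

# Toy encoder states (in practice: DenseNet feature columns over the image).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ctx, w = attend([1.0, 0.0], states)   # decoder query attends to the states
```

At each decoding step the LSTM's hidden state plays the role of `query`, so the decoder reads a different part of the text image for each output character.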
DenseRAN for Offline Handwritten Chinese Character Recognition
Recently, great success has been achieved in offline handwritten Chinese
character recognition by using deep learning methods. Chinese characters are
mainly logographic and consist of basic radicals; however, previous research
mostly treated each Chinese character as a whole, without explicitly
considering its internal two-dimensional structure and radicals. In this
study, we propose a novel radical analysis network with a densely connected
architecture (DenseRAN) to analyze Chinese character radicals and their
two-dimensional structures simultaneously. DenseRAN first encodes the input
image into high-level visual
features by employing DenseNet as an encoder. Then a decoder based on recurrent
neural networks is employed, aiming at generating captions of Chinese
characters by detecting radicals and two-dimensional structures through
an attention mechanism. The manner of treating a Chinese character as a
composition of two-dimensional structures and radicals can reduce the size of
vocabulary and enable DenseRAN to possess the capability of recognizing unseen
Chinese character classes, provided that the corresponding radicals have been
seen in the training set. Evaluated on the ICDAR-2013 competition database,
the proposed approach significantly outperforms the whole-character modeling
approach with a
relative character error rate (CER) reduction of 18.54%. Meanwhile, for the
case of recognizing 3,277 unseen Chinese characters in the CASIA-HWDB1.2
database, DenseRAN achieves a character accuracy of about 41%, while the
traditional whole-character method cannot handle them at all.
Comment: Accepted by ICFHR201
A New Hybrid-parameter Recurrent Neural Network for Online Handwritten Chinese Character Recognition
The recurrent neural network (RNN) is appropriate for dealing with temporal
sequences. In this paper, we present a deep RNN with new features and apply it
for online handwritten Chinese character recognition. Compared with the
existing RNN models, the proposed system involves three innovations. First, a
new hidden-layer function, which we call the Memory Pool Unit (MPU), is
proposed to better learn temporal information; the proposed MPU has a simple
architecture. Second, a new RNN architecture with hybrid parameters is
presented to increase the expressive capacity of the RNN: the hybrid-parameter
RNN changes its parameters across iterations along the temporal dimension.
Third, we adapt the network so that the outputs of all layers are stacked as
the network output; stacking the hidden-layer states combines them to further
increase the expressive capacity. Experiments
are carried out on the IAHCC-UCAS2016 dataset and the CASIA-OLHWDB1.1 dataset.
The experimental results show that the hybrid-parameter RNN obtains better
recognition performance with higher efficiency (fewer parameters and faster
speed), and the proposed Memory Pool Unit proves to be a simple hidden-layer
function that achieves competitive recognition results.
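The third innovation, stacking the hidden states of every layer as the network output, reduces to a concatenation. A toy sketch with hypothetical per-layer states:

```python
def stack_layer_states(layer_states):
    """Concatenate the hidden states of every layer into one output vector,
    instead of exposing only the top layer's state."""
    out = []
    for state in layer_states:
        out.extend(state)
    return out

# Hypothetical final hidden states of a 3-layer RNN at the last time step.
layers = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
stacked = stack_layer_states(layers)   # 6-d output vs. 2-d for top-layer only
```

The classifier that follows then sees features from every depth at once, which is the claimed source of the extra expressive capacity.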
Few-shot Compositional Font Generation with Dual Memory
Generating a new font library is a very labor-intensive and time-consuming
job for glyph-rich scripts. Despite the remarkable success of existing font
generation methods, they have significant drawbacks; they require a large
number of reference images to generate a new font set, or they fail to capture
detailed styles with only a few samples. In this paper, we focus on
compositional scripts, a widely used class of writing systems in which each
glyph can be decomposed into several components. By utilizing the
compositionality of compositional scripts, we propose a novel font generation
framework, named Dual Memory-augmented Font Generation Network (DM-Font), which
enables us to generate a high-quality font library with only a few samples. We
employ memory components and global-context awareness in the generator to take
advantage of the compositionality. In experiments on Korean-handwriting fonts
and Thai-printing fonts, we observe that our method generates samples of
significantly better quality with faithful stylization than the
state-of-the-art generation methods, both quantitatively and qualitatively.
Source code is available at https://github.com/clovaai/dmfont.
Comment: ECCV 2020 camera-ready
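The memory component can be caricatured as a per-component dictionary: styles extracted from a few reference glyphs are persisted, then reassembled for glyphs never seen in the reference set. Everything below (the glyph names, their decomposition, the `extract` function) is hypothetical; a real model stores learned feature maps, not strings.

```python
# Toy sketch of component-wise style memory for a compositional script.
reference_glyphs = {
    # glyph -> its components (hypothetical decomposition)
    "ga": ["g", "a"],
    "na": ["n", "a"],
}

def build_memory(glyphs, extract):
    """Persist one style entry per component seen in the reference glyphs."""
    memory = {}
    for glyph, comps in glyphs.items():
        for comp in comps:
            memory[comp] = extract(glyph, comp)
    return memory

def compose(memory, components):
    """Assemble an unseen glyph from memorised component styles."""
    return [memory[c] for c in components]

extract = lambda glyph, comp: f"style({comp})"   # placeholder extractor
memory = build_memory(reference_glyphs, extract)
unseen = compose(memory, ["g", "n"])   # "gn" never appeared as a whole glyph
```

Because every glyph in the library reuses the same small component set, a handful of references is enough to cover the whole font, which is the few-shot claim.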
Writer-Aware CNN for Parsimonious HMM-Based Offline Handwritten Chinese Text Recognition
Recently, the hybrid convolutional neural network hidden Markov model
(CNN-HMM) has been introduced for offline handwritten Chinese text recognition
(HCTR) and has achieved state-of-the-art performance. However, modeling each of
the large vocabulary of Chinese characters with a uniform and fixed number of
hidden states incurs high memory and computational costs and makes the tens
of thousands of HMM state classes confusing. Another key issue of CNN-HMM for
HCTR is the diversified writing style, which leads to model strain and a
significant performance decline for specific writers. To address these issues,
we propose a writer-aware CNN based on parsimonious HMM (WCNN-PHMM). First,
PHMM is designed using a data-driven state-tying algorithm to greatly reduce
the total number of HMM states, which not only yields a compact CNN by state
sharing of the same or similar radicals among different Chinese characters but
also improves the recognition accuracy due to the more accurate modeling of
tied states and the lower confusion among them. Second, WCNN integrates each
convolutional layer with one adaptive layer fed by a writer-dependent vector,
namely the writer code, to factor out irrelevant writer-specific variability
and improve recognition performance. The parameters of
writer-adaptive layers are jointly optimized with other network parameters in
the training stage, while a multiple-pass decoding strategy is adopted to learn
the writer code and generate recognition results. Validated on the ICDAR 2013
competition set of the CASIA-HWDB database, the more compact WCNN-PHMM with a 7,360-class
vocabulary can achieve a relative character error rate (CER) reduction of 16.6%
over the conventional CNN-HMM without considering language modeling. By
adopting a powerful hybrid language model (N-gram language model and recurrent
neural network language model), the CER of WCNN-PHMM is reduced to 3.17%.
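The effect of state tying on model size can be illustrated with toy state names. The paper derives its tying with a data-driven algorithm; the sketch below instead ties states by shared radical identity, which conveys the same counting argument (characters sharing sub-parts need far fewer distinct states).

```python
# Hypothetical: each character is modelled by HMM states named after its
# sub-parts; tying merges identical sub-part states across characters.
char_states = {
    "charA": ["rad1_s0", "rad1_s1", "rad2_s0"],
    "charB": ["rad1_s0", "rad1_s1", "rad3_s0"],   # shares rad1 with charA
    "charC": ["rad2_s0", "rad3_s0"],
}

# Untied: every character keeps its own copy of every state.
untied = sum(len(states) for states in char_states.values())

# Tied: one shared state per distinct sub-part across the vocabulary.
tied = len({s for states in char_states.values() for s in states})
```

Here tying halves the state count (8 to 4); over a 7,360-class vocabulary the same sharing is what yields the compact CNN output layer described above.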
Adversarial Generation of Handwritten Text Images Conditioned on Sequences
State-of-the-art offline handwriting text recognition systems tend to use
neural networks and therefore require a large amount of annotated data to be
trained. In order to partially satisfy this requirement, we propose a system
based on Generative Adversarial Networks (GAN) to produce synthetic images of
handwritten words. We use bidirectional LSTM recurrent layers to get an
embedding of the word to be rendered, and we feed it to the generator network.
We also modify the standard GAN by adding an auxiliary network for text
recognition. The system is then trained with a balanced combination of an
adversarial loss and a CTC loss. Together, these extensions to the GAN enable
control over the textual content of the generated word images. We obtain
realistic images on both French and Arabic datasets, and we show that
integrating these synthetic images into the existing training data of a text
recognition system can slightly enhance its performance.
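The CTC loss used alongside the adversarial loss scores every frame-level alignment that collapses to the target text. The collapse rule itself (merge repeated symbols, then drop blanks), which also gives greedy CTC decoding, is easy to sketch:

```python
def ctc_collapse(path, blank="-"):
    """Greedy CTC decoding step: merge repeated symbols, then drop blanks."""
    out = []
    prev = None
    for sym in path:
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym
    return "".join(out)

# A frame-level path emitted by the auxiliary recognition network.
decoded = ctc_collapse("--hh-e-ll-lo--")   # collapses to "hello"
```

During training, the auxiliary recognizer's CTC loss pushes the generator to render images whose collapsed transcription matches the conditioning word, while the adversarial loss pushes for realism.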
Handwritten Bangla Basic and Compound character recognition using MLP and SVM classifier
A novel approach for recognition of handwritten compound Bangla characters,
along with the Basic characters of the Bangla alphabet, is presented here.
Compared to Roman scripts such as English, one of the major stumbling blocks
in Optical Character Recognition (OCR) of handwritten Bangla script is the
large number of complex-shaped character classes in the Bangla alphabet. In
addition to the 50 basic character classes, there are nearly 160 complex-shaped
compound character classes in the Bangla alphabet. Dealing with such a large
variety of handwritten characters with a suitably designed feature set is a
challenging problem.
Uncertainty and imprecision are inherent in handwritten script. Moreover, such
a large variety of complex-shaped characters, some of which closely resemble
one another, makes OCR of handwritten Bangla characters even more difficult.
Considering the complexity of the problem, the present approach attempts to
identify compound character classes from the most frequently to the less
frequently occurring ones, i.e., in order of importance. The aim is to develop
a framework for incrementally increasing the number of learned
compound-character classes, from more frequently occurring ones to less
frequently occurring ones, along with the Basic characters. On
experimentation, the technique is observed to produce an average recognition
rate of 79.25% after three-fold cross-validation of the data, with future
scope for improvement and extension.
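The frequency-ordered incremental scheme amounts to sorting compound-character classes by corpus frequency and growing the learned set from the top. The class names and counts below are invented for illustration only:

```python
# Hypothetical character-frequency counts; real work would use corpus statistics.
freq = {"ka": 900, "kha": 120, "k+ssa": 60, "n+ga": 5, "ga": 400}

def classes_in_order(freq_table):
    """Order compound-character classes from most to least frequent, so the
    classifier can be grown incrementally starting with the common ones."""
    return [c for c, _ in sorted(freq_table.items(), key=lambda kv: -kv[1])]

order = classes_in_order(freq)
first_three = order[:3]   # the initial, highest-coverage classes to learn
```

Learning the high-frequency classes first maximises text coverage per added class, which is what "in order of importance" means in the abstract above.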