Design of a Very Compact CNN Classifier for Online Handwritten Chinese Character Recognition Using DropWeight and Global Pooling
Currently, owing to the ubiquity of mobile devices, online handwritten
Chinese character recognition (HCCR) has become one of the most suitable input
methods for cell phones and tablet devices. Over the past few years, larger
and deeper convolutional neural networks (CNNs) have been extensively employed
to improve character recognition performance. However, their substantial
storage requirements are a significant obstacle to deploying such networks on
portable electronic devices. To circumvent this problem, we propose a novel
technique called DropWeight for pruning redundant connections in the CNN
architecture. The proposed method not only handles streamlined architectures
such as AlexNet and VGGNet well but also performs remarkably on deep residual
and inception networks. We also demonstrate that global pooling is a better
choice for building very compact online HCCR systems. Experiments were
performed on the ICDAR-2013 online HCCR competition dataset using our proposed
network: the approach requires only 0.57 MB of storage, whereas
state-of-the-art CNN-based methods require up to 135 MB, while performance
decreases by only 0.91%.
Comment: 5 pages, 2 figures, 2 tables
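The DropWeight criterion itself is not detailed in this abstract; as a rough sketch, simple magnitude-based pruning combined with global average pooling (both hypothetical stand-ins for the paper's exact method) illustrates how redundant connections can be zeroed and how the storage-heavy fully-connected layer can be replaced:

```python
import numpy as np

def prune_weights(w, drop_ratio=0.9):
    """Zero out the smallest-magnitude fraction of weights (plain
    magnitude-based pruning; the actual DropWeight criterion may differ)."""
    flat = np.abs(w).ravel()
    k = int(len(flat) * drop_ratio)
    if k == 0:
        return w.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w)

def global_average_pool(feature_maps):
    """Collapse each HxW feature map to a single scalar, removing the large
    fully-connected layer that usually follows the last conv layer."""
    # feature_maps: (channels, height, width)
    return feature_maps.mean(axis=(1, 2))

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
pruned = prune_weights(w, drop_ratio=0.9)
sparsity = (pruned == 0).mean()          # ~0.9 of the weights are zeroed
fmaps = rng.normal(size=(128, 7, 7))
pooled = global_average_pool(fmaps)      # shape (128,), one value per channel
```

Sparse weight matrices can be stored in compressed form, and global average pooling eliminates the dense layer that typically dominates CNN storage, which is consistent with the compactness argument above.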
Are 2D-LSTM really dead for offline text recognition?
There is a recent trend in handwritten text recognition with deep neural
networks to replace 2D recurrent layers with 1D ones, and in some cases even
to completely remove the recurrent layers, relying on simple feed-forward,
convolution-only architectures. The most used type of recurrent layer is the
Long Short-Term Memory (LSTM). The motivations to do so are many: there are few
open-source implementations of 2D-LSTM, even fewer supporting GPU
implementations (currently cuDNN only implements 1D-LSTM); 2D recurrences
reduce the amount of computations that can be parallelized, and thus possibly
increase the training/inference time; recurrences create global dependencies
with respect to the input, and sometimes this may not be desirable.
Many recent competitions were won by systems employing networks with
2D-LSTM layers. Most previous work comparing 1D or purely feed-forward
architectures to 2D recurrent models did so on simple datasets, or did not
fully optimize the "baseline" 2D model compared to the challenger model, which
was fully optimized.
In this work, we aim at a fair comparison between 2D and competing models and
also extensively evaluate them on more complex datasets that are more
representative of challenging "real-world" data, compared to "academic"
datasets that are more restricted in their complexity. We aim at determining
when and why the 1D and 2D recurrent models have different results. We also
compare the results with a language model to assess if linguistic constraints
do level the performance of the different networks.
Our results show that for challenging datasets, 2D-LSTM networks still seem
to provide the highest performance, and we propose a visualization strategy to
explain it.
Comment: 12 pages, 4 figures
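To see concretely why 2D recurrences limit parallelism, here is a minimal sketch of a 2D recurrence over an image (a plain tanh cell, not a full 2D-LSTM; all shapes and weights are illustrative):

```python
import numpy as np

def recurrence_2d(x, Wx, Wh, Wv):
    """Simplified 2D recurrence: each hidden state depends on the input plus
    the hidden states of the top and left neighbours, so h[i, j] summarises
    the whole rectangle above and to the left of (i, j).
    x: (H, W, d_in); returns h: (H, W, d_hid)."""
    H, W, _ = x.shape
    d = Wh.shape[0]
    h = np.zeros((H, W, d))
    # The loop order makes the sequential dependency explicit: cell (i, j)
    # cannot be computed before (i-1, j) and (i, j-1) are done, which is
    # exactly what limits GPU parallelism compared to 1D or conv layers.
    for i in range(H):
        for j in range(W):
            top = h[i - 1, j] if i > 0 else np.zeros(d)
            left = h[i, j - 1] if j > 0 else np.zeros(d)
            h[i, j] = np.tanh(x[i, j] @ Wx + top @ Wh + left @ Wv)
    return h

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 32, 16))       # a small stand-in "text-line image"
Wx = rng.normal(size=(16, 32)) * 0.1
Wh = rng.normal(size=(32, 32)) * 0.1
Wv = rng.normal(size=(32, 32)) * 0.1
h = recurrence_2d(x, Wx, Wh, Wv)       # (8, 32, 32) hidden states
```

A real 2D-LSTM adds gating (input, forget, output gates) per cell, but the data dependency pattern, and hence the parallelization constraint discussed above, is the same.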
Adversarial Generation of Training Examples: Applications to Moving Vehicle License Plate Recognition
Generative Adversarial Networks (GANs) have attracted much research attention
recently, leading to impressive results for natural image generation. However,
to date little success has been observed in using GAN-generated images for improving
classification tasks. Here we attempt to explore, in the context of car license
plate recognition, whether it is possible to generate synthetic training data
using GAN to improve recognition accuracy. With a carefully-designed pipeline,
we show that the answer is affirmative. First, a large-scale image set is
generated using the GAN generator, without manual annotation. Then, these
images are fed to a deep convolutional neural network (DCNN) followed by a
bidirectional recurrent neural network (BRNN) with long short-term memory
(LSTM), which performs the feature learning and sequence labelling. Finally,
the pre-trained model is fine-tuned on real images. Our experimental results
on several datasets demonstrate the effectiveness of using GAN images: an
improvement of 7.5% over a strong baseline when only moderate-sized real data
are available. We show that the proposed framework achieves competitive
recognition accuracy on challenging test datasets. We also leverage depthwise
separable convolution to construct a lightweight convolutional RNN, which is
about half the size and twice as fast on CPU. Combining this framework and the
proposed pipeline, we make progress toward accurate recognition on mobile and
embedded devices.
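The size reduction from depthwise separable convolution can be seen from a simple parameter count; the channel counts below are hypothetical and not taken from the paper:

```python
def conv_params(c_in, c_out, k):
    """Parameters of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """A depthwise k x k filter per input channel, followed by a 1x1
    pointwise convolution that mixes channels."""
    return c_in * k * k + c_in * c_out

# Illustrative layer: 128 -> 128 channels with a 3x3 kernel.
standard = conv_params(128, 128, 3)                   # 147456 parameters
separable = depthwise_separable_params(128, 128, 3)   # 17536 parameters
ratio = standard / separable                          # roughly 8x fewer
```

For a full network, most layers are not convolutional RNN layers, so the end-to-end saving is smaller than this per-layer ratio, which is consistent with the "about half the size" figure above.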
Handwritten Bangla Character Recognition Using The State-of-Art Deep Convolutional Neural Networks
In spite of advances in object recognition technology, Handwritten Bangla
Character Recognition (HBCR) remains largely unsolved due to the presence of
many ambiguous handwritten characters and excessively cursive Bangla
handwriting. Even the best existing recognizers do not achieve satisfactory
performance for practical applications related to Bangla character recognition
and perform much worse than those developed for English alphanumeric
characters. To improve the performance of HBCR, we herein present the
application of the state-of-the-art Deep Convolutional Neural Networks (DCNN)
including VGG Network, All Convolution Network (All-Conv Net), Network in
Network (NiN), Residual Network, FractalNet, and DenseNet for HBCR. The deep
learning approaches have the advantage of extracting and using feature
information, improving the recognition of 2D shapes with a high degree of
invariance to translation, scaling, and other distortions. We systematically
evaluated the performance of the DCNN models on the publicly available Bangla
handwritten character dataset CMATERdb and achieved superior recognition
accuracy with the DCNN models. This improvement would help in building an
automatic HBCR system for practical applications.
Comment: 12 pages, 22 figures, 5 tables. arXiv admin note: text overlap with
arXiv:1705.0268
Handwritten Isolated Bangla Compound Character Recognition: a new benchmark using a novel deep learning approach
In this work, a novel deep learning technique for the recognition of
handwritten Bangla isolated compound character is presented and a new benchmark
of recognition accuracy on the CMATERdb 3.1.3.3 dataset is reported. Greedy
layer-wise training of deep neural networks has helped make significant
strides in various pattern recognition problems. We apply layerwise training
to Deep Convolutional Neural Networks (DCNN) in a supervised fashion and
augment the training process with the RMSProp algorithm to achieve faster
convergence. We compare results with those obtained from standard shallow
learning methods with predefined features, as well as standard DCNNs.
Supervised layerwise trained DCNNs are found to outperform standard shallow
learning models such as Support Vector Machines, as well as regular DCNNs of
similar architecture, achieving an error rate of 9.67% and thereby setting a
new benchmark on CMATERdb 3.1.3.3 with a recognition accuracy of 90.33%, an
improvement of nearly 10%.
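The RMSProp algorithm used above to accelerate convergence can be sketched as a single update rule (a generic textbook formulation; the paper's hyperparameters are not given here):

```python
import numpy as np

def rmsprop_step(w, grad, cache, lr=1e-3, decay=0.9, eps=1e-8):
    """One RMSProp update: keep a running average of squared gradients and
    scale each parameter's step by the inverse square root of that average,
    so parameters with consistently large gradients take smaller steps."""
    cache = decay * cache + (1.0 - decay) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

# Toy check: minimising f(w) = 0.5 * ||w||^2, whose gradient is w itself.
w = np.array([5.0, -3.0])
cache = np.zeros_like(w)
for _ in range(2000):
    w, cache = rmsprop_step(w, w, cache, lr=0.01)
# w is now close to the minimum at the origin.
```

In layerwise training, an update like this would be applied to the weights of the layer currently being trained, holding earlier layers fixed.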
Bangla License Plate Recognition Using Convolutional Neural Networks (CNN)
In the last few years, deep learning techniques, in particular
Convolutional Neural Networks (CNNs), have been used massively in the fields
of computer vision and machine learning. They provide state-of-the-art
accuracy in different classification, segmentation, and detection tasks on
benchmarks such as MNIST, CIFAR-10, CIFAR-100, Microsoft COCO, and ImageNet.
However, most research on Bangla license plate recognition in the last decade
has relied on traditional machine learning approaches, and none of it has led
to a deployed physical Bangla License Plate Recognition System (BLPRS) due to
poor recognition accuracy. In this paper, we have implemented a CNN-based
Bangla license plate recognition system with better accuracy that can be
applied for different purposes, including roadside assistance, automatic
parking lot management, and vehicle license status detection. Along with that,
we have also created and released the first standard database for BLPRS.
Comment: 6 pages, 10 figures
Speaker Diarization using Deep Recurrent Convolutional Neural Networks for Speaker Embeddings
In this paper we propose a new method of speaker diarization that employs a
deep learning architecture to learn speaker embeddings. In contrast to the
traditional approaches that build speaker embeddings from hand-crafted
spectral features, we propose to train for this purpose a recurrent
convolutional neural network applied directly to magnitude spectrograms. To
compare our approach with the state of the art, we collect and publicly
release an additional dataset of over 6 hours of fully annotated broadcast
material. The results of our evaluation on the new dataset and three other
benchmark datasets show that our proposed method significantly outperforms
the competitors, reducing the diarization error rate by a large margin of
over 30% with respect to the baseline.
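A magnitude spectrogram of the kind fed to such a network can be computed with a simple framed FFT; the frame and hop sizes below are illustrative choices, not the paper's:

```python
import numpy as np

def magnitude_spectrogram(signal, frame_len=512, hop=128):
    """Frame the signal, apply a Hann window, and take the FFT magnitude of
    each frame; rows are time frames, columns are frequency bins."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440.0 * t)     # 1 second of a 440 Hz tone
spec = magnitude_spectrogram(tone)       # (122 frames, 257 bins)
peak_bin = spec.mean(axis=0).argmax()
peak_hz = peak_bin * sr / 512            # close to 440 Hz
```

The network then consumes `spec` like an image, letting convolutional layers learn spectral features instead of relying on hand-crafted ones such as MFCCs.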
Telugu OCR Framework using Deep Learning
In this paper, we address the task of Optical Character Recognition (OCR) for
the Telugu script. We present an end-to-end framework that segments the text
image, classifies the characters and extracts lines using a language model. The
segmentation is based on mathematical morphology. The classification module,
which is the most challenging task of the three, is a deep convolutional neural
network. The language is modelled as a third-degree Markov chain at the glyph
level. Telugu script is a complex alphasyllabary and the language is
agglutinative, making the problem hard. In this paper, we apply the latest
advances in neural networks to achieve state-of-the-art error rates. We also
review convolutional neural networks in great detail and expound the
statistical justification behind the many tricks needed to make deep learning
work.
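A third-degree (order-3) Markov chain at the glyph level can be sketched with n-gram counts; the glyphs below are ASCII stand-ins for Telugu glyph IDs:

```python
from collections import defaultdict

def train_markov(sequences, order=3):
    """Count n-gram transitions: P(next glyph | previous `order` glyphs)."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        padded = ["<s>"] * order + list(seq)
        for i in range(order, len(padded)):
            context = tuple(padded[i - order:i])
            counts[context][padded[i]] += 1
    return counts

def prob(counts, context, glyph):
    """Maximum-likelihood estimate; a real system would add smoothing."""
    ctx = counts.get(tuple(context))
    if not ctx:
        return 0.0
    return ctx[glyph] / sum(ctx.values())

model = train_markov(["abcd", "abce", "abcd"], order=3)
p = prob(model, ("a", "b", "c"), "d")   # 2 of 3 continuations are "d"
```

Such probabilities can rescore the classifier's glyph hypotheses, so that the chosen line of text is plausible under the language model as well as the image evidence.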
A Survey of the Recent Architectures of Deep Convolutional Neural Networks
The deep Convolutional Neural Network (CNN) is a special type of neural
network that has shown exemplary performance in several competitions related to
Computer Vision and Image Processing. Some of the exciting application areas of
CNN include Image Classification and Segmentation, Object Detection, Video
Processing, Natural Language Processing, and Speech Recognition. The powerful
learning ability of deep CNN is primarily due to the use of multiple feature
extraction stages that can automatically learn representations from the data.
The availability of large amounts of data and improvements in hardware
technology have accelerated research in CNNs, and recently interesting deep
CNN architectures have been reported. Several inspiring ideas to bring
advancements in CNNs have been explored, such as the use of different
activation and loss functions, parameter optimization, regularization, and
architectural innovations. However, the significant improvement in the
representational capacity of the deep CNN is achieved through architectural
innovations. Notably, the ideas of exploiting spatial and channel information,
depth and width of architecture, and multi-path information processing have
gained substantial attention. Similarly, the idea of using a block of layers as
a structural unit is also gaining popularity. This survey thus focuses on the
intrinsic taxonomy present in the recently reported deep CNN architectures and,
consequently, classifies the recent innovations in CNN architectures into seven
different categories. These seven categories are based on spatial exploitation,
depth, multi-path, width, feature-map exploitation, channel boosting, and
attention. Additionally, an elementary understanding of CNN components,
current challenges, and applications of CNNs is also provided.
Comment: Number of Pages: 70, Number of Figures: 11, Number of Tables: 11.
Artif Intell Rev (2020)
Handwritten Bangla Digit Recognition Using Deep Learning
In spite of the advances in pattern recognition technology, Handwritten
Bangla Character Recognition (HBCR), covering alphanumeric and special
characters, remains largely unsolved due to the presence of many perplexing
characters and excessively cursive Bangla handwriting. Even the best existing
recognizers do not achieve satisfactory performance for practical
applications. To improve the performance of Handwritten Bangla Digit
Recognition (HBDR), we herein present a new approach based on deep neural
networks, which have recently shown excellent performance in many pattern
recognition and machine learning applications but have not been thoroughly
explored for HBDR. We introduce
Bangla digit recognition techniques based on Deep Belief Network (DBN),
Convolutional Neural Networks (CNN), CNN with dropout, CNN with dropout and
Gaussian filters, and CNN with dropout and Gabor filters. These networks have
the advantage of extracting and using feature information, improving the
recognition of two dimensional shapes with a high degree of invariance to
translation, scaling, and other pattern distortions. We systematically
evaluated the performance of our method on the publicly available Bangla
numeral image database CMATERdb 3.1.1. In our experiments, we achieved a
98.78% recognition rate using the proposed method, CNN with Gabor features and
dropout, which outperforms the state-of-the-art algorithms for HBDR.
Comment: 12 pages, 10 figures, 3 tables
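A Gabor filter of the sort combined with the CNN above can be generated as a Gaussian-modulated sinusoid; all parameter values below are illustrative rather than the paper's:

```python
import numpy as np

def gabor_kernel(size=11, sigma=2.0, theta=0.0, lam=4.0, psi=0.0):
    """Real part of a Gabor filter: a Gaussian envelope modulating a
    sinusoidal carrier, selective for strokes at orientation `theta`."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_t = x * np.cos(theta) + y * np.sin(theta)    # rotate coordinates
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_t ** 2 + y_t ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * x_t / lam + psi)
    return envelope * carrier

# A small bank of orientations, e.g. to pre-filter digit images before the
# first CNN layer; a hypothetical setup, not the paper's exact configuration.
kernels = [gabor_kernel(theta=t)
           for t in np.linspace(0, np.pi, 4, endpoint=False)]
```

Convolving the input digit image with each kernel yields orientation-selective response maps, which can complement (or initialize) the filters learned by the first convolutional layer.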