Fine-Grain Segmentation of the Intervertebral Discs from MR Spine Images Using Deep Convolutional Neural Networks: BSU-Net.
We propose a new deep learning network capable of successfully segmenting intervertebral discs and their complex boundaries from magnetic resonance (MR) spine images. The existing U-network (U-net) is known to perform well in various medical image segmentation tasks; however, its performance on fine details such as boundaries is limited by the structural limitations of the max-pooling layer, which plays a key role in the feature extraction process of the U-net. We designed a modified convolutional and pooling layer scheme and applied a cascaded learning method to overcome these structural limitations of the max-pooling layer of a conventional U-net. The proposed network achieved a 3% higher Dice similarity coefficient (DSC) than the conventional U-net for intervertebral disc segmentation (89.44% vs. 86.44%, respectively; p < 0.001). For intervertebral disc boundary segmentation, the proposed network achieved a 10.46% higher DSC than the conventional U-net (54.62% vs. 44.16%, respectively; p < 0.001).
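For reference, the DSC reported above is defined as 2|A ∩ B| / (|A| + |B|) for two masks A and B; a minimal NumPy sketch (the toy masks are illustrative, not data from the paper):

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice similarity coefficient between two binary segmentation masks."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # convention: two empty masks agree perfectly
    return 2.0 * intersection / denom

# Toy 4x4 masks: prediction covers 4 pixels, ground truth 2, overlap 2.
pred = np.zeros((4, 4)); pred[1:3, 1:3] = 1
target = np.zeros((4, 4)); target[1:3, 1:2] = 1
print(dice_coefficient(pred, target))  # 2*2 / (4+2) ≈ 0.667
```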
Handwritten Bangla Character Recognition Using The State-of-Art Deep Convolutional Neural Networks
In spite of advances in object recognition technology, Handwritten Bangla
Character Recognition (HBCR) remains largely unsolved due to the presence of
many ambiguous handwritten characters and excessively cursive Bangla
handwriting. Even the best existing recognizers do not lead to satisfactory
performance for practical applications related to Bangla character recognition
and have much lower performance than those developed for English alpha-numeric
characters. To improve the performance of HBCR, we herein present the
application of the state-of-the-art Deep Convolutional Neural Networks (DCNN)
including VGG Network, All Convolution Network (All-Conv Net), Network in
Network (NiN), Residual Network, FractalNet, and DenseNet for HBCR. The deep
learning approaches have the advantage of extracting and using feature
information, improving the recognition of 2D shapes with a high degree of
invariance to translation, scaling and other distortions. We systematically
evaluated the performance of DCNN models on publicly available Bangla
handwritten character dataset called CMATERdb and achieved superior
recognition accuracy with the DCNN models. This improvement would help in
building an automatic HBCR system for practical applications.
Comment: 12 pages, 22 figures, 5 tables. arXiv admin note: text overlap with
arXiv:1705.0268
Telugu OCR Framework using Deep Learning
In this paper, we address the task of Optical Character Recognition (OCR) for
the Telugu script. We present an end-to-end framework that segments the text
image, classifies the characters and extracts lines using a language model. The
segmentation is based on mathematical morphology. The classification module,
which is the most challenging task of the three, is a deep convolutional neural
network. The language is modelled as a third-order Markov chain at the glyph
level. Telugu script is a complex alphasyllabary and the language is
agglutinative, making the problem hard. In this paper we apply the latest
advances in neural networks to achieve state-of-the-art error rates. We also
review convolutional neural networks in great detail and expound the
statistical justification behind the many tricks needed to make Deep Learning
work.
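A count-based glyph-level Markov chain of the kind mentioned can be sketched as follows; this is a minimal illustration with add-alpha smoothing, and the paper's exact smoothing and vocabulary handling are not specified here:

```python
from collections import Counter, defaultdict

def train_glyph_lm(sequences, order=3):
    """Count next-glyph occurrences after each length-`order` context."""
    counts = defaultdict(Counter)
    for seq in sequences:
        padded = ["<s>"] * order + list(seq)  # pad sequence starts
        for i in range(order, len(padded)):
            context = tuple(padded[i - order:i])
            counts[context][padded[i]] += 1
    return counts

def glyph_prob(counts, context, glyph, vocab_size, alpha=1.0):
    """Add-alpha smoothed P(glyph | context)."""
    c = counts[tuple(context)]
    return (c[glyph] + alpha) / (sum(c.values()) + alpha * vocab_size)

lm = train_glyph_lm(["abc", "abd"], order=3)
# Both training lines start with 'a' after the start-of-sequence context.
p = glyph_prob(lm, ["<s>"] * 3, "a", vocab_size=4)  # (2+1)/(2+4) = 0.5
```

At decoding time such a model rescores candidate glyph sequences produced by the classifier, preferring linguistically plausible lines.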
Indic Handwritten Script Identification using Offline-Online Multimodal Deep Network
In this paper, we propose a novel approach of word-level Indic script
identification using only character-level data in the training stage. The
advantages of using character level data for training have been outlined in
section I. Our method uses a multimodal deep network which takes both offline
and online modality of the data as input in order to explore the information
from both the modalities jointly for the script identification task. We take
handwritten data in either modality as input and the opposite modality is
generated through intermodality conversion. Thereafter, we feed this
offline-online modality pair to our network. Hence, along with the advantage of
utilizing information from both the modalities, it can work as a single
framework for both offline and online script identification simultaneously
which alleviates the need for designing two separate script identification
modules for each modality. Another major contribution is that we propose
a novel conditional multimodal fusion scheme to combine the information from
offline and online modality which takes into account the real origin of the
data being fed to our network and thus it combines adaptively. An exhaustive
experiment has been done on a data set consisting of English and six Indic
scripts. Our proposed framework outperforms, by a clear margin, both frameworks
based on traditional classifiers with handcrafted features and deep learning
based methods. Extensive experiments show that using only character-level
training data can achieve state-of-the-art performance similar to that obtained
with traditional training using word-level data in our framework.
Comment: Accepted in Information Fusion, Elsevier
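The abstract does not give the exact fusion equations; as a loose illustration, a conditional fusion scheme could weight the feature from the data's real modality more heavily than the one synthesized by intermodality conversion. The weighting below is purely hypothetical, not the paper's formulation:

```python
import numpy as np

def conditional_fuse(f_offline, f_online, origin, w_real=0.7):
    """Fuse offline/online feature vectors, conditioned on which modality
    the input really came from (the other having been generated by
    intermodality conversion). Weights are illustrative only."""
    f_offline = np.asarray(f_offline, dtype=float)
    f_online = np.asarray(f_online, dtype=float)
    if origin == "offline":
        return w_real * f_offline + (1.0 - w_real) * f_online
    return (1.0 - w_real) * f_offline + w_real * f_online

fused = conditional_fuse(np.ones(3), np.zeros(3), origin="offline")
```

In the paper the combination is learned rather than fixed, but the key point survives: the fusion adapts to the true origin of the sample.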
Large Scale Font Independent Urdu Text Recognition System
OCR algorithms have seen significant performance improvements recently,
mainly due to advances in artificial intelligence. However, this advancement
is not evenly distributed
over all languages. Urdu is among the languages which did not receive much
attention, especially from a font-independent perspective. There exists no
automated system that can reliably recognize printed Urdu text in images and
videos across different fonts. To help bridge this gap, we have developed
Qaida, a large scale data set with 256 fonts, and a complete Urdu lexicon. We
have also developed a Convolutional Neural Network (CNN) based classification
model which can recognize Urdu ligatures with 84.2% accuracy. Moreover, we
demonstrate that our recognition network can not only recognize the text in the
fonts it is trained on but can also reliably recognize text in unseen (new)
fonts. To this end, this paper makes the following contributions: (i) we
introduce a large-scale, multi-font data set for printed Urdu text
recognition; (ii) we have designed, trained and evaluated a CNN-based model for
Urdu text recognition; (iii) we experiment with incremental learning methods to
produce state-of-the-art results for Urdu text recognition. All experimental
choices were thoroughly validated via detailed empirical analysis. We believe
that this study can serve as the basis for further improvement in the
performance of font-independent Urdu OCR systems.
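The incremental-learning setup is not detailed in the abstract; as a toy stand-in, a nearest-class-mean classifier shows the general idea of absorbing samples from new fonts without retraining from scratch. The class names and the scheme itself are illustrative, not the paper's method:

```python
import numpy as np

class IncrementalMeanClassifier:
    """Per-class running feature means, updatable one batch at a time."""
    def __init__(self):
        self.sums, self.counts = {}, {}

    def partial_fit(self, X, y):
        # Accumulate feature sums and counts; no stored samples needed.
        for x, label in zip(X, y):
            self.sums[label] = self.sums.get(label, 0.0) + np.asarray(x, float)
            self.counts[label] = self.counts.get(label, 0) + 1

    def predict(self, X):
        labels = list(self.sums)
        means = np.stack([self.sums[l] / self.counts[l] for l in labels])
        return [labels[int(np.argmin(np.linalg.norm(means - np.asarray(x, float), axis=1)))]
                for x in X]

clf = IncrementalMeanClassifier()
clf.partial_fit([[0.0, 0.0], [1.0, 1.0]], ["alif", "be"])   # first "font"
clf.partial_fit([[0.2, 0.0], [0.9, 1.1]], ["alif", "be"])   # a later "font"
```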
Handwritten Bangla Digit Recognition Using Deep Learning
In spite of the advances in pattern recognition technology, Handwritten
Bangla Character Recognition (HBCR) (such as alpha-numeric and special
characters) remains largely unsolved due to the presence of many perplexing
characters and excessively cursive Bangla handwriting. Even the best existing
recognizers do not lead to satisfactory performance for practical applications.
To improve the performance of Handwritten Bangla Digit Recognition (HBDR), we
herein present a new approach based on deep neural networks which have recently
shown excellent performance in many pattern recognition and machine learning
applications, but have not been thoroughly explored for HBDR. We introduce
Bangla digit recognition techniques based on Deep Belief Network (DBN),
Convolutional Neural Networks (CNN), CNN with dropout, CNN with dropout and
Gaussian filters, and CNN with dropout and Gabor filters. These networks have
the advantage of extracting and using feature information, improving the
recognition of two dimensional shapes with a high degree of invariance to
translation, scaling and other pattern distortions. We systematically evaluated
the performance of our method on publicly available Bangla numeral image
database named CMATERdb 3.1.1. In our experiments, we achieved a 98.78% recognition
rate using the proposed method, CNN with Gabor features and dropout, which
outperforms the state-of-the-art algorithms for HBDR.
Comment: 12 pages, 10 figures, 3 tables
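A Gabor filter bank of the kind combined with the CNN above can be generated directly in NumPy; the parameter values below are illustrative, not those used in the paper:

```python
import numpy as np

def gabor_kernel(ksize=9, sigma=2.0, theta=0.0, lam=4.0, psi=0.0, gamma=0.5):
    """Real part of a 2D Gabor filter: a Gaussian envelope multiplied by a
    cosine carrier oriented at angle `theta` with wavelength `lam`."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    y_rot = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_rot**2 + (gamma * y_rot)**2) / (2.0 * sigma**2))
    return envelope * np.cos(2.0 * np.pi * x_rot / lam + psi)

# A small bank at four orientations, as would be convolved with input digits.
bank = [gabor_kernel(theta=t) for t in np.linspace(0, np.pi, 4, endpoint=False)]
```

Responses of such a bank emphasize oriented strokes, which is why they pair well with handwritten-digit inputs.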
Unsupervised Feature Learning for Writer Identification and Writer Retrieval
Deep Convolutional Neural Networks (CNN) have shown great success in
supervised classification tasks such as character classification or dating.
Deep learning methods typically need a lot of annotated training data, which is
not available in many scenarios. In these cases, traditional methods are often
better than or equivalent to deep learning methods. In this paper, we propose a
simple, yet effective, way to learn CNN activation features in an unsupervised
manner. To this end, we train a deep residual network using surrogate classes.
The surrogate classes are created by clustering the training dataset, where
each cluster index represents one surrogate class. The activations from the
penultimate CNN layer serve as features for subsequent classification tasks. We
evaluate the feature representations on two publicly available datasets. The
focus lies on the ICDAR17 competition dataset on historical document writer
identification (Historical-WI). We show that the activation features trained
without supervision are superior to descriptors of state-of-the-art writer
identification methods. Additionally, we achieve comparable results in the case
of handwriting classification using the ICFHR16 competition dataset on
historical Latin script types (CLaMM16).
Comment: ICDAR2017 camera ready (fixed p@2 values, missing table references)
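The surrogate-class construction described above, clustering the unlabelled training data and using cluster indices as pseudo-labels, can be sketched with a tiny k-means; the feature vectors and cluster count here are placeholders:

```python
import numpy as np

def surrogate_labels(features, k=2, iters=20, seed=0):
    """Cluster unlabelled feature vectors with plain k-means; each cluster
    index then serves as a surrogate class label for CNN training."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)].astype(float)
    labels = np.zeros(len(features), dtype=int)
    for _ in range(iters):
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)          # assign to nearest center
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return labels

# Two well-separated blobs become two surrogate classes.
feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
labels = surrogate_labels(feats, k=2)
```

The network is then trained to predict these labels, and its penultimate-layer activations serve as the writer-identification features.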
Deep learning for word-level handwritten Indic script identification
We propose a novel method that uses convolutional neural networks (CNNs) for
feature extraction. Not just limited to conventional spatial domain
representation, we use multilevel 2D discrete Haar wavelet transform, where
image representations are scaled to a variety of different sizes. These are
then used to train different CNNs to select features. To be precise, we use 10
different CNNs that select a set of 10,240 features, i.e. 1,024 per CNN. With this,
11 different handwritten scripts are identified, where 1K words per script are
used. In our test, we have achieved the maximum script identification rate of
94.73% using a multi-layer perceptron (MLP). Our results outperform the
state-of-the-art techniques.
Comment: 11 pages, 6 figures, 2 tables
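One level of the 2D Haar transform used above splits an image into an approximation and three detail subbands; a minimal NumPy sketch follows (normalization conventions vary between implementations):

```python
import numpy as np

def haar2d_level(img):
    """One 2D Haar decomposition level: returns (LL, LH, HL, HH) subbands,
    each half the input size (input sides must be even)."""
    a = img[0::2, 0::2]   # top-left of each 2x2 block
    b = img[0::2, 1::2]   # top-right
    c = img[1::2, 0::2]   # bottom-left
    d = img[1::2, 1::2]   # bottom-right
    ll = (a + b + c + d) / 4.0   # approximation: a half-size image
    lh = (a + b - c - d) / 4.0   # horizontal detail
    hl = (a - b + c - d) / 4.0   # vertical detail
    hh = (a - b - c + d) / 4.0   # diagonal detail
    return ll, lh, hl, hh

img = np.arange(16.0).reshape(4, 4)
ll, lh, hl, hh = haar2d_level(img)   # recurse on `ll` for further levels
```

Applying this recursively to the LL band yields the multilevel, multi-scale representations that feed the different CNNs.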
PHOCNet: A Deep Convolutional Neural Network for Word Spotting in Handwritten Documents
In recent years, deep convolutional neural networks have achieved
state-of-the-art performance in various computer vision tasks such as classification,
detection or segmentation. Due to their outstanding performance, CNNs are more
and more used in the field of document image analysis as well. In this work, we
present a CNN architecture that is trained with the recently proposed PHOC
representation. We show empirically that our CNN architecture is able to
outperform state-of-the-art results for various word spotting benchmarks while
exhibiting short training and test times.
Comment: published as a conference paper at the International Conference on
Frontiers in Handwriting Recognition 201
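A simplified PHOC vector can be built as below: each pyramid level splits the word into regions and records which alphabet characters occur in each. Assignment here uses the character's centre position, whereas the published PHOC uses a region-overlap criterion, so treat this as an illustration only:

```python
import string

def phoc(word, levels=(1, 2), alphabet=string.ascii_lowercase):
    """Binary pyramidal histogram of characters (simplified)."""
    vec = []
    n = len(word)
    for level in levels:
        for region in range(level):
            hist = [0] * len(alphabet)
            for i, ch in enumerate(word):
                centre = (i + 0.5) / n          # normalized character position
                if region / level <= centre < (region + 1) / level and ch in alphabet:
                    hist[alphabet.index(ch)] = 1
            vec.extend(hist)
    return vec

v = phoc("ab")
# 26 * (1 + 2) = 78 dimensions; 'a' lands in the left half, 'b' in the right.
```

Training a CNN to regress such attribute vectors lets word images and query strings share one embedding space for spotting.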
Improving patch-based scene text script identification with ensembles of conjoined networks
This paper focuses on the problem of script identification in scene text
images. Facing this problem with state-of-the-art CNN classifiers is not
straightforward, as they fail to address a key characteristic of scene text
instances: their extremely variable aspect ratio. Instead of resizing input
images to a fixed aspect ratio as in the typical use of holistic CNN
classifiers, we propose here a patch-based classification framework in order to
preserve discriminative parts of the image that are characteristic of its
class. We describe a novel method based on the use of ensembles of conjoined
networks to jointly learn discriminative stroke-parts representations and their
relative importance in a patch-based classification scheme. Our experiments
with this learning procedure demonstrate state-of-the-art results in two public
script identification datasets. In addition, we propose a new public benchmark
dataset for the evaluation of multi-lingual scene text end-to-end reading
systems. Experiments on this dataset demonstrate the key role of script
identification in a complete end-to-end system that combines our script
identification method with a previously published text detector and an
off-the-shelf OCR engine.
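The core of the patch-based idea, sliding fixed-size windows along the text instead of squashing it to a fixed aspect ratio, can be sketched as follows, assuming the line image has already been height-normalized to the patch size (window size and stride are illustrative):

```python
import numpy as np

def extract_patches(line_img, patch=32, stride=16):
    """Square patches slid horizontally over a height-normalized text image,
    preserving local aspect ratio instead of resizing the whole image."""
    h, w = line_img.shape[:2]
    assert h == patch, "line image should be height-normalized to the patch size"
    return [line_img[:, x:x + patch]
            for x in range(0, w - patch + 1, stride)]

line = np.zeros((32, 64))          # a 32x64 text-line stand-in
patches = extract_patches(line)    # each patch is classified; votes are fused
```

In the paper, the conjoined networks then learn how much each patch should contribute to the image-level script decision.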