173 research outputs found
Zone Segmentation and Thinning based Algorithm for Segmentation of Devnagari Text
Character segmentation of handwritten documents is a challenging research topic due to its diverse application environment. OCR can be used for automated processing and handling of forms, old corrupted reports, bank cheques, postal codes and structures. Segmentation of a word into characters is one of the major challenges in optical character recognition. It is even more challenging when segmenting characters in an offline handwritten document, and a further hurdle is the presence of broken, touching and overlapped characters in Devnagari script. In this paper we introduce an algorithm that segments both broken and touching characters in Devnagari script. To segment these characters, the algorithm uses both zone segmentation and thinning-based techniques. We used 85 words each for isolated, broken, touching, and both broken and touching characters. The segmentation accuracy achieved for broken as well as touching characters is 96.2% on average.
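The abstract does not say how the zones are located. As a minimal sketch of one common approach to zone segmentation in Devnagari, the header line can be assumed to be the image row with the most ink, and the word can be split there. The function names `split_zones` and `empty_columns`, and the convention that 1 marks ink, are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def split_zones(word):
    """Split a binary word image (1 = ink) at its assumed header line.

    The header line is taken to be the row with the most ink pixels,
    which is a heuristic, not the paper's stated method.
    """
    row_ink = word.sum(axis=1)            # ink count per row
    header_row = int(np.argmax(row_ink))  # densest row = assumed header line
    upper = word[:header_row]             # zone above the header line
    body = word[header_row + 1:]          # zone below the header line
    return header_row, upper, body

def empty_columns(zone):
    """Return indices of columns with no ink: candidate cut positions."""
    return [int(c) for c in np.flatnonzero(zone.sum(axis=0) == 0)]

word = np.array([
    [0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1],   # header line
    [1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1],
])
header_row, upper, body = split_zones(word)
cuts = empty_columns(body)
```

Touching characters leave no empty column once the header line is removed, which is presumably where a thinning-based step would take over. That step is not sketched here.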
Learning Spatial-Semantic Context with Fully Convolutional Recurrent Network for Online Handwritten Chinese Text Recognition
Online handwritten Chinese text recognition (OHCTR) is a challenging problem
as it involves a large-scale character set, ambiguous segmentation, and
variable-length input sequences. In this paper, we exploit the outstanding
capability of path signature to translate online pen-tip trajectories into
informative signature feature maps using a sliding window-based method,
successfully capturing the analytic and geometric properties of pen strokes
with strong local invariance and robustness. A multi-spatial-context fully
convolutional recurrent network (MCFCRN) is proposed to exploit the multiple
spatial contexts from the signature feature maps and generate a prediction
sequence while completely avoiding the difficult segmentation problem.
Furthermore, an implicit language model is developed to make predictions based
on semantic context within a predicting feature sequence, providing a new
perspective for incorporating lexicon constraints and prior knowledge about a
certain language in the recognition procedure. Experiments on two standard
benchmarks, Dataset-CASIA and Dataset-ICDAR, yielded outstanding results, with
correct rates of 97.10% and 97.15%, respectively, which are significantly
better than the best result reported thus far in the literature.
Comment: 14 pages, 9 figures
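To make the path signature idea concrete, the following is a minimal sketch that computes the signature of a pen trajectory truncated at level 2 over sliding windows. It treats the trajectory as a piecewise linear path and returns one feature vector per window, whereas the paper builds signature feature maps. The function names, window size, and step are illustrative assumptions.

```python
import numpy as np

def signature_level2(points):
    """Level-1 and level-2 signature terms of a piecewise linear path."""
    pts = np.asarray(points, dtype=float)
    dim = pts.shape[1]
    s1 = np.zeros(dim)           # level 1: total displacement
    s2 = np.zeros((dim, dim))    # level 2: iterated integrals
    for d in np.diff(pts, axis=0):
        # Chen's identity for appending one straight segment d
        s2 += np.outer(s1, d) + 0.5 * np.outer(d, d)
        s1 += d
    return s1, s2

def sliding_signatures(points, window=2, step=1):
    """One feature vector [1, level 1, level 2] per window of `window` points."""
    pts = np.asarray(points, dtype=float)
    feats = []
    for start in range(0, len(pts) - window + 1, step):
        s1, s2 = signature_level2(pts[start:start + window])
        feats.append(np.concatenate(([1.0], s1, s2.ravel())))
    return np.array(feats)

stroke = [(0, 0), (1, 0), (1, 1)]   # move right, then up
s1, s2 = signature_level2(stroke)
feats = sliding_signatures(stroke, window=2, step=1)
```

The antisymmetric part of the level-2 term encodes the signed area swept by the stroke, which is one reason signature features are often said to capture the geometry of pen movement.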
Handwritten Text Recognition Using Deep Neural Networks
Doctor of Engineering, Tokyo University of Agriculture and Technology
MatriVasha: A Multipurpose Comprehensive Database for Bangla Handwritten Compound Characters
Recognition of Bangla handwritten compound characters has been an important
problem for many years. In recent years there has been application-driven
research in machine learning and deep learning, which has attracted growing
interest, most notably in handwriting recognition because of its many
applications, such as Bangla OCR. MatriVasha is a project for recognizing
Bangla handwritten compound characters. Compound character recognition is
currently an important topic because of its varied applications, and it helps
to digitize old forms and information reliably. Unfortunately, there is a lack
of a comprehensive dataset that can categorize all types of Bangla compound
characters. MatriVasha is an attempt to align compound characters, which is
challenging because each person has a unique style of writing shapes.
MatriVasha proposes a dataset intended for recognizing 120 (one hundred
twenty) Bangla compound characters, consisting of 2552 (two thousand five
hundred fifty-two) isolated handwritten characters written by unique writers
and collected from within Bangladesh. The dataset addresses problems in
district-, age-, and gender-based handwriting research, because the samples
were collected from a variety of districts and age groups and from an equal
number of males and females. As of now, our proposed dataset is the most
extensive dataset for Bangla compound characters. It is intended to support
recognition techniques for handwritten Bangla compound characters. In the
future, this dataset will be made publicly available to help widen the
research.
Comment: 19 figures, 2 tables
Character Recognition
Character recognition is one of the pattern recognition technologies most widely used in practical applications. This book presents recent advances relevant to character recognition, from technical topics such as image processing, feature extraction and classification to new applications including human-computer interfaces. The goal of this book is to provide a reference source for academic research and for professionals working in the character recognition field.
Advances in Character Recognition
This book presents advances in character recognition. It consists of 12 chapters that cover a wide range of topics on different aspects of character recognition. Hopefully, this book will serve as a reference source for academic research, for professionals working in the character recognition field, and for all interested in the subject.
SEGMENTATION OF TOUCHING CHARACTER PRINTED LANNA SCRIPT USING JUNCTION POINT
Lanna characters have been popular as ancient characters in the northern part of Thailand since 1802. Segmenting printed documents in Lanna characters is challenging because of problems such as partially overlapping characters and touching characters. This paper focuses only on touching characters, such as consonants touching vowels. The segmentation method begins with the horizontal histogram and then the vertical histogram, which segment text lines and characters, respectively. The resulting characters consist of correct clear characters, partially overlapping characters, and touching characters. The proposed method computes the left-edge and right-edge junction points, then finds their maximum count and the row at which it occurs, in order to separate the touching consonant and vowel. In trials on text documents printed in Lanna characters, the method achieved an accuracy of 95.81%.
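The histogram stage described above can be sketched as follows: a horizontal histogram (ink per row) separates text lines, and a vertical histogram (ink per column) within each line separates characters. This is a minimal illustration that assumes a clean binary image where 1 marks ink. The function names are hypothetical, and the junction-point step for touching characters is not covered.

```python
import numpy as np

def nonzero_runs(profile):
    """Return (start, end) pairs for each run of non-zero histogram bins."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i                      # a run of ink begins
        elif value == 0 and start is not None:
            runs.append((start, i))        # the run ended at a blank bin
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs

def segment_page(page):
    """Split a page into lines, then each line into character column ranges."""
    result = []
    for top, bottom in nonzero_runs(page.sum(axis=1)):   # horizontal histogram
        line = page[top:bottom]
        chars = nonzero_runs(line.sum(axis=0))           # vertical histogram
        result.append(((top, bottom), chars))
    return result

page = np.array([
    [0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1],
])
segments = segment_page(page)
```

Touching characters share ink in every column between them, so they come out of this stage as a single segment. That is the case the junction-point method is meant to handle.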