Simple Character Recognition
This work deals with locating and recognizing text in an image. It discusses feature extraction and its use in machine learning, and describes the design and implementation of a simple application for recognizing characters of machine-printed text.
Handwritten Character Recognition of South Indian Scripts: A Review
Handwritten character recognition has long been a frontier area of research in
the field of pattern recognition and image processing, and there is a large
demand for OCR on handwritten documents. Although sufficient studies have been
performed on foreign scripts such as Chinese, Japanese and Arabic, only
very few works can be traced on handwritten character recognition of Indian
scripts, especially the South Indian scripts. This paper provides an
overview of offline handwritten character recognition in South Indian scripts,
namely Malayalam, Tamil, Kannada and Telugu.
Comment: Paper presented at the "National Conference on Indian Language
Computing", Kochi, February 19-20, 2011. 6 pages, 5 figures
A novel off-line character recognition: an MLP approach
The purpose of this thesis is to explore the possibility of efficient man-machine communication through printed documents. An attempt has been made to show that pattern recognition techniques, i.e., the KNN classifier, are helpful in recognizing machine-printed characters, and that artificial neural networks may be used to represent and recognize printed English characters of any font and size. In the current work, machine-printed document images are scanned by a front-end video scanner and passed through noise removal using smoothing and sharpening filters. The denoised images are converted into bi-level images using the binarization technique proposed by Niblack and a proposed adaptive thresholding algorithm based on the Laplacian sign. The work is split into three parts. The first part deals with segmentation and thinning; its output is a thinned character image. The second part extracts features from the thinned image. The third part deals with KNN classifiers, training of the multilayer perceptron, and recognition of characters once the system is trained. Automatic character recognition promises to play a major role in automatic office information processing systems by integrating multimedia, such as graphics, image and voice, into a single workstation.
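The binarization step in this abstract can be illustrated with a minimal sketch of Niblack-style local thresholding, where each pixel is compared against `mean + k * std` of its neighbourhood. This is not the thesis's implementation: the function name `niblack_binarize`, the window size, the value `k = -0.2`, and the convention that dark pixels are ink are all assumptions chosen for illustration.

```python
from math import sqrt


def niblack_binarize(image, window=3, k=-0.2):
    """Return a bi-level image: 1 where a pixel is darker than its
    local threshold (assumed to be ink), 0 elsewhere.

    `image` is a list of rows of grey levels. `window` and `k` are
    illustrative defaults, not values taken from the thesis.
    """
    height, width = len(image), len(image[0])
    half = window // 2
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Gather the neighbourhood, clipped at the image borders.
            values = [
                image[j][i]
                for j in range(max(0, y - half), min(height, y + half + 1))
                for i in range(max(0, x - half), min(width, x + half + 1))
            ]
            mean = sum(values) / len(values)
            variance = sum((v - mean) ** 2 for v in values) / len(values)
            # Local threshold: mean shifted by k standard deviations.
            threshold = mean + k * sqrt(variance)
            result[y][x] = 1 if image[y][x] < threshold else 0
    return result
```

Calling `niblack_binarize` on a bright 5x5 patch with a single dark centre pixel marks only that pixel as ink, because uniform neighbourhoods have zero deviation and the strict comparison leaves them as background.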
Handwritten and printed text separation in historical documents
Historical documents present many challenges for Optical Character Recognition Systems
(OCR), especially documents of poor quality containing handwritten annotations,
stamps, signatures, and historical fonts. As most OCRs recognize either machine-printed
or handwritten texts, printed and handwritten parts have to be separated before using
the respective recognition system. This thesis addresses the problem of segmenting
handwritten and printed text in historical Latin text documents. To alleviate the
lack of data containing handwritten and machine-printed components located on the
same page or even overlapping each other, together with their pixel-wise annotations, the data
synthesis method proposed in [12] was applied and new datasets were generated. The
newly created images and their pixel-level labels were used to train the Fully Convolutional
Network (FCN) introduced in [5]. The newly trained model has shown better results in the
separation of machine-printed and handwritten text in historical documents.
Character Queries: A Transformer-based Approach to On-Line Handwritten Character Segmentation
On-line handwritten character segmentation is often associated with
handwriting recognition and even though recognition models include mechanisms
to locate relevant positions during the recognition process, it is typically
insufficient to produce a precise segmentation. Decoupling the segmentation
from the recognition unlocks the potential to further utilize the result of the
recognition. We specifically focus on the scenario where the transcription is
known beforehand, in which case the character segmentation becomes an
assignment problem between sampling points of the stylus trajectory and
characters in the text. Inspired by the k-means clustering algorithm, we view
it from the perspective of cluster assignment and present a Transformer-based
architecture where each cluster is formed based on a learned character query in
the Transformer decoder block. In order to assess the quality of our approach,
we create character segmentation ground truths for two popular on-line
handwriting datasets, IAM-OnDB and HANDS-VNOnDB, and evaluate multiple methods
on them, demonstrating that our approach achieves the overall best results.
Comment: ICDAR 2023 Best Student Paper Award. Code available at
https://github.com/jungomi/character-querie
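The cluster-assignment view described in this abstract can be made concrete with a small sketch of plain k-means over stylus sampling points, where each cluster stands for one character. This is only the classical algorithm that the abstract names as inspiration, not the paper's Transformer model with learned character queries. The function names and the idea of seeding one centroid per character are assumptions made for illustration.

```python
def assign_points(points, centroids):
    """Assign each (x, y) sampling point to the index of its nearest centroid."""
    assignments = []
    for px, py in points:
        distances = [(px - cx) ** 2 + (py - cy) ** 2 for cx, cy in centroids]
        assignments.append(distances.index(min(distances)))
    return assignments


def update_centroids(points, assignments, centroids):
    """Move each centroid to the mean of its members; keep it if it has none."""
    updated = []
    for index, old in enumerate(centroids):
        members = [p for p, a in zip(points, assignments) if a == index]
        if not members:
            updated.append(old)
            continue
        updated.append((
            sum(x for x, _ in members) / len(members),
            sum(y for _, y in members) / len(members),
        ))
    return updated


def segment(points, centroids, iterations=10):
    """Alternate assignment and update steps, then return the final
    character index for every sampling point."""
    for _ in range(iterations):
        assignments = assign_points(points, centroids)
        centroids = update_centroids(points, assignments, centroids)
    return assign_points(points, centroids)
```

Given six points forming two well-separated groups and one initial centroid per group, `segment` returns one character index per sampling point, which is the shape of output a character segmentation needs. Unlike this sketch, a purely spatial clustering ignores stroke order and the known transcription, which is part of what the learned approach is meant to address.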