15,137 research outputs found

    Simple Character Recognition

    This work deals with locating and recognizing text in images. It discusses feature extraction and its use in machine learning. Part of the work is devoted to the design and implementation of a simple application for recognizing characters in machine-printed text.

    Handwritten Character Recognition of South Indian Scripts: A Review

    Handwritten character recognition remains a frontier area of research in pattern recognition and image processing, and there is a large demand for OCR on handwritten documents. Although sufficient studies have been performed on foreign scripts such as Chinese, Japanese, and Arabic, very little work can be traced on handwritten character recognition of Indian scripts, especially the South Indian scripts. This paper provides an overview of offline handwritten character recognition in South Indian scripts, namely Malayalam, Tamil, Kannada, and Telugu. Comment: Paper presented at the "National Conference on Indian Language Computing", Kochi, February 19-20, 2011. 6 pages, 5 figures

    A novel off-line character recognition: an MLP approach

    The purpose of this thesis is to explore the possibility of efficient man-machine communication through printed documents. It attempts to show that pattern recognition techniques, namely a KNN classifier, are helpful in recognizing machine-printed characters, and that artificial neural networks may be used to represent and recognize printed English characters of any font and size. In the current work, machine-printed document images are scanned by a front-end video scanner and passed through noise removal using smoothing and sharpening filters. The denoised images are converted into bi-level images using the binarization technique proposed by Niblack and a proposed adaptive thresholding algorithm based on the Laplacian sign. The work is split into three parts. The first part deals with segmentation and thinning; its output is a thinned character image. The second part extracts features from the thinned image. The third part deals with KNN classifiers, training the multilayer perceptron, and recognizing characters once the system is trained. Automatic character recognition holds great promise for automatic office information processing systems, where it can be integrated with multimedia such as graphics, image, and voice in a single workstation.
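
The binarization step mentioned in the abstract above appears to refer to Niblack's local thresholding. As a rough illustration only, the sketch below applies the commonly cited Niblack rule, T = m + k * s, where m and s are the mean and standard deviation of a window around each pixel. The function name `niblack_binarize`, the window size, and the value k = -0.2 are assumptions chosen for this example; the abstract does not state the thesis's parameters or implementation.

```python
from statistics import mean, pstdev


def niblack_binarize(image, window=3, k=-0.2):
    """Binarize a grayscale image given as a list of rows of intensities.

    A pixel is marked 1 (ink) when its value is below the local threshold
    T = m + k * s, computed over a window centred on the pixel.
    The window size and k are illustrative assumptions, not source values.
    """
    height, width = len(image), len(image[0])
    half = window // 2
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the neighbourhood, clipped at the image borders.
            patch = [
                image[j][i]
                for j in range(max(0, y - half), min(height, y + half + 1))
                for i in range(max(0, x - half), min(width, x + half + 1))
            ]
            threshold = mean(patch) + k * pstdev(patch)
            row.append(1 if image[y][x] < threshold else 0)
        result.append(row)
    return result


# A dark vertical stroke on a light background.
page = [[200, 50, 200] for _ in range(3)]
print(niblack_binarize(page))
```

Because the threshold adapts to each neighbourhood, a rule of this kind tolerates uneven illumination better than a single global cutoff, which may be why it suits scanned documents.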

    Handwritten and printed text separation in historical documents

    Historical documents present many challenges for Optical Character Recognition (OCR) systems, especially documents of poor quality containing handwritten annotations, stamps, signatures, and historical fonts. As most OCR systems recognize either machine-printed or handwritten text, the printed and handwritten parts have to be separated before the respective recognition system is applied. This thesis addresses the problem of segmenting handwriting and print in historical Latin text documents. To alleviate the lack of data containing handwritten and machine-printed components located on the same page, or even overlapping each other, together with their pixel-wise annotations, the data synthesis method proposed in [12] was applied and new datasets were generated. The newly created images and their pixel-level labels were used to train the Fully Convolutional Model (FCN) introduced in [5]. The newly trained model has shown better results in the separation of machine-printed and handwritten text in historical documents.

    Character Queries: A Transformer-based Approach to On-Line Handwritten Character Segmentation

    On-line handwritten character segmentation is often associated with handwriting recognition, and even though recognition models include mechanisms to locate relevant positions during the recognition process, these are typically insufficient to produce a precise segmentation. Decoupling the segmentation from the recognition unlocks the potential to further utilize the result of the recognition. We specifically focus on the scenario where the transcription is known beforehand, in which case the character segmentation becomes an assignment problem between sampling points of the stylus trajectory and characters in the text. Inspired by the k-means clustering algorithm, we view it from the perspective of cluster assignment and present a Transformer-based architecture where each cluster is formed based on a learned character query in the Transformer decoder block. In order to assess the quality of our approach, we create character segmentation ground truths for two popular on-line handwriting datasets, IAM-OnDB and HANDS-VNOnDB, and evaluate multiple methods on them, demonstrating that our approach achieves the overall best results. Comment: ICDAR 2023 Best Student Paper Award. Code available at https://github.com/jungomi/character-querie