
    On-line Chinese character recognition.

    by Jian-Zhuang Liu. Thesis (Ph.D.)--Chinese University of Hong Kong, 1997. Includes bibliographical references (p. 183-196). Microfiche. Ann Arbor, Mich.: UMI, 1998. 3 microfiches; 11 x 15 cm.

    Special Radical Detection by Statistical Classification for On-line Handwritten Chinese Character Recognition

    The hierarchical nature of Chinese characters has inspired radical-based recognition, but radical segmentation from characters remains a challenge. We previously proposed a radical-based approach for on-line handwritten Chinese character recognition, which incorporates character structure knowledge into integrated radical segmentation and recognition, and performs well on characters of left-right and up-down structures (non-special structures). In this paper, we propose a statistical-classification-based method for detecting special radicals in special-structure characters. We design 19 binary classifiers for classifying candidate radicals (groups of strokes) hypothesized from the input character. Characters in which special radicals are detected are recognized using special-structure models, while those without special radicals are recognized using the models for non-special structures. We applied the recognition framework to 6,763 character classes and achieved promising recognition performance in experiments.

    An off-line large vocabulary hand-written Chinese character recognizer

    An off-line hand-written Chinese character recognizer based on contextual vector quantization (CVQ), supporting a vocabulary of 4,616 Chinese characters, alphanumerics and punctuation symbols, is reported. Trained with one sample of each character from each of 100 writers and tested on texts of 160,000 characters written by another 200 writers, it achieves an average recognition rate of 77.2%. Two statistical language models have been investigated in this study. Used as post-processors of the recognizer, they upgrade the recognition rate by 8.8% and 12.0%, respectively.

    Chinese Text Recognition with A Pre-Trained CLIP-Like Model Through Image-IDS Aligning

    Scene text recognition has been studied for decades due to its broad applications. However, despite Chinese characters possessing different characteristics from Latin characters, such as complex inner structures and large categories, few methods have been proposed for Chinese Text Recognition (CTR). Particularly, the characteristic of large categories poses challenges in dealing with zero-shot and few-shot Chinese characters. In this paper, inspired by the way humans recognize Chinese texts, we propose a two-stage framework for CTR. First, we pre-train a CLIP-like model by aligning printed character images and Ideographic Description Sequences (IDS). This pre-training stage simulates humans recognizing Chinese characters and obtains the canonical representation of each character. Subsequently, the learned representations are employed to supervise the CTR model, such that traditional single-character recognition can be improved to text-line recognition through image-IDS matching. To evaluate the effectiveness of the proposed method, we conduct extensive experiments on both Chinese character recognition (CCR) and CTR. The experimental results demonstrate that the proposed method performs best in CCR and outperforms previous methods in most scenarios of the CTR benchmark. It is worth noting that the proposed method can recognize zero-shot Chinese characters in text images without fine-tuning, whereas previous methods require fine-tuning when new classes appear. The code is available at https://github.com/FudanVI/FudanOCR/tree/main/image-ids-CTR. Comment: ICCV 202
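The image-IDS matching step described in this abstract can be illustrated with a minimal sketch. It assumes cosine similarity as the matching score (the abstract only says "CLIP-like") and uses made-up two-dimensional vectors; the names `cosine`, `match`, and `canonical` are hypothetical and do not come from the FudanOCR code.

```python
import math
from typing import Dict, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def match(image_feature: Sequence[float],
          canonical: Dict[str, Sequence[float]]) -> str:
    """Return the character whose canonical representation is most similar."""
    return max(canonical, key=lambda ch: cosine(image_feature, canonical[ch]))


# Hypothetical canonical representations (toy values, not learned ones).
canonical = {"好": (1.0, 0.0), "明": (0.0, 1.0)}
print(match((0.9, 0.2), canonical))

# A new class is handled by adding its representation, with no retraining
# of the matcher itself in this sketch.
canonical["森"] = (0.7, 0.7)
print(match((0.6, 0.8), canonical))
```

In this toy setting a new character only needs an entry in `canonical`, which mirrors the abstract's claim about zero-shot recognition without fine-tuning.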

    Open Set Chinese Character Recognition using Multi-typed Attributes

    Recognition of off-line Chinese characters is still a challenging problem, especially in historical documents: not only is the number of classes extremely large in comparison to contemporary image retrieval methods, but new unseen classes can also be expected under open learning conditions (even for CNN). Chinese character recognition with zero or a few training samples is a difficult problem and has not been studied yet. In this paper, we propose a new Chinese character recognition method based on multi-type attributes, which are derived from the pronunciation, structure and radicals of Chinese characters, and apply it to character recognition in historical books. This intermediate attribute code has a strong advantage over the common 'one-hot' class representation because it allows for understanding complex and unseen patterns symbolically using attributes. First, each character is represented by four groups of attribute types to cover a wide range of character possibilities: Pinyin label, layout structure, number of strokes, three different input methods (Cangjie, Zhengma and Wubi), as well as a four-corner encoding method. A convolutional neural network (CNN) is trained to learn these attributes. Subsequently, characters can be easily recognized from these attributes using a distance metric and a complete lexicon that is encoded in attribute space. We evaluate the proposed method on two open data sets (printed Chinese characters for zero-shot learning and historical characters for few-shot learning) and one closed set (handwritten Chinese characters). Experimental results show a good general classification of seen classes but also a very promising generalization ability to unseen characters. Comment: 29 pages, submitted to Pattern Recognitio
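The final lookup step in this abstract (a distance metric over a lexicon encoded in attribute space) might look like the sketch below. The abstract does not name the metric, so Hamming distance is an assumption here, and the attribute codes in `TOY_LEXICON` are invented for illustration; they are not real Pinyin, Cangjie, or four-corner codes.

```python
from typing import Dict, Sequence


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length attribute codes differ."""
    if len(a) != len(b):
        raise ValueError("attribute codes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def recognize(predicted: Sequence[int],
              lexicon: Dict[str, Sequence[int]]) -> str:
    """Return the lexicon character whose code is closest to the prediction."""
    return min(lexicon, key=lambda ch: hamming_distance(predicted, lexicon[ch]))


# Hypothetical lexicon with made-up binary attribute codes.
TOY_LEXICON = {
    "木": (1, 0, 0, 1, 0, 1),
    "林": (1, 1, 0, 1, 0, 1),
    "森": (1, 1, 1, 1, 0, 1),
}

# A noisy prediction, standing in for the attribute outputs of the CNN.
print(recognize((1, 1, 0, 1, 1, 1), TOY_LEXICON))
```

Because the decision is made in attribute space, a character never seen during training can still be returned as long as its code is present in the lexicon.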

    Off-line hand-printed chinese character recognition based on stroke matching

    The specific purpose of this thesis is the automated recognition of off-line hand-printed Chinese characters written with a blue ball-point pen. Through mask processing, the main components of a Chinese character, such as vertical, horizontal, and slant strokes, can be extracted. Then the connected components are found, together with the coordinates of the top, bottom, leftmost, and rightmost ends of each extracted stroke. From these coordinates, the length and position of each stroke can be computed. According to the number, relative length, and relative position of the strokes, both coarse and fine rule-based classification can be performed, which achieves the goal of this thesis. Excluding the loading and segmentation of the original image, the computing time for feature extraction and classification depends on the image size and the number of strokes. It is about 0.3 seconds per Chinese character on an IBM PC 80486 DX33. The advantages of the proposed method include efficient time complexity, a strong ability to distinguish very similar Chinese characters, tolerance of stroke slope, and a recognition rate of 96% or higher. The disadvantage is its inflexibility for user-driven learning, since the matching rules are at present open to the manufacturers only.
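The step from stroke end coordinates to length and position can be sketched as follows. The thesis abstract does not give formulas, so this assumes the length is the diagonal of the box spanned by the four extreme coordinates and the position is its center; the `Stroke` class and `relative_lengths` helper are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Stroke:
    """Extreme pixel coordinates of one extracted stroke."""
    top: int
    bottom: int
    left: int
    right: int

    def length(self) -> float:
        # Assumed definition: diagonal of the box spanned by the extremes.
        return math.hypot(self.right - self.left, self.bottom - self.top)

    def center(self) -> Tuple[float, float]:
        # Assumed definition of "position": (x, y) midpoint of the box.
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)


def relative_lengths(strokes: List[Stroke]) -> List[float]:
    """Each stroke's length divided by the longest stroke's length."""
    if not strokes:
        return []
    longest = max(s.length() for s in strokes)
    if longest == 0:
        return [0.0 for _ in strokes]
    return [s.length() / longest for s in strokes]


slant = Stroke(top=0, bottom=4, left=0, right=3)
horizontal = Stroke(top=0, bottom=0, left=0, right=10)
print(slant.length(), slant.center())
print(relative_lengths([slant, horizontal]))
```

Normalizing by the longest stroke makes the features independent of character size, which is one plausible reading of "relative length" in the abstract.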

    A scheme of on-line Chinese character recognition using neural networks

    The paper proposes a scheme of online Chinese character recognition based on neural networks. The supervised backpropagation algorithm is used to train the network. The input character is converted into a sequence of virtual stroke segments as well as real stroke segments, a feature that describes the complete structure of a character and is extracted by our system. In order to simplify the recognition process and reduce the recognition time, the neural network is divided into several subnetworks, each of which is responsible for recognizing a group of about 75 character patterns. In other words, the huge set of Chinese characters is divided into several groups according to the number of stroke segments in the characters, and for each group a specific subnetwork is trained to recognize every character in the group. Whenever the system accepts an input Chinese character, it calculates the number of stroke segments in that character, including virtual as well as real stroke segments, and then determines which subnets to enter for the recognition process. The system can accept and recognize some interconnected characters. The algorithm was experimentally implemented on a personal computer system; it accepts interconnected Chinese characters written on an electronic tablet and performs recognition in real time. Our experiment showed that recognition accuracy exceeded 96% on the test example. Conference type: international. Conference date: 1997-10-12 to 1997-10-15. Format: print. Conference location: Orlando, FL, US
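The grouping and routing idea in this abstract can be sketched as below. The segment counts are invented, and the `tolerance` parameter is a hypothetical way of selecting more than one subnet; the abstract only says the system determines "which subnets to enter" and does not describe the rule.

```python
from collections import defaultdict
from typing import Dict, List


def build_groups(segment_counts: Dict[str, int]) -> Dict[int, List[str]]:
    """Group characters by their total (real + virtual) stroke segment count."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for character, count in segment_counts.items():
        groups[count].append(character)
    return dict(groups)


def route(n_segments: int, groups: Dict[int, List[str]],
          tolerance: int = 0) -> List[int]:
    """Return the keys of the subnets whose segment count is within tolerance."""
    return [key for key in sorted(groups) if abs(key - n_segments) <= tolerance]


# Hypothetical segment counts, for illustration only.
counts = {"一": 1, "二": 3, "十": 3, "三": 5}
groups = build_groups(counts)
print(groups[3])
print(route(3, groups))
print(route(4, groups, tolerance=1))
```

Each key returned by `route` would index one trained subnetwork, so only a small part of the full network is evaluated for any given input.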