On-line Chinese character recognition.
by Jian-Zhuang Liu. Thesis (Ph.D.)--Chinese University of Hong Kong, 1997. Includes bibliographical references (p. 183-196). Microfiche. Ann Arbor, Mich.: UMI, 1998. 3 microfiches; 11 x 15 cm.
Special Radical Detection by Statistical Classification for On-line Handwritten Chinese Character Recognition
The hierarchical nature of Chinese characters has inspired radical-based recognition, but radical segmentation from characters remains a challenge. We previously proposed a radical-based approach for on-line handwritten Chinese character recognition, which incorporates character structure knowledge into integrated radical segmentation and recognition, and performs well on characters of left-right and up-down structures (non-special structures). In this paper, we propose a statistical-classification-based method for detecting special radicals in special-structure characters. We design 19 binary classifiers for classifying candidate radicals (groups of strokes) hypothesized from the input character. Characters in which special radicals are detected are recognized using special-structure models, while those without special radicals are recognized using the models for non-special structures. We applied the recognition framework to 6,763 character classes and achieved promising recognition performance in experiments.
An off-line large vocabulary hand-written Chinese character recognizer
An off-line hand-written Chinese character recognizer based on contextual vector quantization (CVQ) is reported; it supports a vocabulary of 4616 Chinese characters, alphanumerics and punctuation symbols. Trained with one sample of each character from each of 100 writers and tested on texts of 160000 characters written by another 200 writers, it achieves an average recognition rate of 77.2%. Two statistical language models were investigated as post-processors of the recognizer; they are reported to raise the recognition rate by 8.8% and 12.0%, respectively.
Chinese Text Recognition with A Pre-Trained CLIP-Like Model Through Image-IDS Aligning
Scene text recognition has been studied for decades due to its broad
applications. However, despite Chinese characters possessing different
characteristics from Latin characters, such as complex inner structures and
large categories, few methods have been proposed for Chinese Text Recognition
(CTR). Particularly, the characteristic of large categories poses challenges in
dealing with zero-shot and few-shot Chinese characters. In this paper, inspired
by the way humans recognize Chinese texts, we propose a two-stage framework for
CTR. Firstly, we pre-train a CLIP-like model through aligning printed character
images and Ideographic Description Sequences (IDS). This pre-training stage
simulates humans recognizing Chinese characters and obtains the canonical
representation of each character. Subsequently, the learned representations are
employed to supervise the CTR model, such that traditional single-character
recognition can be improved to text-line recognition through image-IDS
matching. To evaluate the effectiveness of the proposed method, we conduct
extensive experiments on both Chinese character recognition (CCR) and CTR. The
experimental results demonstrate that the proposed method performs best in CCR
and outperforms previous methods in most scenarios of the CTR benchmark. It is
worth noting that the proposed method can recognize zero-shot Chinese
characters in text images without fine-tuning, whereas previous methods require
fine-tuning when new classes appear. The code is available at
https://github.com/FudanVI/FudanOCR/tree/main/image-ids-CTR. Comment: ICCV 202
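The image-IDS matching step in the abstract above can be illustrated with a minimal sketch. It assumes that each character already has a canonical representation vector and that recognition picks the character whose vector is most similar to the image embedding. The cosine similarity measure, the function names, and the three-dimensional vectors are all assumptions for illustration; they are not taken from the paper or from the FudanOCR code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def match_character(image_embedding, ids_embeddings):
    """Return the character whose IDS representation best matches the image."""
    return max(
        ids_embeddings,
        key=lambda ch: cosine(image_embedding, ids_embeddings[ch]),
    )


# Hypothetical canonical representations, one per character (toy values).
ids_embeddings = {
    "好": [0.9, 0.1, 0.0],
    "明": [0.1, 0.9, 0.1],
    "林": [0.0, 0.2, 0.9],
}

# Hypothetical embedding of an input character image.
image_embedding = [0.2, 0.8, 0.2]
predicted = match_character(image_embedding, ids_embeddings)
print(predicted)
```

Under this assumption, supporting a new character class only requires adding its representation to `ids_embeddings`, which is one way to read the abstract's claim about recognizing zero-shot characters without fine-tuning.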
Open Set Chinese Character Recognition using Multi-typed Attributes
Recognition of off-line Chinese characters is still a challenging problem,
especially in historical documents: not only is the number of classes extremely
large in comparison to contemporary image retrieval methods, but new unseen
classes can also be expected under open learning conditions (even for CNN).
Chinese character recognition with zero or a few training samples is a
difficult problem and has not been studied yet. In this paper, we propose a new
Chinese character recognition method using multi-type attributes, which are
based on the pronunciation, structure and radicals of Chinese characters, and
apply it to character recognition in historical books. This intermediate attribute code has
a strong advantage over the common `one-hot' class representation because it
allows for understanding complex and unseen patterns symbolically using
attributes. First, each character is represented by four groups of attribute
types to cover a wide range of character possibilities: Pinyin label, layout
structure, number of strokes, three different input methods such as Cangjie,
Zhengma and Wubi, as well as a four-corner encoding method. A convolutional
neural network (CNN) is trained to learn these attributes. Subsequently,
characters can be easily recognized by these attributes using a distance metric
and a complete lexicon that is encoded in attribute space. We evaluate the
proposed method on two open data sets (printed Chinese characters for
zero-shot learning and historical characters for few-shot learning) and on a
closed set of handwritten Chinese characters. Experimental results show a good
general classification of seen classes but also a very promising generalization
ability to unseen characters. Comment: 29 pages, submitted to Pattern Recognitio
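The lexicon lookup described in the abstract above (predicted attributes matched against a complete lexicon encoded in attribute space) can be sketched as a nearest-neighbour search. The abstract does not name the distance metric, so the Hamming-style count of mismatched attributes used here is an assumption, as are the function names and the three-attribute toy lexicon (Pinyin, layout, stroke count).

```python
def attribute_distance(code_a, code_b):
    """Count the attribute positions where two codes disagree."""
    return sum(1 for a, b in zip(code_a, code_b) if a != b)


def recognize(predicted_code, lexicon):
    """Return the lexicon character closest to the predicted attribute code."""
    return min(
        lexicon,
        key=lambda ch: attribute_distance(predicted_code, lexicon[ch]),
    )


# Hypothetical toy lexicon: (Pinyin label, layout structure, stroke count).
lexicon = {
    "好": ("hao", "left-right", 6),
    "明": ("ming", "left-right", 8),
    "字": ("zi", "top-bottom", 6),
}

# Attribute code as a network might predict it, with one wrong attribute.
noisy_prediction = ("ming", "left-right", 7)
print(recognize(noisy_prediction, lexicon))
```

Because the lookup operates only on attribute codes, a character absent from training can still be returned as long as its code is in the lexicon, which is the property the abstract relies on for unseen classes.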
Off-line hand-printed chinese character recognition based on stroke matching
The specific purpose of this thesis is the automated recognition of off-line Chinese characters hand-printed with a blue ball-point pen. Through mask processing, the main components of a Chinese character, such as vertical, horizontal, and slant strokes, can be extracted. Then the connected components are found, together with the coordinates of the top, bottom, leftmost, and rightmost ends of each extracted stroke. From these coordinates, the length and position of each stroke can be computed.
Based on the number, relative length, and relative position of the strokes, both coarse and fine rule-based classification can be performed, which achieves the goal of this thesis.
Excluding the loading and segmentation of the original image, the computing time for feature extraction and classification depends on the image size and the number of strokes. It is about 0.3 seconds per Chinese character on an IBM PC 80486 DX33.
The advantages of the proposed method include low time complexity, a strong ability to distinguish very similar Chinese characters, tolerance of stroke slope, and a recognition rate of 96% or higher.
The disadvantage is that users cannot drive the learning themselves, since the matching rules are at present open only to the manufacturers.
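The step in this abstract that derives stroke length and position from the top, bottom, leftmost, and rightmost coordinates can be sketched as follows. The `Stroke` record, the rule for choosing a length per stroke type, and the use of the bounding-box centre as the position are assumptions for illustration; the thesis abstract does not give its formulas.

```python
import math
from dataclasses import dataclass


@dataclass
class Stroke:
    """Extreme coordinates of one extracted stroke, in pixels (hypothetical)."""
    left: int
    top: int
    right: int
    bottom: int
    kind: str  # "horizontal", "vertical", or "slant"


def stroke_length(s):
    """Estimate stroke length from its extreme coordinates."""
    width = s.right - s.left
    height = s.bottom - s.top
    if s.kind == "horizontal":
        return float(width)
    if s.kind == "vertical":
        return float(height)
    # Slant stroke: assume it runs along the bounding-box diagonal.
    return math.hypot(width, height)


def stroke_center(s):
    """Use the bounding-box centre as the stroke position."""
    return ((s.left + s.right) / 2, (s.top + s.bottom) / 2)


horizontal = Stroke(left=10, top=50, right=90, bottom=54, kind="horizontal")
slant = Stroke(left=0, top=0, right=30, bottom=40, kind="slant")
print(stroke_length(horizontal), stroke_center(horizontal))
print(stroke_length(slant))
```

Relative lengths and positions, as used by the rule-based classification, could then be obtained by dividing these values by the character's overall width and height.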
A scheme of on-line Chinese character recognition using neural networks
The paper proposes a scheme for online Chinese character recognition based on neural networks. The supervised backpropagation algorithm is used to train the network. The input character is converted into a sequence of virtual stroke segments as well as real stroke segments, a feature that describes the complete structure of a character and is extracted by our system. To simplify the recognition process and reduce the recognition time, the neural network is divided into several subnetworks, each responsible for recognizing a group of about 75 character patterns. In other words, the huge set of Chinese characters is divided into several groups according to the number of stroke segments in the characters, and for each group a specific subnetwork is trained to recognize every character in the group. Whenever the system accepts an input Chinese character, it calculates the number of stroke segments in that character, including virtual as well as real stroke segments, and then determines which subnets to enter for the recognition process. The system can accept and recognize some interconnected characters. The algorithm was experimentally implemented on a personal computer system; it accepts interconnected Chinese characters written on an electronic tablet and performs recognition in real time. Our experiment showed that recognition accuracy exceeded 96% on the test example.
Conference type: international. Conference date: 1997-10-12 to 1997-10-15. Book type: print. Conference location: Orlando, FL, US
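The grouping and routing idea in this abstract (characters partitioned by stroke-segment count, one subnetwork per group, input routed by its segment count) can be sketched without any neural network code. The partitioning rule, the function names, and the segment counts below are hypothetical; the abstract states only that groups hold about 75 characters and are formed according to segment counts.

```python
def build_groups(segment_counts, group_size=75):
    """Sort characters by segment count and cut the list into fixed-size groups."""
    ordered = sorted(segment_counts, key=lambda ch: (segment_counts[ch], ch))
    return [ordered[i:i + group_size] for i in range(0, len(ordered), group_size)]


def candidate_groups(groups, segment_counts, n_segments):
    """Indices of groups that contain a character with n_segments segments."""
    return [
        index
        for index, group in enumerate(groups)
        if any(segment_counts[ch] == n_segments for ch in group)
    ]


# Hypothetical segment counts (real plus virtual segments) for a toy vocabulary.
segment_counts = {"一": 1, "二": 3, "十": 3, "三": 5, "土": 5, "王": 7}

# A tiny group size stands in for the roughly 75 characters per subnetwork.
groups = build_groups(segment_counts, group_size=2)
print(groups)
print(candidate_groups(groups, segment_counts, 3))
```

In this sketch a segment count can straddle a group boundary, so an input may be routed to more than one subnetwork, which is consistent with the abstract's wording "which subnets to enter".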