Investigation into using the unicode standard for primitives of unified han characters
The Unicode standard identifies and provides representations for the vast majority of known characters used in today’s writing systems. Many of these characters belong to the unified Han series, which encapsulates characters from the writing systems of languages such as Chinese, Japanese and Korean. These pictographic characters are often made up of smaller primitives, which are either other characters or simpler pictographs. This paper presents research findings on how the Unicode standard currently represents the primitives used in 4134 of the most common Han characters.
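As a rough illustration of where Unicode encodes such primitives, the sketch below maps a character to one of a few well-known CJK blocks and folds a Kangxi radical to its unified ideograph through NFKC normalization. The helper names `block_of` and `unified_form` are hypothetical, the block table is deliberately incomplete, and nothing here is taken from the paper's own method.

```python
import unicodedata

# A few Unicode blocks that hold Han characters or their components.
# This table is illustrative and deliberately incomplete.
BLOCKS = [
    (0x2E80, 0x2EFF, "CJK Radicals Supplement"),
    (0x2F00, 0x2FDF, "Kangxi Radicals"),
    (0x31C0, 0x31EF, "CJK Strokes"),
    (0x3400, 0x4DBF, "CJK Unified Ideographs Extension A"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
]


def block_of(ch):
    """Return the name of the listed block containing ch, or None."""
    code_point = ord(ch)
    for start, end, name in BLOCKS:
        if start <= code_point <= end:
            return name
    return None


def unified_form(ch):
    """Fold compatibility characters (e.g. Kangxi radicals) via NFKC."""
    return unicodedata.normalize("NFKC", ch)


if __name__ == "__main__":
    tree_radical = "\u2f4a"    # KANGXI RADICAL TREE
    tree_ideograph = "\u6728"  # the unified ideograph with the same shape
    for ch in (tree_radical, tree_ideograph):
        print(f"U+{ord(ch):04X}", unicodedata.name(ch), "|", block_of(ch))
    print(unified_form(tree_radical) == tree_ideograph)
```

The same primitive can therefore appear at more than one code point, which is one reason a survey of how primitives are represented is not a simple lookup.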
Recognition of handwritten Chinese characters by combining regularization, Fisher's discriminant and distorted sample generation
Proceedings of the 10th International Conference on Document Analysis and Recognition, 2009, p. 1026–1030
The problem of offline handwritten Chinese character recognition has been extensively studied by many researchers and very high recognition rates have been reported. In this paper, we propose to further boost the recognition rate by incorporating a distortion model that artificially generates a huge number of virtual training samples from existing ones. We achieve a record high recognition rate of 99.46% on the ETL-9B database. Traditionally, when the dimension of the feature vector is high and the number of training samples is not sufficient, the remedies are to (i) regularize the class covariance matrices in the discriminant functions, (ii) employ Fisher's dimension reduction technique to reduce the feature dimension, and (iii) generate a huge number of virtual training samples from existing ones. The second contribution of this paper is the investigation of the relative effectiveness of these three methods for boosting the recognition rate. © 2009 IEEE.
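For orientation, remedies (i) and (ii) are often written in the following textbook forms. The abstract does not state which variants the authors use, so the shrinkage parameter $\lambda$, the feature dimension $d$, and the scatter matrices $S_b$ and $S_w$ below are assumptions introduced only for illustration.

```latex
% (i) One common regularization: shrink each class covariance
%     \Sigma_i toward a scaled identity, with 0 <= \lambda <= 1.
\hat{\Sigma}_i(\lambda) = (1 - \lambda)\,\Sigma_i
    + \lambda\,\frac{\operatorname{tr}(\Sigma_i)}{d}\,I_d

% (ii) Fisher's criterion: choose the projection W that maximizes
%      between-class scatter relative to within-class scatter.
J(W) = \frac{\left| W^{\top} S_b W \right|}{\left| W^{\top} S_w W \right|}
```

With $\lambda = 0$ the estimate is the plain sample covariance, and with $\lambda = 1$ it is an isotropic matrix with the same trace, which stays invertible even when training samples are scarce.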
A Syntactic Omni-Font Character Recognition System
The author introduces a syntactic omni-font character recognition system that recognizes a wide range of fonts, including handprinted characters. A structural pattern-matching approach is used. Essentially, a set of loosely constrained rules specify pattern components and their interrelationships. The robustness of the system is derived from the orthogonal set of pattern descriptors, location functions, and the manner in which they are combined to exploit the topological structure of characters. By virtue of the new pattern description language, PDL, the user may easily write rules to define new patterns for the system to recognize. The system also features scale-invariance and user-definable sensitivity to tilt orientation. The system has achieved a 95.2% recognition rate.
Online Japanese Character Recognition Using Trajectory-Based Normalization and Direction Feature Extraction
http://www.suvisoft.com
This paper describes an online Japanese character recognition system using advanced techniques of pattern normalization and direction feature extraction. The normalization of point coordinates and the decomposition of direction elements are performed directly on the online trajectory and are therefore computationally efficient. We compare one-dimensional and pseudo two-dimensional (pseudo 2D) normalization methods, as well as direction features extracted from the original pattern and from the normalized pattern. In experiments on the TUAT HANDS databases, the pseudo 2D normalization methods yielded superior performance, while direction features from the original pattern and from the normalized pattern made little difference.
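A minimal sketch of decomposing direction elements directly on a pen trajectory: each segment between consecutive points is split into components along the two nearest of eight standard directions, and the components are accumulated into a histogram. The function name `direction_histogram`, the choice of eight directions, and the single global histogram are assumptions made for illustration; the abstract does not give the paper's exact formulation.

```python
import math

NUM_DIRECTIONS = 8                     # assumed: directions at multiples of 45 degrees
STEP = 2 * math.pi / NUM_DIRECTIONS


def direction_histogram(points):
    """Accumulate segment lengths into 8 direction bins.

    Each segment vector v is written as a * d_k + b * d_(k+1), where d_k
    and d_(k+1) are the unit vectors of the two standard directions that
    bracket v. The coefficients a and b are added to those two bins.
    Angles are measured with math.atan2, so y is assumed to increase
    upward; flip the sign of y for screen coordinates.
    """
    hist = [0.0] * NUM_DIRECTIONS
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        if length == 0.0:
            continue                   # repeated point, no direction
        theta = math.atan2(dy, dx) % (2 * math.pi)
        k = int(theta // STEP)         # index of the lower bracketing direction
        alpha = k * STEP
        a = length * math.sin(alpha + STEP - theta) / math.sin(STEP)
        b = length * math.sin(theta - alpha) / math.sin(STEP)
        hist[k % NUM_DIRECTIONS] += a
        hist[(k + 1) % NUM_DIRECTIONS] += b
    return hist


if __name__ == "__main__":
    stroke = [(0, 0), (1, 0), (1, 2)]  # one unit right, then two units up
    print(direction_histogram(stroke))
```

Because the decomposition works on the point sequence itself, no bitmap has to be rendered first, which fits the efficiency argument made in the abstract.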