Generating Handwritten Chinese Characters using CycleGAN
Handwriting of Chinese has long been an important skill in East Asia.
However, automatic generation of handwritten Chinese characters poses a great
challenge due to the large number of characters. Various machine learning
techniques have been used to recognize Chinese characters, but few works have
studied the handwritten Chinese character generation problem, especially with
unpaired training data. In this work, we formulate the Chinese handwritten
character generation as a problem that learns a mapping from an existing
printed font to a personalized handwritten style. We further propose DenseNet
CycleGAN to generate Chinese handwritten characters. Our method is applied not
only to commonly used Chinese characters but also to calligraphy work with
aesthetic values. Furthermore, we propose content accuracy and style
discrepancy as the evaluation metrics to assess the quality of the handwritten
characters generated. We then use our proposed metrics to evaluate the
generated characters from CASIA dataset as well as our newly introduced Lanting
calligraphy dataset.
Comment: Accepted at WACV 201
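The abstract above does not spell out its training objective. As a rough illustration of the cycle-consistency idea that CycleGAN-style models rely on for unpaired data, here is a minimal NumPy sketch. The generators G and F below are toy stand-ins chosen for this example (assumptions, not the paper's DenseNet generators), and the image arrays are random placeholders rather than CASIA or Lanting samples.

```python
import numpy as np


def cycle_consistency_loss(x, y, G, F):
    """L1 cycle loss: F(G(x)) should reconstruct x, and G(F(y)) should reconstruct y."""
    forward = np.mean(np.abs(F(G(x)) - x))   # printed -> handwritten -> printed
    backward = np.mean(np.abs(G(F(y)) - y))  # handwritten -> printed -> handwritten
    return forward + backward


# Toy stand-ins for the two generators (hypothetical, for illustration only).
G = lambda img: 1.0 - img       # pretend "printed -> handwritten" mapping
F = lambda img: 1.0 - img       # exact inverse of G
F_bad = lambda img: 0.5 * img   # a mapping that does not invert G

rng = np.random.default_rng(0)
printed = rng.random((4, 8, 8))       # placeholder batch of "printed" glyphs
handwritten = rng.random((4, 8, 8))   # placeholder batch of "handwritten" glyphs

loss_good = cycle_consistency_loss(printed, handwritten, G, F)
loss_bad = cycle_consistency_loss(printed, handwritten, G, F_bad)
print(loss_good, loss_bad)
```

When F inverts G, the loss is essentially zero; when it does not, the loss grows. This is the property that lets the mapping be learned without paired printed and handwritten samples.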
Arabic cursive text recognition from natural scene images
© 2019 by the authors. This paper presents a comprehensive survey on Arabic cursive scene text recognition. Recent publications in this field show that the interest of document image analysis researchers has shifted from the recognition of optical characters to the recognition of characters appearing in natural images. Scene text recognition is a challenging problem because the text varies in font style, size, alignment, orientation, reflection, illumination, blurriness, and background complexity. Among cursive scripts, Arabic scene text recognition is considered a more challenging problem due to joined writing, variations of the same character, a large number of ligatures, the number of baselines, etc. Surveys on Latin and Chinese script-based scene text recognition systems can be found, but the Arabic-like scene text recognition problem is yet to be addressed in detail. This manuscript describes some of the latest techniques presented for text classification. The presented techniques, which follow a deep learning architecture, are equally suitable for the development of Arabic cursive scene text recognition systems. The issues pertaining to text localization and feature extraction are also presented. Moreover, this article emphasizes the importance of having a benchmark cursive scene text dataset. Based on the discussion, future directions are outlined, some of which may provide researchers with insight into cursive scene text.
Stroke Extraction of Chinese Character Based on Deep Structure Deformable Image Registration
Stroke extraction of Chinese characters plays an important role in the field
of character recognition and generation. Most existing character stroke
extraction methods focus on image morphological features. These methods often
produce errors in cross-stroke extraction and stroke matching because they rarely
use stroke semantics and prior information. In this paper, we propose a deep
learning-based character stroke extraction method that takes semantic features
and prior information of strokes into consideration. This method consists of
three parts: image registration-based stroke registration that establishes the
rough registration of the reference strokes and the target as prior
information; image semantic segmentation-based stroke segmentation that
preliminarily separates target strokes into seven categories; and
high-precision extraction of single strokes. In the stroke registration, we
propose a structure deformable image registration network to achieve
structure-deformable transformation while maintaining the stable morphology of
single strokes for character images with complex structures. To verify
the effectiveness of the method, we construct two datasets, one for
calligraphy characters and one for regular handwriting characters. The experimental
results show that our method strongly outperforms the baselines. Code is
available at https://github.com/MengLi-l1/StrokeExtraction.
Comment: 10 pages, 8 figures, published to AAAI-23 (oral)
Advances in Character Recognition
This book presents advances in character recognition. It consists of 12 chapters that cover a wide range of topics on different aspects of character recognition. Hopefully, this book will serve as a reference source for academic research, for professionals working in the character recognition field, and for all interested in the subject.
Multi-Character Field Recognition for Arabic and Chinese Handwriting
Two methods, Symbolic Indirect Correlation (SIC) and Style Constrained Classification (SCC), are proposed for recognizing handwritten Arabic and Chinese words and phrases. SIC reassembles variable-length segments of an unknown query that match similar segments of labeled reference words. Recognition is based on the correspondence between the order of the feature vectors and of the lexical transcript in both the query and the references. SIC implicitly incorporates language context in the form of letter n-grams. SCC is based on the notion that the style (distortion or noise) of a character is a good predictor of the distortions arising in other characters, even of a different class, from the same source. It is adaptive in the sense that with a long-enough field, its accuracy converges to that of a style-specific classifier trained on the writer of the unknown query. Neither SIC nor SCC requires the query words to appear among the references.
SCML: A Structural Representation for Chinese Characters
Chinese characters are used daily by well over a billion people. They constitute the main writing system of China and Taiwan, form a major part of written Japanese, and are also used in South Korea. Anything more than a cursory glance at these characters will reveal a high degree of structure to them, but computing systems do not currently have a means to operate on this structure. Existing character databases and dictionaries treat them as numerical code points, and associate with them additional 'hand-computed' data, such as stroke count, stroke order, and other information to aid in specific searches. Searching by a character's 'shape' is effectively impossible in these systems. I propose a new approach to representing these characters, through an XML-based language called SCML. This language, by encoding an abstract form of a character, allows the direct retrieval of important information such as stroke count and stroke order, and permits useful but previously impossible automated analysis of characters. In addition, the system allows the design of a view that takes abstract SCML representations as character models and outputs glyphs based on an aesthetic, facilitating the creation of 'meta-fonts' for Chinese characters. Finally, through the creation of a specialized database, SCML allows for efficient structural character queries to be performed against the body of inserted characters, thus allowing people to search by the most obvious of a character's characteristics: its shape.
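To make the idea of retrieving stroke count and stroke order directly from a structural XML description more concrete, here is a small Python sketch. The element and attribute names (character, stroke, order, type) are invented for this illustration and are not taken from the actual SCML language, whose schema the abstract does not give.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for the character U+6728 (tree). The tag and attribute
# names are assumptions made for this example, not real SCML.
DOC = """
<character codepoint="U+6728">
  <stroke order="2" type="vertical"/>
  <stroke order="1" type="horizontal"/>
  <stroke order="4" type="right-falling"/>
  <stroke order="3" type="left-falling"/>
</character>
""".strip()


def stroke_info(xml_text):
    """Return (stroke count, stroke types sorted by writing order)."""
    root = ET.fromstring(xml_text)
    strokes = sorted(root.iter("stroke"), key=lambda s: int(s.get("order")))
    return len(strokes), [s.get("type") for s in strokes]


count, order = stroke_info(DOC)
print(count, order)
```

Because the stroke data is derived from the structural description itself, it does not need to be stored as separate hand-entered metadata, which is the contrast the abstract draws with code-point-based databases.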
Recognition of Japanese handwritten characters with Machine learning techniques
The recognition of Japanese handwritten characters has always been a challenge for researchers. A large number of classes, their graphic complexity, and the existence of three different writing systems make this problem particularly difficult compared to Western writing. For decades, attempts have been made to address the problem using traditional OCR (Optical Character Recognition) techniques, with mixed results. With the recent popularization of machine learning techniques through neural networks, this research has been revitalized, bringing new approaches to the problem. These new results achieve performance levels comparable to human recognition. Furthermore, these new techniques have allowed collaboration with very different disciplines, such as the Humanities or East Asian studies, achieving advances in them that would not have been possible without this interdisciplinary work. In this thesis, these techniques are explored until reaching a sufficient level of understanding that allows us to carry out our own experiments, training neural network models with public datasets of Japanese characters. However, the scarcity of public datasets makes the task of researchers remarkably difficult. Our proposal to minimize this problem is the development of a web application that allows researchers to easily collect samples of Japanese characters through the collaboration of any user. Once the application is fully operational, the examples collected until that point will be used to create a new dataset in a specific format. Finally, we can use the new data to carry out comparative experiments with the previous neural network models.