143,952 research outputs found

    Rank-frequency relation for Chinese characters

    We show that Zipf's law for Chinese characters holds perfectly for sufficiently short texts (a few thousand different characters). The scenario of its validity is similar to that of Zipf's law for words in short English texts. For long Chinese texts (or for mixtures of short Chinese texts), rank-frequency relations for Chinese characters display a two-layer, hierarchic structure that combines a Zipfian power-law regime for frequent characters (first layer) with an exponential-like regime for less frequent characters (second layer). For these two layers we provide different (though related) theoretical descriptions that include the range of low-frequency characters (hapax legomena). The comparative analysis of rank-frequency relations for Chinese characters versus English words illustrates the extent to which characters play the same role for Chinese writers as words do for those writing within alphabetical systems. Comment: To appear in European Physical Journal B (EPJ B), 2014 (22 pages, 7 figures)
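The rank-frequency relation discussed in this abstract can be illustrated with a minimal sketch. The function names below (`rank_frequency`, `zipf_exponent`, `hapax_count`) are hypothetical and are not taken from the paper. The sketch assumes that every non-whitespace character counts as a token and that a plain least-squares fit in log-log space is an acceptable rough estimate of the power-law exponent. The paper's own fitting procedure may differ.

```python
from collections import Counter
import math


def rank_frequency(text):
    """Return (rank, character, relative frequency) tuples, most frequent first.

    Assumption: every non-whitespace character is a token (punctuation is kept).
    """
    chars = [c for c in text if not c.isspace()]
    counts = Counter(chars)
    total = len(chars)
    # Sort by descending count; ties are broken by code point for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, ch, n / total) for rank, (ch, n) in enumerate(ranked, start=1)]


def zipf_exponent(table):
    """Estimate gamma in f(r) ~ r**(-gamma) by least squares on log f vs. log r.

    Requires at least two ranks; this is a rough illustration, not a rigorous fit.
    """
    xs = [math.log(rank) for rank, _, _ in table]
    ys = [math.log(freq) for _, _, freq in table]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


def hapax_count(text):
    """Number of characters that occur exactly once (hapax legomena)."""
    counts = Counter(c for c in text if not c.isspace())
    return sum(1 for n in counts.values() if n == 1)
```

Applied to a short text, `zipf_exponent(rank_frequency(text))` would be expected to lie near 1 if the text follows Zipf's law. For a long text, fitting only the top ranks separately from the tail is one way to look for the two regimes the abstract describes.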

    New Perspectives in Sinographic Language Processing Through the Use of Character Structure

    Chinese characters have a complex and hierarchical graphical structure carrying both semantic and phonetic information. We use this structure to enhance the text model and obtain better results in standard NLP operations. First, to tackle the problem of graphical variation, we define allographic classes of characters. Next, the relation of inclusion of a subcharacter in a character provides us with a directed graph of allographic classes. We provide this graph with two weights, semanticity (semantic relation between subcharacter and character) and phoneticity (phonetic relation), and calculate "most semantic subcharacter paths" for each character. Finally, by adding the information contained in these paths to unigrams, we claim to increase the efficiency of text mining methods. We evaluate our method on a text classification task on two corpora (Chinese and Japanese) totaling 18 million characters and get an improvement of 3% on an already high baseline of 89.6% precision, obtained by a linear SVM classifier. Other possible applications and perspectives of the system are discussed. Comment: 17 pages, 5 figures, presented at CICLing 201
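The abstract does not say how a "most semantic subcharacter path" is scored, so the following is only a sketch of one possible reading. It assumes that the inclusion graph is acyclic, that each edge carries a semanticity weight in [0, 1], that a path runs from a character down to a component with no further subcomponents, and that a path's score is the product of its edge weights. The `INCLUSION` table, its weights, and the function name `most_semantic_path` are hypothetical illustrations, not data or code from the paper.

```python
from functools import lru_cache

# Hypothetical inclusion graph: character -> {subcharacter: semanticity weight}.
# The decompositions are standard, but the weights are invented for illustration.
INCLUSION = {
    "湖": {"氵": 0.8, "胡": 0.1},
    "胡": {"古": 0.2, "月": 0.3},
}


def most_semantic_path(char):
    """Return (score, path) for the highest-scoring path from char to a leaf.

    Assumption: score = product of semanticity weights along the path,
    and the graph is a DAG (no character contains itself).
    """

    @lru_cache(maxsize=None)
    def best(node):
        children = INCLUSION.get(node, {})
        if not children:
            # A leaf component: the empty product is 1.0.
            return 1.0, (node,)
        candidates = []
        for child, weight in children.items():
            child_score, child_path = best(child)
            candidates.append((weight * child_score, (node,) + child_path))
        # Keep the candidate with the largest product of weights.
        return max(candidates, key=lambda c: c[0])

    score, path = best(char)
    return score, list(path)
```

Under these assumptions, the path for 湖 goes through the water component 氵 rather than the phonetic component 胡. Such a path could then be attached to the unigram for 湖 as an extra feature, which is the kind of enrichment the abstract describes.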

    Ghanaian Chinese Language Learners’ Perception of Chinese Characters

    This paper investigated students’ perception of learning Chinese characters at the University of Ghana. The Chinese writing system is a unique and indispensable script that forms part of Chinese culture. However, the complexity, forms, strokes, pronunciation, radicals, and orthographic structure of the characters make it difficult for Ghanaian students to learn the Chinese language. A mixed qualitative and quantitative design was used for the study. Of 338 students, 183 from the first to the fourth year participated in the study. Purposive sampling was used to select the students to respond to the questionnaire and share their opinions about Chinese characters in interviews. The findings showed that (a) reading and writing Chinese characters were perceived to be more difficult than speaking, and (b) the radicals, forms, stroke order and number, and orthographic structure of Chinese characters were a hurdle for Chinese language learners. Suggestions were made to urge students to cultivate the habit of consistently practicing the characters through collective participation and learning. The language learners need to do away with excuses, fear, and make-believe obstructions and spend more time in the learning process to enhance their skills in the Chinese writing system.

    Chinese Font Style Transfer with Neural Network

    Font design is an important area in digital art. However, designers have to design characters one by one manually, and Chinese contains more than 20,000 characters. The official Chinese dataset GB 18030-2000 has 27,533 characters. ZhongHuaZiHai, an official Chinese dictionary, contains 85,568 characters. JinXiWenZiJing, a dataset published by the AINet company, includes about 160,000 Chinese characters. Chinese font design is therefore a hard task. In this paper, we introduce a method to help designers finish the process faster. With this method, designers only need to design a small set of Chinese characters; the other characters are generated automatically. Deep neural networks have developed quickly in recent years and are very powerful. We tried many kinds of deep neural networks with different structures and finally settled on the one we introduce here. The generated characters have a style similar to that of the characters designed by the designer, as shown in the experiment section.

    Karakter Han dengan Radikal 示 dalam Shuowenjiezi: Klasifikasi, Aktivitas Penyembahan, Perbandingan dengan Kamus Xiandai Hanyu

    The writing system used in the Chinese language is different from the Latin characters used in the Indonesian language. While the Latin characters represent sounds, the Han characters of the Chinese language represent meanings. Some Han characters have a component called a ‘radical’. This paper discusses Han characters with the 示 radical in Shuowen jiezi. The first part consists of the classification of such characters based on their structure and meaning. The second part discusses the worship practices on which the characters are based. The last part of this paper compares these Han characters to those with the same 示 radical in the Xiandai Hanyu Dictionary.