23,567 research outputs found

Optimizing the Learning Order of Chinese Characters Using a Novel Topological Sort Algorithm

Author: Loach James C.
Wang Jinzhao
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 24/09/2016

We present a novel algorithm for optimizing the order in which Chinese characters are learned, one that incorporates the benefits of learning them in order of usage frequency and in order of their hierarchical structural relationships. We show that our work outperforms previously published orders and algorithms. Our algorithm is applicable to any scheduling task where nodes have intrinsic differences in importance and must be visited in topological order.

arXiv.org e-Print Archive

Public Library of Science (PLOS)

FigShare

New Perspectives in Sinographic Language Processing Through the Use of Character Structure

Author: A. Guder-Manitius
C. Schindelin
C. Williams
J. Li
J. Rocha
K. Tamaoka
M. Taft
M.J. Dürst
N. Chikamatsu
R. Dai
S. Yu
T. Bishop
T. Morioka
Y. Fujiwara
Y.-M. Chou
Publication date: 01/01/2013

Chinese characters have a complex and hierarchical graphical structure carrying both semantic and phonetic information. We use this structure to enhance the text model and obtain better results in standard NLP operations. First, to tackle the problem of graphical variation, we define allographic classes of characters. Next, the relation of inclusion of a subcharacter in a character provides us with a directed graph of allographic classes. We provide this graph with two weights, semanticity (semantic relation between subcharacter and character) and phoneticity (phonetic relation), and calculate "most semantic subcharacter paths" for each character. Finally, by adding the information contained in these paths to unigrams, we claim to increase the efficiency of text mining methods. We evaluate our method on a text classification task on two corpora (Chinese and Japanese) totaling 18 million characters and get an improvement of 3% on an already high baseline of 89.6% precision, obtained by a linear SVM classifier. Other possible applications and perspectives of the system are discussed. Comment: 17 pages, 5 figures, presented at CICLing 201

arXiv.org e-Print Archive

Crossref

HAL-Université de Bretagne Occidentale

HAL Descartes

Hal-Diderot

Brain-inspired conscious computing architecture

Author: Duch Prof Wlodzislaw
Publication date: 01/01/2003

What type of artificial systems will claim to be conscious and will claim to experience qualia? The ability to comment upon physical states of a brain-like dynamical system coupled with its environment seems to be sufficient to make such claims. The flow of internal states in such a system, guided and limited by associative memory, is similar to the stream of consciousness. Minimal requirements for an artificial system that will claim to be conscious are given in the form of a specific architecture named articon. Nonverbal discrimination of the working memory states of the articon gives it the ability to experience different qualities of internal states. Analysis of the inner state flows of such a system during a typical behavioral process shows that qualia are inseparable from perception and action. The role of consciousness in the learning of skills, when conscious information processing is replaced by subconscious processing, is elucidated. Arguments confirming that phenomenal experience is a result of cognitive processes are presented. Possible philosophical objections based on the Chinese room and other arguments are discussed, but they are insufficient to refute the articon's claims. Conditions for genuine understanding that go beyond the Turing test are presented. Articons may fulfill such conditions, and in principle the structure of their experiences may be arbitrarily close to human.

CiteSeerX

CogPrints Cognitive Sciences Eprint Archive

Chinese localisation of Evergreen: an open source integrated library system

Author: Guoying Liu
Zou Qing
Publication venue: Scholarship at UWindsor
Publication date: 01/01/2009

Purpose - The purpose of this paper is to investigate various issues related to Chinese language localisation in Evergreen, an open source integrated library system (ILS). Design/methodology/approach - A Simplified Chinese version of Evergreen was implemented and tested, and various issues such as encoding, indexing, searching, and sorting specifically associated with the Simplified Chinese language were investigated. Findings - The paper finds that Unicode eases many ILS development problems. However, producing another language version of an ILS requires more than translation from one language to another. Indexing, searching, sorting and other locale-related issues should be tackled not only language by language, but locale by locale. Practical implications - Most of the issues that have arisen during this project will be found with other ILS-like systems. Originality/value - This paper provides insights into issues of, and various solutions to, indexing, searching, and sorting in the Chinese language in an ILS. These issues and the solutions may be applicable to other digital library systems such as institutional repositories.

Scholarship at UWindsor

A structural query system for Han characters

Author: Skala Matthew
Publication date: 01/01/2016

The IT University of Copenhagen's Repository

Punctuation effects in English and Esperanto texts

Author: Ausloos
Boulton
Carroll
Carroll
Chomsky
Crane
Dalkilic
Dzurjuk
Ebeling
Erlich
Gabaix
Ha
Hatzigeorgiu
Ishida
Kanter
Kassel
Kawamura
Kosmidis
Koutsoudas
Köhler
Lambiotte
M. Ausloos
Mandelbrot
Mandelbrot
Mandelbrot
Meadow
Meyer
Mikros
Montemurro
Rottmann
Rousseau
Vilenski
Wang
Weisle
West
Wilson
Yule
Zipf
Publication venue: 'Elsevier BV'
Publication date: 01/01/2010

A statistical physics study of punctuation effects on sentence lengths is presented for written texts: Alice in Wonderland and Through a Looking Glass. The translation of the first text into Esperanto is also considered as a test for the role of punctuation in defining a style, and for contrasting natural and artificial, but written, languages. Several log-log plots of the sentence length-rank relationship are presented for the major punctuation marks. Different power laws are observed with characteristic exponents. The exponent can take a value much less than unity (ca. 0.50 or 0.30) depending on how a sentence is defined. The texts are also mapped into time series based on the word frequencies. The quantitative differences between the original and translated texts are very minute at the exponent level. It is argued that sentences seem to be more reliable than word distributions in discussing an author's style. Comment: 13 pages, 7 figures (3x2+1), 60 references

arXiv.org e-Print Archive

Crossref

Open Repository and Bibliography - Liège