Optimizing the Learning Order of Chinese Characters Using a Novel Topological Sort Algorithm
We present a novel algorithm for optimizing the order in which Chinese
characters are learned, one that incorporates the benefits of learning them in
order of usage frequency and in order of their hierarchical structural
relationships. We show that our work outperforms previously published orders
and algorithms. Our algorithm is applicable to any scheduling task where nodes
have intrinsic differences in importance and must be visited in topological
order.
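The idea of combining usage frequency with structural prerequisites can be illustrated with a minimal sketch. This is not the algorithm from the paper, whose details the abstract does not give; it is a plain greedy variant of Kahn's topological sort that always picks the most frequent character whose components have already been learned. The function name `frequency_topological_order` and the frequency values are assumptions made for illustration only.

```python
import heapq


def frequency_topological_order(components, frequency):
    """Greedy frequency-first topological order (illustrative sketch only).

    components: dict mapping a character to the list of component
                characters that must be learned before it.
    frequency:  dict mapping a character to a usage frequency
                (higher means more important); missing entries count as 0.
    """
    # Collect every node, including components that never appear as keys.
    nodes = set(components)
    for parts in components.values():
        nodes.update(parts)

    indegree = {n: 0 for n in nodes}
    dependents = {n: [] for n in nodes}
    for char, parts in components.items():
        for part in set(parts):  # ignore repeated components
            indegree[char] += 1
            dependents[part].append(char)

    # Max-heap on frequency, emulated by negating the key.
    heap = [(-frequency.get(n, 0), n) for n in nodes if indegree[n] == 0]
    heapq.heapify(heap)

    order = []
    while heap:
        _, node = heapq.heappop(heap)
        order.append(node)
        for dep in dependents[node]:
            indegree[dep] -= 1
            if indegree[dep] == 0:
                heapq.heappush(heap, (-frequency.get(dep, 0), dep))

    if len(order) != len(nodes):
        raise ValueError("component graph contains a cycle")
    return order


# Hypothetical frequencies, chosen only to make the example readable.
components = {"好": ["女", "子"], "妈": ["女", "马"]}
frequency = {"好": 100, "妈": 50, "女": 10, "子": 30, "马": 5}
order = frequency_topological_order(components, frequency)
```

A purely greedy choice like this can delay a very frequent character behind rare components, which is presumably the kind of trade-off a more careful algorithm has to address.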
New Perspectives in Sinographic Language Processing Through the Use of Character Structure
Chinese characters have a complex and hierarchical graphical structure
carrying both semantic and phonetic information. We use this structure to
enhance the text model and obtain better results in standard NLP operations.
First of all, to tackle the problem of graphical variation we define
allographic classes of characters. Next, the relation of inclusion of a
subcharacter in a character provides us with a directed graph of allographic
classes. We provide this graph with two weights: semanticity (semantic relation
between subcharacter and character) and phoneticity (phonetic relation) and
calculate "most semantic subcharacter paths" for each character. Finally,
adding the information contained in these paths to unigrams we claim to
increase the efficiency of text mining methods. We evaluate our method on a
text classification task on two corpora (Chinese and Japanese) of a total of 18
million characters and get an improvement of 3% on an already high baseline of
89.6% precision, obtained by a linear SVM classifier. Other possible
applications and perspectives of the system are discussed. Comment: 17 pages, 5 figures, presented at CICLing 201
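The abstract does not define how a "most semantic subcharacter path" is scored, so the following is only a sketch under stated assumptions: the inclusion graph is acyclic, each edge carries a semanticity weight in [0, 1], and the best path from a character down to an atomic subcharacter is the one maximizing the product of those weights. The function name `most_semantic_path`, the node labels, and the weights are all hypothetical.

```python
from functools import lru_cache


def most_semantic_path(graph, start):
    """Return (score, path) for the highest-scoring path from start to a leaf.

    graph: dict mapping a node to a list of (subcharacter, semanticity)
           pairs; leaves map to an empty list. Assumed to be acyclic.
    The score is the product of the semanticity weights along the path
    (an assumption, not the paper's definition).
    """

    @lru_cache(maxsize=None)
    def best(node):
        children = graph.get(node, [])
        if not children:
            # A leaf: the path ends here with a neutral score.
            return (1.0, (node,))
        options = []
        for child, weight in children:
            child_score, child_path = best(child)
            options.append((weight * child_score, (node,) + child_path))
        return max(options)

    return best(start)


# Hypothetical graph with placeholder labels instead of real characters.
graph = {
    "A": [("B", 0.9), ("C", 0.4)],
    "B": [("D", 0.5)],
    "C": [("D", 0.95)],
    "D": [],
}
score, path = most_semantic_path(graph, "A")
```

Here the path through "B" wins (0.9 × 0.5 = 0.45) over the path through "C" (0.4 × 0.95 = 0.38), even though the final edge into "D" is stronger on the losing path.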
Brain-inspired conscious computing architecture
What type of artificial systems will claim to be conscious and will claim to experience qualia? The ability to comment upon physical states of a brain-like dynamical system coupled with its environment seems to be sufficient to make such claims. The flow of internal states in such a system, guided and limited by associative memory, is similar to the stream of consciousness. Minimal requirements for an artificial system that will claim to be conscious were given in the form of a specific architecture named articon. Nonverbal discrimination of the working memory states of the articon gives it the ability to experience different qualities of internal states. Analysis of the inner state flows of such a system during a typical behavioral process shows that qualia are inseparable from perception and action. The role of consciousness in the learning of skills, when conscious information processing is replaced by subconscious processing, is elucidated. Arguments confirming that phenomenal experience is a result of cognitive processes are presented. Possible philosophical objections based on the Chinese room and other arguments are discussed, but they are insufficient to refute the articon's claims. Conditions for genuine understanding that go beyond the Turing test are presented. Articons may fulfill such conditions, and in principle the structure of their experiences may be arbitrarily close to human.
Chinese localisation of Evergreen: an open source integrated library system
Purpose - The purpose of this paper is to investigate various issues related to Chinese language localisation in Evergreen, an open source integrated library system (ILS).
Design/methodology/approach - A Simplified Chinese version of Evergreen was implemented and tested and various issues such as encoding, indexing, searching, and sorting specifically associated with Simplified Chinese language were investigated.
Findings - The paper finds that Unicode eases a lot of ILS development problems. However, having another language version of an ILS does not simply require the translation from one language to another. Indexing, searching, sorting and other locale related issues should be tackled not only language by language, but locale by locale.
Practical implications - Most of the issues that have arisen during this project will be found with other ILS-like systems.
Originality/value - This paper provides insights into issues of, and various solutions to, indexing, searching, and sorting in the Chinese language in an ILS. These issues and the solutions may be applicable to other digital library systems such as institutional repositories.
Punctuation effects in English and Esperanto texts
A statistical physics study of punctuation effects on sentence lengths is
presented for written texts: Alice in Wonderland and Through a
Looking Glass. The translation of the first text into Esperanto is also
considered as a test for the role of punctuation in defining a style, and for
contrasting natural and artificial, but written, languages. Several log-log
plots of the sentence length-rank relationship are presented for the major
punctuation marks. Different power laws are observed with characteristic
exponents. The exponent can take a value much less than unity (0.50 or
0.30) depending on how a sentence is defined. The texts are also mapped into
time series based on the word frequencies. The quantitative differences between
the original and translated texts are very minute, at the exponent level. It
is argued that sentences seem to be more reliable than word distributions in
discussing an author's style. Comment: 13 pages, 7 figures (3x2+1), 60 references
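The sentence length-rank analysis described above can be sketched as follows. This is an assumed, simplified procedure rather than the authors' method: sentences are delimited by a chosen set of punctuation marks, length is counted in words, and the exponent is taken as the negated least-squares slope of log(length) against log(rank). The function names `sentence_lengths` and `rank_exponent` are hypothetical.

```python
import math
import re


def sentence_lengths(text, marks=".!?"):
    """Split text at any of the given marks and count words per sentence."""
    pieces = re.split("[" + re.escape(marks) + "]", text)
    return [len(piece.split()) for piece in pieces if piece.split()]


def rank_exponent(lengths):
    """Estimate b in length ~ rank**(-b) by a log-log least-squares fit."""
    ordered = sorted(lengths, reverse=True)
    if len(ordered) < 2:
        raise ValueError("need at least two sentences to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(length) for length in ordered]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return -covariance / variance


# Changing `marks` changes what counts as a sentence, and hence the fit.
lengths = sentence_lengths("Hello world. How are you? Fine.")

# Synthetic check: lengths that follow an exact power law with b = 0.5.
synthetic = [1000 * rank ** -0.5 for rank in range(1, 51)]
exponent = rank_exponent(synthetic)
```

Passing a different `marks` string, for example `";:"`, redefines the sentence boundaries, which is one way to see how the fitted exponent depends on how a sentence is defined.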