MIRACLE Retrieval Experiments with East Asian Languages
This paper describes the participation of MIRACLE in the NTCIR 2005 CLIR task. Although our group has a strong background and long expertise in Computational Linguistics and Information Retrieval applied to European languages using the Latin and Cyrillic alphabets, this was our first attempt at East Asian languages. Our main goal was to study the particularities and distinctive characteristics of Japanese, Chinese and Korean, focusing especially on the similarities and differences with European languages, and to carry out research on CLIR tasks which include those languages. The basic idea behind our participation in NTCIR is to test whether the same familiar linguistic-based techniques may also be applicable to East Asian languages, and study the necessary adaptations
Translation into any natural language of the error messages generated by any computer program
Since the introduction of the Fortran programming language some 60 years ago,
there has been little progress in making error messages more user-friendly. A
first step in this direction is to translate them into the natural language of
the students. In this paper we propose a simple script for Linux systems which
gives word by word translations of error messages. It works for most
programming languages and for all natural languages. Understanding the error
messages generated by compilers is a major hurdle for students who are learning
programming, particularly for non-native English speakers. Not only may they
never become "fluent" in programming but many give up programming altogether.
Whereas programming is a tool which can be useful in many human activities,
e.g. history, genealogy, astronomy, entomology, in many countries the skill of
programming remains confined to a narrow fringe of professional programmers. In
all societies, besides professional violinists there are also amateurs. It
should be the same for programming. It is our hope that once translated and
explained the error messages will be seen by the students as an aid rather than
as an obstacle and that in this way more students will enjoy learning and
practising programming. They should see it as a fun game.
Comment: 14 pages, 1 figure
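The word-by-word approach described in this abstract can be illustrated with a minimal sketch. This is not the authors' Linux script: the glossary contents, the function name `translate_message`, and the choice of French as target language are all assumptions made for illustration.

```python
import re

# Hypothetical toy glossary (English -> French). A usable glossary would be
# far larger and is not taken from the paper.
GLOSSARY = {
    "error": "erreur",
    "missing": "manquant",
    "expected": "attendu",
    "file": "fichier",
}

def translate_message(message, glossary):
    """Translate an error message word by word.

    Words found in the glossary are replaced; identifiers, punctuation
    and unknown words are left untouched.
    """
    def swap(match):
        word = match.group(0)
        return glossary.get(word.lower(), word)

    # Only runs of ASCII letters are treated as candidate words.
    return re.sub(r"[A-Za-z]+", swap, message)

print(translate_message("error: missing file foo.c", GLOSSARY))
```

Because each word is translated in isolation, the result keeps the English word order and is not grammatical in the target language; it is meant as a reading aid only.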
The "handedness" of language: Directional symmetry breaking of sign usage in words
Language, which allows complex ideas to be communicated through symbolic
sequences, is a characteristic feature of our species and manifested in a
multitude of forms. Using large written corpora for many different languages
and scripts, we show that the occurrence probability distributions of signs at
the left and right ends of words have a distinct heterogeneous nature.
Characterizing this asymmetry using quantitative inequality measures, viz.
information entropy and the Gini index, we show that the beginning of a word is
less restrictive in sign usage than the end. This property is not simply
attributable to the use of common affixes as it is seen even when only word
roots are considered. We use the existence of this asymmetry to infer the
direction of writing in undeciphered inscriptions that agrees with the
archaeological evidence. Unlike traditional investigations of phonotactic
constraints which focus on language-specific patterns, our study reveals a
property valid across languages and writing systems. As both language and
writing are unique aspects of our species, this universal signature may reflect
an innate feature of the human cognitive phenomenon.
Comment: 10 pages, 4 figures + Supplementary Information (15 pages, 8 figures), final corrected version
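The two inequality measures named in this abstract, information entropy and the Gini index, can be computed for word-initial and word-final signs as in the sketch below. It is a simplified illustration, not the authors' procedure: the function names are hypothetical, each character is treated as one sign, and the Gini index is taken only over signs that actually occur at the given position.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (in bits) of a list of occurrence counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def gini(counts):
    """Gini index of a list of occurrence counts (0 means perfectly even)."""
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    # Standard formula on ascending values x_1..x_n:
    # G = sum_i (2i - n - 1) * x_i / (n * sum_i x_i)
    return sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1)) / (n * total)

def end_sign_stats(words):
    """Return (entropy, gini) for the first and for the last sign of each word."""
    first = Counter(w[0] for w in words if w)
    last = Counter(w[-1] for w in words if w)
    return {
        "first": (entropy(first.values()), gini(first.values())),
        "last": (entropy(last.values()), gini(last.values())),
    }
```

Under the abstract's claim, a large corpus would be expected to show higher entropy and a lower Gini index for the "first" position than for the "last" one.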