Writing and literacy in Indonesia
On Monitoring Language Change with the Support of Corpus Processing
One of the fundamental characteristics of language is that it can change over time. One
method of monitoring such change is to observe the language's corpora, that is, structured
documentation of the language. Recent developments in technology, especially in the field of
Natural Language Processing, allow robust linguistic processing, which supports the description
of diverse historical changes in the corpora. The intervention of human linguists is indispensable,
as they determine the gold standard, but computer assistance provides considerable support by
bringing computational approaches to the exploration of corpora, especially historical
corpora. This paper proposes a model for corpus development in which corpora are annotated
to support further computational operations such as lexicogrammatical pattern matching,
automatic retrieval, and extraction. The corpus processing operations are performed by
local-grammar-based corpus processing software on a contemporary Indonesian corpus. This
paper concludes that data collection and data processing in a corpus are of equally crucial
importance for monitoring language change, and neither can be set aside.
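The kind of lexicogrammatical pattern matching described above can be illustrated with a minimal Python sketch. It is not the local-grammar software used in the paper; it assumes a corpus already annotated as (word, tag) pairs with an invented tagset, represents a pattern as a sequence of per-token constraints (an exact word or a tag), and counts matches per time period, which is one simple way to follow a construction across periods. The Indonesian sentences are toy data, not the paper's corpus.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# A token is a (word form, part-of-speech tag) pair; the tag names are hypothetical.
Token = Tuple[str, str]
# A pattern element is a predicate on a single token, so one pattern can mix
# lexical constraints (exact words) with grammatical ones (tags).
Element = Callable[[Token], bool]


def tag(name: str) -> Element:
    """Match any token carrying the given tag."""
    return lambda tok: tok[1] == name


def word(form: str) -> Element:
    """Match a specific word form, ignoring case."""
    return lambda tok: tok[0].lower() == form.lower()


def match(sentence: List[Token], pattern: List[Element]) -> List[Tuple[int, int]]:
    """Return (start, end) spans where consecutive tokens satisfy every element."""
    n = len(pattern)
    return [
        (i, i + n)
        for i in range(len(sentence) - n + 1)
        if all(el(tok) for el, tok in zip(pattern, sentence[i:i + n]))
    ]


def frequency_by_period(corpus: Dict[str, List[List[Token]]],
                        pattern: List[Element]) -> Counter:
    """Count pattern matches per period label (raw counts, not normalized)."""
    counts: Counter = Counter()
    for period, sentences in corpus.items():
        for sentence in sentences:
            counts[period] += len(match(sentence, pattern))
    return counts


# Toy annotated corpus with invented sentences grouped by period.
corpus = {
    "1990s": [[("rumah", "NOUN"), ("itu", "DET"), ("sangat", "ADV"), ("besar", "ADJ")]],
    "2010s": [
        [("dia", "PRON"), ("sangat", "ADV"), ("senang", "ADJ")],
        [("mobil", "NOUN"), ("itu", "DET"), ("sangat", "ADV"), ("cepat", "ADJ")],
    ],
}
# Pattern: the intensifier "sangat" followed by any adjective.
pattern = [word("sangat"), tag("ADJ")]
counts = frequency_by_period(corpus, pattern)
print(counts["1990s"], counts["2010s"])  # prints: 1 2
```

In a real comparison the counts would need to be normalized by the size of each period's subcorpus before any change could be claimed.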
Natural language processing
Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems (text summarization, information extraction, information retrieval, etc.), including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.
Translation into any natural language of the error messages generated by any computer program
Since the introduction of the Fortran programming language some 60 years ago,
there has been little progress in making error messages more user-friendly. A
first step in this direction is to translate them into the natural language of
the students. In this paper we propose a simple script for Linux systems that
gives word-by-word translations of error messages. It works for most
programming languages and for all natural languages. Understanding the error
messages generated by compilers is a major hurdle for students who are learning
programming, particularly for non-native English speakers. Many of them
never become "fluent" in programming, and many give up programming altogether.
Whereas programming is a tool which can be useful in many human activities,
e.g. history, genealogy, astronomy, entomology, in many countries the skill of
programming remains confined to a narrow fringe of professional programmers. In
all societies, besides professional violinists there are also amateurs. It
should be the same for programming. It is our hope that once translated and
explained the error messages will be seen by the students as an aid rather than
as an obstacle and that in this way more students will enjoy learning and
practising programming. They should see it as a fun game.
Comment: 14 pages, 1 figure
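The word-by-word approach can be sketched in a few lines of Python. The authors' tool is a Linux script whose internals are not described here, so this is only an illustration of the idea; the glossary, its French entries, and the function name translate_word_by_word are hypothetical.

```python
import re
from typing import Dict

# Hypothetical English-to-French glossary for a few words that occur in
# compiler messages; a usable one would cover the compiler's whole vocabulary.
GLOSSARY: Dict[str, str] = {
    "error": "erreur",
    "warning": "avertissement",
    "expected": "attendu",
    "before": "avant",
    "undeclared": "non déclaré",
}

# Any run of ASCII letters counts as a "word"; digits and punctuation never match.
WORD = re.compile(r"[A-Za-z]+")


def translate_word_by_word(message: str, glossary: Dict[str, str]) -> str:
    """Replace each alphabetic word found in the glossary (looked up in lower case);
    file names, numbers, punctuation and unknown words pass through unchanged."""
    return WORD.sub(lambda m: glossary.get(m.group(0).lower(), m.group(0)), message)


msg = "main.c:3:5: error: expected ';' before 'return'"
print(translate_word_by_word(msg, GLOSSARY))
# prints: main.c:3:5: erreur: attendu ';' avant 'return'
```

Because the translation is strictly word by word, an identifier that happens to match a glossary entry would also be translated, and word order is never adapted to the target language.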
The BURCHAK corpus: a Challenge Data Set for Interactive Learning of Visually Grounded Word Meanings
We motivate and describe a new freely available human-human dialogue dataset
for interactive learning of visually grounded word meanings through ostensive
definition by a tutor to a learner. The data has been collected using a novel,
character-by-character variant of the DiET chat tool (Healey et al., 2003;
Mills and Healey, submitted) with a novel task, where a Learner needs to learn
invented visual attribute words (such as "burchak" for square) from a tutor.
As such, the text-based interactions closely resemble face-to-face conversation
and thus contain many of the linguistic phenomena encountered in natural,
spontaneous dialogue. These include self- and other-correction, mid-sentence
continuations, interruptions, overlaps, fillers, and hedges. We also present a
generic n-gram framework for building user (i.e. tutor) simulations from this
type of incremental data, which is freely available to researchers. We show
that the simulations produce outputs that are similar to the original data
(e.g. 78% turn match similarity). Finally, we train and evaluate a
Reinforcement Learning dialogue control agent for learning visually grounded
word meanings, trained from the BURCHAK corpus. The learned policy shows
comparable performance to a rule-based system built previously.
Comment: 10 pages, The 6th Workshop on Vision and Language (VL'17)
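A generic n-gram user simulation of the kind mentioned above can be sketched as follows. This is a simplified, word-level illustration and not the authors' framework: it assumes tutor utterances are already tokenized, it ignores the character-by-character increments and the dialogue context that real simulations would condition on, and the names train_ngram and simulate are hypothetical. The tutor lines are invented.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

START, END = "<s>", "</s>"
Model = Dict[Tuple[str, ...], List[str]]


def train_ngram(utterances: List[List[str]], n: int = 2) -> Model:
    """Map each (n-1)-token context to every token observed after it.
    Duplicates are kept, so uniform sampling from the list is
    proportional to the observed counts."""
    model: Model = defaultdict(list)
    for utt in utterances:
        padded = [START] * (n - 1) + utt + [END]
        for i in range(n - 1, len(padded)):
            model[tuple(padded[i - n + 1:i])].append(padded[i])
    return model


def simulate(model: Model, n: int = 2, rng: Optional[random.Random] = None,
             max_len: int = 30) -> List[str]:
    """Generate one simulated utterance by sampling token after token."""
    rng = rng or random.Random()
    context = [START] * (n - 1)
    out: List[str] = []
    while len(out) < max_len:
        options = model.get(tuple(context))
        if not options:
            break  # unseen context: stop rather than guess
        token = rng.choice(options)
        if token == END:
            break
        out.append(token)
        if n > 1:
            context = (context + [token])[-(n - 1):]  # slide the context window
    return out


# Invented tutor utterances for illustration only.
tutor = [
    ["what", "is", "this"],
    ["it", "is", "a", "burchak"],
    ["no", "it", "is", "a", "burchak"],
]
model = train_ngram(tutor, n=2)
print(" ".join(simulate(model, n=2, rng=random.Random(7))))
```

Seeding the random generator makes a simulated run reproducible, which helps when comparing simulated output with the original data.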
GenERRate: generating errors for use in grammatical error detection
This paper explores the issue of automatically generated ungrammatical data and its use in error detection, with a focus on the task of classifying a sentence as grammatical or ungrammatical. We present an error generation tool called GenERRate and show how GenERRate can be used to improve the performance of a classifier on learner data. We describe
initial attempts to replicate Cambridge Learner Corpus errors using GenERRate.
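Rule-based error generation of this kind can be sketched as follows. This is not GenERRate's code or input format; it assumes four simple token-level operations (deletion, insertion, substitution from a confusion list, and moving a word), and the vocabulary and confusion pairs are invented for illustration. Each generated sentence could then be labelled ungrammatical and added to a classifier's training data.

```python
import random
from typing import Callable, Dict, List, Optional

Tokens = List[str]


def delete_word(tokens: Tokens, rng: random.Random) -> Optional[Tokens]:
    """Drop one randomly chosen token."""
    if not tokens:
        return None
    i = rng.randrange(len(tokens))
    return tokens[:i] + tokens[i + 1:]


def insert_word(tokens: Tokens, rng: random.Random, vocab: List[str]) -> Optional[Tokens]:
    """Insert a word drawn from `vocab` at a random position."""
    if not vocab:
        return None
    i = rng.randrange(len(tokens) + 1)
    return tokens[:i] + [rng.choice(vocab)] + tokens[i:]


def substitute_word(tokens: Tokens, rng: random.Random,
                    confusions: Dict[str, List[str]]) -> Optional[Tokens]:
    """Replace a word with a confusable alternative, e.g. 'goes' -> 'go'."""
    candidates = [i for i, t in enumerate(tokens) if confusions.get(t)]
    if not candidates:
        return None
    i = rng.choice(candidates)
    return tokens[:i] + [rng.choice(confusions[tokens[i]])] + tokens[i + 1:]


def move_word(tokens: Tokens, rng: random.Random) -> Optional[Tokens]:
    """Move one token to a different position."""
    if len(tokens) < 2:
        return None
    i = rng.randrange(len(tokens))
    rest = tokens[:i] + tokens[i + 1:]
    # Exclude position i, which would just restore the original order.
    j = rng.choice([k for k in range(len(rest) + 1) if k != i])
    return rest[:j] + [tokens[i]] + rest[j:]


def generate_error(tokens: Tokens, rng: random.Random, vocab: List[str],
                   confusions: Dict[str, List[str]]) -> Optional[Tokens]:
    """Try the operations in random order; return the first output that differs
    from the input, or None if none of them applies."""
    ops: List[Callable[[], Optional[Tokens]]] = [
        lambda: delete_word(tokens, rng),
        lambda: insert_word(tokens, rng, vocab),
        lambda: substitute_word(tokens, rng, confusions),
        lambda: move_word(tokens, rng),
    ]
    rng.shuffle(ops)
    for op in ops:
        out = op()
        if out is not None and out != tokens:
            return out
    return None


sentence = "she goes to school".split()
rng = random.Random(3)
print(" ".join(generate_error(sentence, rng, vocab=["the", "a"],
                              confusions={"goes": ["go", "going"]})))
```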
An examination of the suitability of a pluricentric model of English language teaching for primary education in Indonesia
The study examined the suitability of a pluricentric model of English language teaching (ELT), which accommodates local varieties of English, for primary education in Indonesia. The majority of participants in the study strongly supported the adoption of a pluricentric model of English language instruction. However, it was not clear whether their positive attitudes would affect ELT pedagogy, since many complex issues bear on the adoption of this approach in Indonesia.
On the Web Communication Assist Aide based on the Bilingual Sign Language Dictionary
PACLIC 19 / Taipei, Taiwan / December 1-3, 200
The Pronunciation Problems among Kurdish Learners of English
The goal of this study was to examine the pronunciation problems of speakers of English, especially Kurdish speakers, and various perspectives on native versus foreign pronunciation. The research showed that Kurdish speakers had difficulty pronouncing several English vowels and some English consonants. The results also demonstrate that Kurdish speakers of English understand the value of pronunciation in comparison with native and non-native English speakers. Kurdish speakers may hesitate to speak in a manner that sounds natural to a native speaker, and their word-final consonants are almost always unaspirated and unvoiced. Given that Kurdish learners of English have difficulty pronouncing some English words, suggested solutions include providing pronunciation classes for language instructors, having educators speak in English, and giving students examples of native-language sounds compared and contrasted with target-language sounds. Given limited contact with native speakers and the differences between the phonological organization of the L1 and that of English, the difficulty posed by pronunciation is evident. All the recent studies clearly show that these problems affect speakers of English in general and depend less and less on their mother tongue.
Multilinguals and Wikipedia Editing
This article analyzes one month of edits to Wikipedia in order to examine the
role of users editing multiple language editions (referred to as multilingual
users). Such multilingual users may serve an important function in diffusing
information across different language editions of the encyclopedia, and prior
work has suggested this could reduce the level of self-focus bias in each
edition. This study finds multilingual users are much more active than their
single-edition (monolingual) counterparts. They are found in all language
editions, but smaller-sized editions with fewer users have a higher percentage
of multilingual users than larger-sized editions. About a quarter of
multilingual users always edit the same articles in multiple languages, while
just over 40% of multilingual users edit different articles in different
languages. When non-English users do edit a second language edition, that
edition is most frequently English. Nonetheless, several regional and
linguistic cross-editing patterns are also present.
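The user-level statistics reported above can be approximated with a short Python sketch. The record layout, the language-independent article IDs, and the rule for labelling users as editing the same articles, different articles, or a mix are all assumptions made for illustration; the study's own operational definitions may differ. The edit records are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical record layout: (user, edition, article_id), where article_id is
# language-independent, so the English and Indonesian articles on the same
# topic share one ID.
Edit = Tuple[str, str, str]


def editions_per_user(edits: Iterable[Edit]) -> Dict[str, Set[str]]:
    """Collect the set of language editions each user edited."""
    result: Dict[str, Set[str]] = defaultdict(set)
    for user, edition, _ in edits:
        result[user].add(edition)
    return result


def multilingual_share(edits: Iterable[Edit]) -> Dict[str, float]:
    """For each edition, the fraction of its editors who also edit another edition."""
    eds = editions_per_user(edits)
    editors: Dict[str, Set[str]] = defaultdict(set)
    for user, user_eds in eds.items():
        for e in user_eds:
            editors[e].add(user)
    return {e: sum(len(eds[u]) > 1 for u in users) / len(users)
            for e, users in editors.items()}


def editing_pattern(edits: Iterable[Edit]) -> Dict[str, str]:
    """Label each multilingual user 'same' (every article edited in 2+ editions),
    'different' (no article edited in 2+ editions) or 'mixed'."""
    per_user: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for user, edition, article in edits:
        per_user[user][article].add(edition)
    labels: Dict[str, str] = {}
    for user, articles in per_user.items():
        if len(set().union(*articles.values())) < 2:
            continue  # monolingual user: not labelled
        shared = sum(len(e) > 1 for e in articles.values())
        labels[user] = ("same" if shared == len(articles)
                        else "different" if shared == 0 else "mixed")
    return labels


edits = [
    ("ana", "en", "Q1"), ("ana", "id", "Q1"),
    ("budi", "en", "Q2"), ("budi", "de", "Q3"),
    ("cara", "en", "Q1"), ("cara", "en", "Q4"),
    ("dewi", "id", "Q1"), ("dewi", "en", "Q1"), ("dewi", "en", "Q5"),
]
print(editing_pattern(edits))  # prints: {'ana': 'same', 'budi': 'different', 'dewi': 'mixed'}
print(multilingual_share(edits)["en"])  # prints: 0.75
```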