
    Writing and literacy in Indonesia

    Published or submitted for publication; peer reviewed.

    On Monitoring Language Change with the Support of Corpus Processing

    One of the fundamental characteristics of language is that it changes over time. One way to monitor such change is to observe its corpora, that is, structured documentation of the language. Recent developments in technology, especially in Natural Language Processing, allow robust linguistic processing, which supports the description of diverse historical changes in corpora. The involvement of human linguists is inevitable, since they determine the gold standard, but computational approaches provide considerable support in exploring corpora, especially historical corpora. This paper proposes a model for corpus development in which corpora are annotated to support further computational operations such as lexicogrammatical pattern matching, automatic retrieval and extraction. These corpus processing operations are performed on a contemporary Indonesian corpus using local-grammar-based corpus processing software. The paper concludes that data collection and data processing in a corpus are of equally crucial importance for monitoring language change, and neither can be set aside.
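    To illustrate what lexicogrammatical pattern matching over an annotated corpus can look like, here is a minimal sketch, assuming each token is stored as a (word, part-of-speech tag) pair. The tag names, the "=word" literal syntax, the helper names and the Indonesian example are all hypothetical; this is not the local-grammar software used in the paper, only a simplified sliding-window matcher.

```python
from typing import List, Tuple

# A token is a (word, part-of-speech tag) pair; the tag names are hypothetical.
Token = Tuple[str, str]


def token_matches(token: Token, element: str) -> bool:
    """An element is either a POS tag ("NOUN") or a literal word ("=yang")."""
    word, tag = token
    if element.startswith("="):
        return word.lower() == element[1:]
    return tag == element


def match_pattern(tokens: List[Token], pattern: List[str]) -> List[List[str]]:
    """Slide the pattern over the corpus and return every matching word sequence."""
    hits = []
    n = len(pattern)
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if all(token_matches(t, p) for t, p in zip(window, pattern)):
            hits.append([word for word, _ in window])
    return hits


# Hypothetical tagged Indonesian fragment: "Rumah yang besar itu dijual"
# ("The big house is for sale").
corpus = [("Rumah", "NOUN"), ("yang", "PRON"), ("besar", "ADJ"),
          ("itu", "DET"), ("dijual", "VERB")]
print(match_pattern(corpus, ["NOUN", "=yang", "ADJ"]))  # [['Rumah', 'yang', 'besar']]
```

    Mixing tag-level and word-level elements in one pattern is what lets such queries capture constructions (here, a noun modified by a "yang" relative clause) rather than isolated words.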

    Natural language processing

    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems (text summarization, information extraction, information retrieval, etc.), including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.

    Translation into any natural language of the error messages generated by any computer program

    Since the introduction of the Fortran programming language some 60 years ago, there has been little progress in making error messages more user-friendly. A first step in this direction is to translate them into the natural language of the students. In this paper we propose a simple script for Linux systems which gives word-by-word translations of error messages. It works for most programming languages and for all natural languages. Understanding the error messages generated by compilers is a major hurdle for students who are learning programming, particularly for non-native English speakers. Not only may they never become "fluent" in programming, but many give up programming altogether. Although programming is a tool that can be useful in many human activities (e.g. history, genealogy, astronomy, entomology), in many countries the skill of programming remains confined to a narrow fringe of professional programmers. In all societies, besides professional violinists there are also amateurs; it should be the same for programming. It is our hope that, once translated and explained, error messages will be seen by students as an aid rather than an obstacle, and that in this way more students will enjoy learning and practising programming. They should see it as a fun game. Comment: 14 pages, 1 figure
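    The word-by-word idea can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical English-to-French glossary held in a dictionary; the paper's actual script, its dictionary format and the function name below are not taken from the source.

```python
import re

# Hypothetical English-to-French glossary; a real setup would load a much
# larger dictionary for the student's language.
GLOSSARY = {
    "error": "erreur",
    "expected": "attendu",
    "before": "avant",
    "undeclared": "non déclaré",
}


def translate_message(message: str, glossary: dict) -> str:
    """Replace each alphabetic word found in the glossary; leave the rest untouched."""
    def replace(match):
        word = match.group(0)
        return glossary.get(word.lower(), word)

    return re.sub(r"[A-Za-z]+", replace, message)


print(translate_message("main.c:3: error: expected ';' before 'return'", GLOSSARY))
# main.c:3: erreur: attendu ';' avant 'return'
```

    A word-by-word approach keeps the English word order and would also translate any identifier that happens to be a glossary entry, which is the price paid for making it work across programming languages without parsing the messages.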

    The BURCHAK corpus: a Challenge Data Set for Interactive Learning of Visually Grounded Word Meanings

    We motivate and describe a new freely available human-human dialogue dataset for interactive learning of visually grounded word meanings through ostensive definition by a tutor to a learner. The data were collected using a novel, character-by-character variant of the DiET chat tool (Healey et al., 2003; Mills and Healey, submitted) with a novel task in which a Learner needs to learn invented visual attribute words (such as "burchak" for square) from a tutor. As a result, the text-based interactions closely resemble face-to-face conversation and thus contain many of the linguistic phenomena encountered in natural, spontaneous dialogue. These include self- and other-correction, mid-sentence continuations, interruptions, overlaps, fillers, and hedges. We also present a generic n-gram framework for building user (i.e. tutor) simulations from this type of incremental data, which is freely available to researchers. We show that the simulations produce outputs that are similar to the original data (e.g. 78% turn match similarity). Finally, we train and evaluate a Reinforcement Learning dialogue control agent for learning visually grounded word meanings, trained from the BURCHAK corpus. The learned policy shows performance comparable to that of a previously built rule-based system. Comment: 10 pages, THE 6TH WORKSHOP ON VISION AND LANGUAGE (VL'17)
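    As a rough picture of how an n-gram model can drive a user simulation, the sketch below trains word bigrams on tutor turns and samples a new turn one word at a time. It is a generic bigram example under simplifying assumptions (whole turns, whitespace tokenization, hypothetical example turns and helper names), not the authors' framework, which works on incremental data and may condition on richer context.

```python
import random
from collections import Counter, defaultdict

START, END = "<s>", "</s>"


def train_bigrams(turns):
    """Count word bigrams over tutor turns, with start and end markers."""
    counts = defaultdict(Counter)
    for turn in turns:
        tokens = [START] + turn.split() + [END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def simulate_turn(counts, rng, max_len=20):
    """Generate one simulated turn by sampling each next word given the previous one."""
    token, output = START, []
    for _ in range(max_len):
        options = counts.get(token)
        if not options:
            break
        words = list(options)
        token = rng.choices(words, weights=[options[w] for w in words])[0]
        if token == END:
            break
        output.append(token)
    return " ".join(output)


# Hypothetical tutor turns, not taken from the BURCHAK corpus.
model = train_bigrams(["this is a burchak", "yes a burchak", "that is red"])
print(simulate_turn(model, random.Random(0)))
```

    Passing a seeded `random.Random` instance keeps simulated dialogues reproducible, which matters when the simulator is used to train and compare learning agents.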

    GenERRate: generating errors for use in grammatical error detection

    This paper explores the issue of automatically generated ungrammatical data and its use in error detection, with a focus on the task of classifying a sentence as grammatical or ungrammatical. We present an error generation tool called GenERRate and show how GenERRate can be used to improve the performance of a classifier on learner data. We describe initial attempts to replicate Cambridge Learner Corpus errors using GenERRate.
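    The general idea of generating negative training examples can be sketched as follows, assuming a single illustrative error type (deleting a determiner). The helper names and the determiner list are hypothetical; GenERRate's own input format and error specifications are not reproduced here.

```python
import random

DETERMINERS = {"a", "an", "the"}


def delete_word(tokens, targets, rng):
    """Return tokens with one randomly chosen target word removed, or None."""
    positions = [i for i, t in enumerate(tokens) if t.lower() in targets]
    if not positions:
        return None
    i = rng.choice(positions)
    return tokens[:i] + tokens[i + 1:]


def make_training_pairs(sentences, targets=DETERMINERS, seed=0):
    """Pair each grammatical sentence with an artificially corrupted version.

    Caveat: deleting a word does not always yield an ungrammatical sentence,
    so the generated negatives are noisy labels.
    """
    rng = random.Random(seed)
    pairs = []
    for sentence in sentences:
        pairs.append((sentence, "grammatical"))
        corrupted = delete_word(sentence.split(), targets, rng)
        if corrupted is not None:
            pairs.append((" ".join(corrupted), "ungrammatical"))
    return pairs


print(make_training_pairs(["She bought the book", "Dogs bark"]))
# [('She bought the book', 'grammatical'), ('She bought book', 'ungrammatical'),
#  ('Dogs bark', 'grammatical')]
```

    Sentences with no applicable target word yield only a positive example, so the balance between classes depends on how often the chosen error type can be applied.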

    An examination of the suitability of a pluricentric model of English language teaching for primary education in Indonesia

    The study examined the suitability of a pluricentric model of ELT, which accommodates local varieties of English, for primary education in Indonesia. The majority of participants in the study strongly supported the adoption of a pluricentric model of English language instruction. However, whether their positive attitudes would affect ELT pedagogy was not clear, since many complex issues affect the adoption of this approach in Indonesia.

    On the Web Communication Assist Aide based on the Bilingual Sign Language Dictionary

    PACLIC 19 / Taipei, Taiwan / December 1-3, 200

    The Pronunciation Problems among Kurdish Learners of English

    The goal of this study was to examine the pronunciation problems of different speakers of English, especially Kurdish speakers, and various perspectives on native versus foreign pronunciation. The research showed that Kurdish speakers had difficulty pronouncing several English vowels and some English consonants. The results demonstrate that Kurdish speakers of English understand the value of pronunciation in comparison with native and non-native English speakers. Kurdish speakers may hesitate to speak in a manner that sounds natural to a native speaker, and their word-final consonants are almost always unaspirated and unvoiced. Since Kurdish learners of English have difficulty pronouncing some English words, suggested solutions include offering pronunciation classes for language instructors, having educators speak in English, and giving students examples of native-language sounds compared and contrasted with the target-language sounds. Given minimal exposure to interaction with native speakers and the differences between the phonological organization of the L1 and that of English, the difficulty posed by pronunciation is evident. The recent studies reviewed clearly show that these issues affect learners of English in general and depend less and less on their native language.

    Multilinguals and Wikipedia Editing

    This article analyzes one month of edits to Wikipedia in order to examine the role of users editing multiple language editions (referred to as multilingual users). Such multilingual users may serve an important function in diffusing information across different language editions of the encyclopedia, and prior work has suggested this could reduce the level of self-focus bias in each edition. This study finds that multilingual users are much more active than their single-edition (monolingual) counterparts. They are found in all language editions, but smaller editions with fewer users have a higher percentage of multilingual users than larger editions. About a quarter of multilingual users always edit the same articles in multiple languages, while just over 40% of multilingual users edit different articles in different languages. When non-English users do edit a second language edition, that edition is most frequently English. Nonetheless, several regional and linguistic cross-editing patterns are also present.
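    The per-edition share of multilingual users can be computed from an edit log with two passes: first collect the editions each user touches, then, for each edition, count how many of its editors touch more than one. This is a minimal sketch assuming a hypothetical log of (user, edition) pairs; it does not reproduce the article's data, filtering or definitions.

```python
from collections import defaultdict


def multilingual_share(edits):
    """Given (user, edition) edit records, return for each edition the
    fraction of its editors who also edit at least one other edition."""
    editions_by_user = defaultdict(set)
    for user, edition in edits:
        editions_by_user[user].add(edition)

    users_by_edition = defaultdict(set)
    for user, editions in editions_by_user.items():
        for edition in editions:
            users_by_edition[edition].add(user)

    return {
        edition: sum(len(editions_by_user[u]) > 1 for u in users) / len(users)
        for edition, users in users_by_edition.items()
    }


# Hypothetical edit log: alice edits two editions, everyone else only one.
log = [("alice", "en"), ("alice", "de"), ("bob", "en"),
       ("carol", "en"), ("dave", "de")]
shares = multilingual_share(log)
print(shares["en"], shares["de"])  # 0.3333333333333333 0.5
```

    Because the same multilingual user counts toward every edition they edit, the share is naturally higher in editions with few editors overall, which is the kind of size effect the abstract describes.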