16 research outputs found

    Segmental and Suprasegmental Mispronunciations Made by EFL Learners in Indonesia

    This study describes segmental and suprasegmental mispronunciations of selected words by Indonesian learners of EFL and their possible causes. It is descriptive qualitative research. The phonological data were elicited by recording the subjects as they read aloud a text prepared to contain words with segmental and suprasegmental phonemes. The data were limited to the words in the eliciting text. The recordings were then transcribed, identified, coded, classified, and interpreted. Possible causes of the mispronunciations were explored using the contrastive analysis approach and psycholinguistic theories. The segmental mispronunciations involve vowels and diphthongs such as /i:/, /ɑ:/, /eɪ/, and /oʊ/; the consonants /z/, /v/, and /ð/; and the silent letters w, l, and s. The suprasegmental mispronunciations concern stress placement in multisyllabic words. The study concludes that the possible causes of these errors include interlingual differences, mother tongue interference, lack of knowledge, and fossilization.

    Simple Yet Powerful Native Language Identification on TOEFL11

    Native language identification (NLI) is the task of determining an author's native language from an essay written in a second language. NLI is often treated as a classification problem. In this paper, we use the TOEFL11 data set, which contains more data, in terms of the number of essays and languages, and is less biased across essay prompts (i.e., topics). We demonstrate that even word-level n-grams as features, with a support vector machine (SVM) as the classifier, can yield nearly 80% accuracy. We observe that the accuracy of a binary word-level n-gram representation (~80%) is much better than that of a frequency-based word-level n-gram representation (~20%). Notably, comparable results can be achieved without removing punctuation marks, suggesting a very simple baseline system for NLI.
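
The contrast between binary and frequency-based word n-gram representations can be illustrated with a minimal sketch. The abstract does not specify the tokenization, the n-gram order, or the SVM implementation, so everything here is an assumption: whitespace tokenization, unigrams plus bigrams, and the hypothetical helper names `word_ngrams`, `build_vocab`, and `vectorize`. The classifier itself is omitted; only the feature construction is shown.

```python
from collections import Counter

def word_ngrams(text, n_max=2):
    """Return word-level n-grams (n = 1..n_max) of a lowercased text.

    Tokens are split on whitespace only, so punctuation marks that are
    already separated by spaces are kept as tokens (an assumption).
    """
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams

def build_vocab(texts, n_max=2):
    """Map every n-gram seen in the texts to a column index."""
    vocab = sorted({g for t in texts for g in word_ngrams(t, n_max)})
    return {g: idx for idx, g in enumerate(vocab)}

def vectorize(text, vocab, binary=True, n_max=2):
    """Turn one text into a feature vector over the vocabulary.

    binary=True  -> 1 if the n-gram occurs at all, else 0
    binary=False -> raw occurrence count of the n-gram
    """
    counts = Counter(word_ngrams(text, n_max))
    vec = [0] * len(vocab)
    for gram, count in counts.items():
        if gram in vocab:
            vec[vocab[gram]] = 1 if binary else count
    return vec

# Toy essays (invented for illustration, not TOEFL11 data).
essays = ["the the cat sat .", "a cat sat , the dog ran ."]
vocab = build_vocab(essays)
binary_vec = vectorize(essays[0], vocab, binary=True)
freq_vec = vectorize(essays[0], vocab, binary=False)
```

Either vector could then be passed to any linear classifier; the only difference between the two representations is whether repeated n-grams contribute more than once.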

    Contrastive analysis of English diphthongs pronounced by Banjarese students of IAIN Palangka Raya

    This study investigated (1) the differences and similarities between diphthongs in English and the Banjar language, (2) how Banjarese students pronounce diphthongs in English and the Banjar language, and (3) the factors that affect their pronunciation. The research design was a contrastive analysis using a qualitative method. The data were collected from recordings of the students' pronunciation and from interviews, as documentation. The subjects were 9 Banjarese students in the 5th semester of the English Education Study Program at IAIN Palangka Raya in the 2016/2017 academic year. The data were analyzed through data collection, data reduction, data display, and verification. For data trustworthiness, the researcher used data source triangulation. The researcher found that English and BBK have different numbers of diphthongs: English has 8 (/eɪ/, /əʊ/, /aɪ/, /ɑʊ/, /ɔɪ/, /ɪə/, /ʊə/, /ɛə/), while BBK has 3 (/au/, /ai/, /ui/). The Banjarese students had difficulty with diphthongs that do not exist in their native language. For English diphthongs unavailable in BBK, the students tended to substitute pure vowels, some taken from their native language and others from Indonesian. Five English diphthongs were mispronounced: /əʊ/, /eɪ/, /ʊə/, /ɪə/, and /ɛə/. They were replaced as follows: /eɪ/ → /e/, /ɛə/ → /ʌ/, /ʊə/ → /u/, /əʊ/ → /ɒ/, /ɪə/ → /i:/. Three factors affected the Banjarese students' pronunciation: age, exposure, and mother tongue influence.

    Linguistic identifiers of L1 Persian speakers writing in English: NLID for authorship analysis

    This research focuses on Native Language Identification (NLID), and in particular, on the linguistic identifiers of L1 Persian speakers writing in English. This project comprises three sub-studies; the first study devises a coding system to account for interlingual features present in a corpus of L1 Persian speakers blogging in English, and a corpus of L1 English blogs. Study One then demonstrates that it is possible to use interlingual identifiers to distinguish authorship by L1 Persian speakers. Study Two examines the coding system in relation to the L1 Persian corpus and a corpus of L1 Azeri and L1 Pashto speakers. The findings of this section indicate that the NLID method and features designed are able to discriminate between L1 influences from different languages. Study Three focuses on elicited data, in which participants were tasked with disguising their language to appear as L1 Persian speakers writing in English. This study indicated that there was a significant difference between the features in the L1 Persian corpus, and the corpus of disguise texts. The findings of this research indicate that NLID and the coding system devised have a very strong potential to aid forensic authorship analysis in investigative situations. Unlike existing research, this project focuses predominantly on blogs, as opposed to student data, making the findings more appropriate to forensic casework data

    Second language learning from a multilingual perspective

    Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 119-127). How do people learn a second language? In this thesis, we study this question through an examination of cross-linguistic transfer: the role of a speaker's native language in the acquisition, representation, usage and processing of a second language. We present a computational framework that enables studying transfer in a unified fashion across language production and language comprehension. Our framework supports bidirectional inference between linguistic characteristics of speakers' native languages, and the way they use and process a new language. We leverage this inference ability to demonstrate the systematic nature of cross-linguistic transfer, and to uncover some of its key linguistic and cognitive manifestations. We instantiate our framework in language production by relating syntactic usage patterns and grammatical errors in English as a Second Language (ESL) to typological properties of the native language, showing its utility for automated typology learning and prediction of second language grammatical errors. We then introduce eye tracking during reading as a methodology for studying cross-linguistic transfer in second language comprehension. Using this methodology, we demonstrate that learners' native language can be predicted from their eye movement while reading free-form second language text. Further, we show that language processing during second language comprehension is intimately related to linguistic characteristics of the reader's first language. Finally, we introduce the Treebank of Learner English (TLE), the first syntactically annotated corpus of learner English. The TLE is annotated with Universal Dependencies (UD), a framework geared towards multilingual language analysis, and will support linguistic and computational research on learner language. Taken together, our results highlight the importance of multilingual approaches to the scientific study of second language acquisition, and to Natural Language Processing (NLP) applications for non-native language. By Yevgeni Berzak. Ph. D.

    Beyond topic-based representations for text mining

    A massive amount of online information is natural language text: newspapers, blog articles, forum posts and comments, tweets, scientific literature, government documents, and more. While all kinds of online information are useful in general, textual information is especially important: it is the most natural, most common, and most expressive form of information. Text representation plays a critical role in application tasks like classification or information retrieval, since the quality of the underlying feature space directly impacts each task's performance. Because of this importance, many different approaches have been developed for generating text representations. By far the most common way to generate features is to segment text into words and record their n-grams. While simple term features perform relatively well in topic-based tasks, not all downstream applications are topical in nature or can be captured by words alone. For example, determining the native language of an English essay writer will depend on more than just word choice. Competing methods to topic-based representations (such as neural networks) are often not interpretable or rely on massive amounts of training data. This thesis proposes three novel contributions to generate and analyze a large space of non-topical features. First, structural parse tree features are based solely on structural properties of a parse tree, ignoring all of the syntactic categories in the tree. An important advantage of these "skeletons" over regular syntactic features is that they can capture global tree structures without causing problems of data sparseness or overfitting. Second, SyntacticDiff explicitly captures differences in a text document with respect to a reference corpus, creating features that are easily explained as weighted word edit differences. These edit features are especially useful since they are derived from information not present in the current document, capturing a type of comparative feature. Third, Cross-Context Lexical Analysis (CCLA) is a general framework for analyzing similarities and differences in both term meaning and representation with respect to different, potentially overlapping partitions of a text collection. The representations analyzed by CCLA are not limited to topic-based features.
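
The idea of a parse tree "skeleton", a tree with its syntactic categories ignored, can be sketched as follows. The abstract does not define the thesis's actual features, so this is only an illustration under stated assumptions: parse trees are given as nested tuples of the form `(label, child, ...)` with words as string leaves, and the hypothetical function `skeleton` reduces a tree to a bracket string that records shape only.

```python
def skeleton(tree):
    """Return the shape of a parse tree with labels and words removed.

    A tree is assumed to be a tuple (label, child, ...); a leaf word is a
    plain string. Each word becomes "*", each node a pair of parentheses.
    """
    if isinstance(tree, str):
        return "*"  # keep only the fact that a leaf occurs here
    _label, *children = tree  # the syntactic category is discarded
    return "(" + "".join(skeleton(child) for child in children) + ")"

# Two toy parses with different words and categories but the same shape.
parse_a = ("S", ("NP", ("DT", "the"), ("NN", "cat")), ("VP", ("VBD", "sat")))
parse_b = ("S", ("NP", ("PRP$", "my"), ("NNS", "dogs")), ("VP", ("VBP", "run")))

shape_a = skeleton(parse_a)
shape_b = skeleton(parse_b)
```

Because both toy sentences share one skeleton string, a feature built on it is independent of vocabulary and of the tag set, which is the property the abstract attributes to structural features.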

    Language transfer in second language acquisition. Some effects of L1 instruction (Romanian) on L2/L3 learning (Catalan/Spanish)

    In migration contexts, the diversity of languages in contact triggers processes of second language (L2) acquisition and language transfer, and draws attention to the importance of mother tongue (L1) maintenance. The present study examines L2 acquisition (Catalan and Spanish), L1 (Romanian) maintenance, and L1 transfer among 130 immigrant Romanian students, as well as the effect of attendance at L1 classes and length of residence on the three languages analysed. To this end, three parallel language competence tests were administered in seven public schools of Compulsory Secondary Education in Catalonia. In general, the results indicate that transfer from the L1 to the L2s occurs and that a longer length of residence facilitates the learning of Catalan and Spanish but, at the same time, lowers the level of competence in the L1. Attendance at Romanian classes also seems to influence the maintenance of the mother tongue and the acquisition of the second languages.