Citation practice in the whole TESOL master’s theses by Vietnamese postgraduates

Authors: Issra Pramoolsook, Nguyen Thi Thuy Loan
Publication venue: Penerbit Universiti Kebangsaan Malaysia (UKM Press)
Publication date: 01/01/2016

Citing previous works is an important rhetorical feature of academic writing, and it is challenging for novice writers, especially non-native English writers (NNEWs). However, little is known about how NNEWs cite in each chapter of their master’s (M.A.) theses. This paper thus reports on the citation practice in 24 TESOL M.A. theses written by Vietnamese students. Citation types were first searched in the AntConc software using regular expressions (regex) written for both the conventional and the ‘invented’ citing ways of this group of writers; citation functions were then investigated and classified based on Thompson and Tribble’s (2001) framework. Semi-structured interviews were also conducted with thesis writers and thesis supervisors. Besides describing the general citation practice of this group of NNEWs and the different citation functions and types in different chapters of their theses, the study found that these writers were not fully aware of the significance of citations as a rhetorical device in their thesis writing, and that insufficient attention was paid to in-text citations in the TESOL discourse community in Vietnam. These findings suggest that explicit instruction on citations is needed to help novice writers fully acquire citation use.
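The regex-based search for citation types mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes the common distinction between integral citations (the author is named in the sentence) and non-integral citations (author and year both sit inside parentheses). The names `AUTHOR`, `YEAR`, `INTEGRAL`, `NON_INTEGRAL` and `find_citations` are hypothetical, and the patterns are not the expressions the study ran in AntConc.

```python
import re

# Hypothetical building blocks, not the study's own regular expressions.
# A capitalized surname, optionally followed by "et al." or "and/& Surname".
AUTHOR = r"[A-Z][A-Za-z\-]+(?:\s+et\s+al\.|\s+(?:and|&)\s+[A-Z][A-Za-z\-]+)?"
# A four-digit year, optionally with a letter suffix such as "2001a".
YEAR = r"(?:19|20)\d{2}[a-z]?"

# Integral: author outside the parentheses, e.g. "Swales (1990) argues".
# An optional possessive ("Tribble's (2001)") is tolerated.
INTEGRAL = re.compile(rf"\b({AUTHOR})(?:[’']s)?\s+\(({YEAR})(?=[,;)])")

# Non-integral: author and year inside the parentheses, e.g. "(Swales, 1990)".
# Only the first source in a multi-source parenthesis is captured.
NON_INTEGRAL = re.compile(rf"\(({AUTHOR}),?\s+({YEAR})(?=[,;)])")


def find_citations(text):
    """Return (author, year) pairs grouped by assumed citation type."""
    return {
        "integral": INTEGRAL.findall(text),
        "non_integral": NON_INTEGRAL.findall(text),
    }


if __name__ == "__main__":
    sample = (
        "Swales (1990) argues that citation matters. "
        "Citation is a rhetorical device (Hyland, 2000). "
        "Thompson and Tribble's (2001) framework is used."
    )
    for kind, hits in find_citations(sample).items():
        print(kind, hits)
```

Patterns like these catch only conventional author-date forms; unconventional or ‘invented’ citing ways of the kind the abstract mentions would need additional, corpus-specific expressions.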

UKM Journal Article Repository

The Effect of Thesis Writing on Paraphrasing Ability of the EFL Alumni of the University of Mataram Lombok

Author: Arifuddin Arifuddin
Publication venue: Yayasan Visi Intan Permata (Centrall)
Publication date: 15/11/2021

Until recently, no study had analyzed the effect of a thesis writing program on the paraphrasing ability of alumni. Earlier studies generally focused on the reverse direction, that is, the effect of paraphrasing ability on thesis writing. This is the novelty of the present study. The study aimed at testing the effect of a thesis writing program at the end of EFL study on the paraphrasing ability of alumni of an EFL education program, identifying the types of paraphrasing, and exploring weaknesses in paraphrasing and causes of not paraphrasing. This evaluative ex-post facto research employed mixed methods. The participants were 68 alumni of the University of Mataram, Indonesia: those who undertook the thesis writing program during their EFL education and others who did not write an undergraduate thesis. They were selected purposively from 37 schools in West Nusa Tenggara province. Data were collected with writing tasks, a questionnaire, interviews, and recordings, and were analyzed quantitatively and qualitatively. The results show that: 1) the level of the alumni’s paraphrasing ability is ‘medium’; 2) the thesis writing program affects the paraphrasing ability of the EFL alumni; 3) Synonym and Change of Word Order are the dominant techniques; 4) the teachers’ weaknesses involve lack of vocabulary, limited conversions, deviation from the authentic ideas, summarizing, and unclear paraphrasing; and 5) the causes of not paraphrasing include limited knowledge of paraphrasing and limited grammatical understanding. It is suggested that teacher education institutions implement curricula that support teachers’ writing skills. In turn, plagiarism could be minimized, which would lead to higher-quality academic writing by teachers.

Indonesian Journal of EFL and Linguistics

MEGA: Multilingual Evaluation of Generative AI

Authors: Ahuja Kabir, Axmed Maxamed, Bali Kalika, Diddee Harshita, Ganu Tanuja, Hada Rishav, Jain Prachi, Nambi Akshay, Ochieng Millicent, Ramesh Krithika, Segal Sameer, Sitaram Sunayana
Publication date: 22/10/2023

Generative AI models have shown impressive performance on many Natural Language Processing tasks such as language understanding, reasoning, and language generation. An important question being asked by the AI community today is about the capabilities and limits of these models, and it is clear that evaluating generative AI is very challenging. Most studies on generative LLMs have been restricted to English and it is unclear how capable these models are at understanding and generating text in other languages. We present the first comprehensive benchmarking of generative LLMs - MEGA, which evaluates models on standard NLP benchmarks, covering 16 NLP datasets across 70 typologically diverse languages. We compare the performance of generative LLMs including Chat-GPT and GPT-4 to State of the Art (SOTA) non-autoregressive models on these tasks to determine how well generative models perform compared to the previous generation of LLMs. We present a thorough analysis of the performance of models across languages and tasks and discuss challenges in improving the performance of generative LLMs on low-resource languages. We create a framework for evaluating generative LLMs in the multilingual setting and provide directions for future progress in the field.

Comment: EMNLP 202

arXiv.org e-Print Archive

“All These Nouns Together Just Don’t Make Sense!”: An Investigation of EAP Students’ Challenges with Complex Noun Phrases in First-Year College-Level Textbooks

Author: Priven Dmitri
Publication venue: Canadian Association of Applied Linguistics / Association canadienne de linguistique appliquée
Publication date: 10/07/2020

Complex noun phrases (CNP) are a major vehicle of academic written discourse (Halliday, 1988; 2004). However, in spite of the view that they pose significant challenges to English language learners, they are often overlooked in preparatory English for Academic Purposes (EAP) programs. This mixed methods study aims to investigate to what extent CNP present syntactic parsing challenges for upper-level college EAP students, and whether there is a perceived need for direct instruction in CNP in EAP programs. A special CNP proficiency test was administered to 70 upper-level Ontario college EAP students and a native speaker comparator group, and the results were compared with those obtained from interviews with seven of the test-takers. The results obtained from the statistical analyses and the interviews indicate that CNP are challenging to parse for upper-level EAP students and that direct instruction in CNP may be beneficial for improving their reading comprehension. Some teaching implications of the findings are also addressed.

University of New Brunswick: Centre for Digital Scholarship Journals

NeCo@ALQAC 2023: Legal Domain Knowledge Acquisition for Low-Resource Languages through Data Enrichment

Authors: Nguyen Dieu-Quynh, Nguyen Ha-Thanh, Nguyen Hai-Long, Nguyen Hoang-Trung, Nguyen Huu-Dong, Nguyen Thach-Anh, Pham Thu-Trang, Vuong Thi-Hai-Yen
Publication date: 11/09/2023

In recent years, natural language processing has gained significant popularity in various sectors, including the legal domain. This paper presents NeCo Team's solutions to the Vietnamese text processing tasks provided in the Automated Legal Question Answering Competition 2023 (ALQAC 2023), focusing on legal domain knowledge acquisition for low-resource languages through data enrichment. Our methods for the legal document retrieval task employ a combination of similarity ranking and deep learning models, while for the second task, which requires extracting an answer from a relevant legal article in response to a question, we propose a range of adaptive techniques to handle different question types. Our approaches achieve outstanding results on both tasks of the competition, demonstrating the potential benefits and effectiveness of question answering systems in the legal field, particularly for low-resource languages.

Comment: ISAILD@KSE 202

arXiv.org e-Print Archive

What are Automated Paraphrasing Tools and how do we address them? A review of a growing threat to academic integrity

Authors: Perkins Mike, Roe Jasper
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2022

This article reviews the literature surrounding the growing use of Automated Paraphrasing Tools (APTs) as a threat to educational integrity. In academia, a technological arms race is occurring between the development of tools and techniques that facilitate violations of the principles of educational integrity, including text-based plagiarism, and methods for identifying such behaviors. APTs are part of this race, as they are a rapidly developing technology that can help writers transform words, phrases, and entire sentences and paragraphs at the click of a button. This article seeks to review the literature surrounding the history of APT use and the current understanding of APTs, placed in the broader context of the educational integrity-technology arms race.

ResearchOnline at James Cook University

Effective and Efficient Transfer Learning in the Era of Large Language Models

Author: Vu Tu
Publication venue: ScholarWorks@UMass Amherst
Publication date: 14/11/2023

Substantial progress has been made in the field of natural language processing (NLP) due to the advent of large language models (LLMs), deep neural networks with millions or billions of parameters pre-trained on large amounts of unlabeled data. However, these models have common weaknesses, including degenerate performance in data-scarce scenarios and substantial computational resource requirements. This thesis aims to develop methods to address these limitations for improved applicability and performance of LLMs in resource-constrained settings with limited data and/or computational resources. To address the need for labeled data in data-scarce scenarios, I present two methods, in Chapter 2 and Chapter 3, respectively. The first method leverages beneficial relationships between NLP tasks for transfer learning, while the second method combines data augmentation and self-training to boost few-shot learning performance, that is, the ability to perform novel tasks from only a few labeled examples. Additionally, in Chapter 4, I introduce a novel parameter-efficient transfer learning approach that reuses a single frozen model for all tasks while only learning minimal task-specific parameters (soft/continuous prompts) to represent tasks and transfer knowledge. Our method can match or outperform fine-tuning task-specific models (training the whole model on each task). In Chapter 5, I demonstrate the benefits of parameter-efficient transfer learning in a cross-lingual transfer setting. Finally, I conclude the thesis in Chapter 6 by outlining potential avenues for future research that aim to advance NLP through large-scale multi-task learning using multilingual and multimodal data.

ScholarWorks@UMass Amherst

On the Cross-lingual Transferability of Monolingual Representations

Authors: Artetxe Mikel, Ruder Sebastian, Yogatama Dani
Publication date: 01/01/2020

State-of-the-art unsupervised multilingual models (e.g., multilingual BERT) have been shown to generalize in a zero-shot cross-lingual setting. This generalization ability has been attributed to the use of a shared subword vocabulary and joint training across multiple languages giving rise to deep multilingual abstractions. We evaluate this hypothesis by designing an alternative approach that transfers a monolingual model to new languages at the lexical level. More concretely, we first train a transformer-based masked language model on one language, and transfer it to a new language by learning a new embedding matrix with the same masked language modeling objective, freezing parameters of all other layers. This approach does not rely on a shared vocabulary or joint training. However, we show that it is competitive with multilingual BERT on standard cross-lingual classification benchmarks and on a new Cross-lingual Question Answering Dataset (XQuAD). Our results contradict common beliefs of the basis of the generalization ability of multilingual models and suggest that deep monolingual models learn some abstractions that generalize across languages. We also release XQuAD as a more comprehensive cross-lingual benchmark, which comprises 240 paragraphs and 1190 question-answer pairs from SQuAD v1.1 translated into ten languages by professional translators.

Comment: ACL 202

arXiv.org e-Print Archive

Crossref