2,242 research outputs found
Citation practice in the whole TESOL master's theses by Vietnamese postgraduates
Citing previous works is an important rhetorical feature of academic writing and it is challenging for novice
writers, especially non-native English writers (NNEWs). However, little is known about how NNEWs cite in
each chapter of their master's (M.A.) theses. This paper thus reports on the citation practice in 24 TESOL M.A.
theses written by Vietnamese students. Citation types were first searched in the AntConc software using
regular expressions (Regex) written for both the conventional and the "invented" citing ways used by this group of
writers; then, based on Thompson and Tribble's (2001) framework, citation functions were investigated and
classified. Semi-structured interviews were also conducted with thesis writers and thesis supervisors. Besides
the general citation practice by this group of NNEWs, and the different citation functions and types in different
chapters of their theses, the study also found that these writers were not fully aware of the significance of
citations as a rhetorical device in their thesis writing, and insufficient attention was paid to the in-text citations
in the TESOL discourse community in Vietnam. These findings suggest that explicit instruction on citations is
needed to help novice writers fully acquire citation use.
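As a rough illustration of the regex-based search described in this abstract, the sketch below counts two conventional APA-style citation shapes in a text. The abstract does not give the study's actual expressions, so both patterns, the names `INTEGRAL` and `NON_INTEGRAL`, and the helper `count_citations` are hypothetical; the integral/non-integral split is assumed here as a simple starting point for classification, not taken from the paper.

```python
import re

# Hypothetical author pattern: a capitalised surname, optionally followed by
# "and Surname", "& Surname", or "et al." (an assumption, not the study's regex).
AUTHOR = r"[A-Z][A-Za-z-]+(?:\s+(?:and|&)\s+[A-Z][A-Za-z-]+|\s+et al\.)?"
YEAR = r"(?:19|20)\d{2}[a-z]?"

# Integral citation: author in the sentence, year in parentheses,
# e.g. "Thompson and Tribble's (2001)".
INTEGRAL = re.compile(r"\b" + AUTHOR + r"(?:['\u2019]s)?\s+\(" + YEAR + r"\)")

# Non-integral citation: author and year both inside the parentheses,
# e.g. "(Halliday, 1988)". Multi-source parentheses are not handled.
NON_INTEGRAL = re.compile(r"\(" + AUTHOR + r",\s+" + YEAR + r"\)")


def count_citations(text):
    """Return the number of matches for each hypothetical citation pattern."""
    return {
        "integral": len(INTEGRAL.findall(text)),
        "non_integral": len(NON_INTEGRAL.findall(text)),
    }
```

Patterns like these only cover conventional forms; the "invented" citing ways mentioned in the abstract would need additional expressions derived from the corpus itself.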
The Effect of Thesis Writing on Paraphrasing Ability of the EFL Alumni of the University of Mataram Lombok
Until recently, no study had analyzed the effect of a thesis writing program on the paraphrasing ability of alumni. Existing studies have generally examined the reverse direction, that is, the effect of paraphrasing ability on thesis writing; this is the novelty of the present study. The study aimed to test the effect of a thesis writing program, taken at the end of EFL study, on the paraphrasing ability of alumni of an EFL education program, to identify the types of paraphrasing used, and to explore weaknesses in paraphrasing and the causes of not paraphrasing. This evaluative ex-post facto research employed mixed methods. The participants were 68 alumni of the University of Mataram, Indonesia: some had undertaken the thesis writing program during their EFL education, and the others had not written an undergraduate thesis. They were selected purposively from 37 schools in West Nusa Tenggara province. Data were collected through writing tasks, a questionnaire, interviews, and recordings, and were analyzed quantitatively and qualitatively. The results show that: 1) the level of the alumni's paraphrasing ability is "medium"; 2) the thesis writing program affects the paraphrasing ability of the EFL alumni; 3) use of synonyms and change of word order are the dominant techniques; 4) the teachers' weaknesses involve lack of vocabulary, limited conversions, deviation from the original ideas, summarizing, and unclear paraphrasing; and 5) the causes of not paraphrasing include limited knowledge of paraphrasing and limited grammatical understanding. It is suggested that teacher education institutions implement curricula that support teachers' writing skills. In turn, plagiarism could be minimized, leading to higher-quality academic writing by teachers.
MEGA: Multilingual Evaluation of Generative AI
Generative AI models have shown impressive performance on many Natural
Language Processing tasks such as language understanding, reasoning, and
language generation. An important question being asked by the AI community
today is about the capabilities and limits of these models, and it is clear
that evaluating generative AI is very challenging. Most studies on generative
LLMs have been restricted to English and it is unclear how capable these models
are at understanding and generating text in other languages. We present the
first comprehensive benchmarking of generative LLMs - MEGA, which evaluates
models on standard NLP benchmarks, covering 16 NLP datasets across 70
typologically diverse languages. We compare the performance of generative LLMs
including Chat-GPT and GPT-4 to State of the Art (SOTA) non-autoregressive
models on these tasks to determine how well generative models perform compared
to the previous generation of LLMs. We present a thorough analysis of the
performance of models across languages and tasks and discuss challenges in
improving the performance of generative LLMs on low-resource languages. We
create a framework for evaluating generative LLMs in the multilingual setting
and provide directions for future progress in the field. Comment: EMNLP 202
"All These Nouns Together Just Don't Make Sense!": An Investigation of EAP Students' Challenges with Complex Noun Phrases in First-Year College-Level Textbooks
Complex noun phrases (CNP) are a major vehicle of academic written discourse (Halliday, 1988; 2004). However, in spite of the view that they pose significant challenges to English language learners, they are often overlooked in preparatory English for Academic Purposes (EAP) programs. This mixed methods study aims to investigate to what extent CNP present syntactic parsing challenges for upper-level college EAP students, and whether there is a perceived need for direct instruction in CNP in EAP programs. A special CNP proficiency test was administered to 70 upper-level Ontario college EAP students and a native speaker comparator group, and the results were compared with those obtained from interviews with seven of the test-takers. The results obtained from the statistical analyses and the interviews indicate that CNP are challenging to parse for upper-level EAP students and that direct instruction in CNP may be beneficial for improving their reading comprehension. Some teaching implications of the findings are also addressed.
NeCo@ALQAC 2023: Legal Domain Knowledge Acquisition for Low-Resource Languages through Data Enrichment
In recent years, natural language processing has gained significant
popularity in various sectors, including the legal domain. This paper presents
NeCo Team's solutions to the Vietnamese text processing tasks provided in the
Automated Legal Question Answering Competition 2023 (ALQAC 2023), focusing on
legal domain knowledge acquisition for low-resource languages through data
enrichment. Our methods for the legal document retrieval task employ a
combination of similarity ranking and deep learning models, while for the
second task, which requires extracting an answer from a relevant legal article
in response to a question, we propose a range of adaptive techniques to handle
different question types. Our approaches achieve outstanding results on both
tasks of the competition, demonstrating the potential benefits and
effectiveness of question answering systems in the legal field, particularly
for low-resource languages. Comment: ISAILD@KSE 202
What are Automated Paraphrasing Tools and how do we address them? A review of a growing threat to academic integrity
This article reviews the literature surrounding the growing use of Automated Paraphrasing Tools (APTs) as a threat to educational integrity. In academia there is a technological arms-race occurring between the development of tools and techniques which facilitate violations of the principles of educational integrity, including text-based plagiarism, and methods for identifying such behaviors. APTs are part of this race, as they are a rapidly developing technology which can help writers transform words, phrases, and entire sentences and paragraphs at the click of a button. This article seeks to review the literature surrounding the history of APT use and the current understanding of APTs placed in the broader context of the educational integrity-technology arms race
Effective and Efficient Transfer Learning in the Era of Large Language Models
Substantial progress has been made in the field of natural language processing (NLP) due to the advent of large language models (LLMs), deep neural networks with millions or billions of parameters pre-trained on large amounts of unlabeled data. However, these models have common weaknesses, including degenerate performance in data-scarce scenarios, and substantial computational resource requirements. This thesis aims to develop methods to address these limitations for improved applicability and performance of LLMs in resource-constrained settings with limited data and/or computational resources.
To address the need for labeled data in data-scarce scenarios, I present two methods, in Chapter 2 and Chapter 3, respectively. The first method leverages beneficial relationships between NLP tasks for transfer learning, while the second method combines data augmentation and self-training to boost few-shot learning performance, the ability to perform novel tasks from only a few labeled examples. Additionally, in Chapter 4, I introduce a novel parameter-efficient transfer learning approach that reuses a single frozen model for all tasks while only learning minimal task-specific parameters (soft/continuous prompts) to represent tasks and transfer knowledge. Our method can match or outperform fine-tuning task-specific models (training the whole model on each task). In Chapter 5, I demonstrate the benefits of parameter-efficient transfer learning in a cross-lingual transfer setting. Finally, I conclude the thesis in Chapter 6 by outlining potential avenues for future research that aim to advance NLP through large-scale multi-task learning using multilingual and multimodal data.
On the Cross-lingual Transferability of Monolingual Representations
State-of-the-art unsupervised multilingual models (e.g., multilingual BERT)
have been shown to generalize in a zero-shot cross-lingual setting. This
generalization ability has been attributed to the use of a shared subword
vocabulary and joint training across multiple languages giving rise to deep
multilingual abstractions. We evaluate this hypothesis by designing an
alternative approach that transfers a monolingual model to new languages at the
lexical level. More concretely, we first train a transformer-based masked
language model on one language, and transfer it to a new language by learning a
new embedding matrix with the same masked language modeling objective, freezing
parameters of all other layers. This approach does not rely on a shared
vocabulary or joint training. However, we show that it is competitive with
multilingual BERT on standard cross-lingual classification benchmarks and on a
new Cross-lingual Question Answering Dataset (XQuAD). Our results contradict
common beliefs of the basis of the generalization ability of multilingual
models and suggest that deep monolingual models learn some abstractions that
generalize across languages. We also release XQuAD as a more comprehensive
cross-lingual benchmark, which comprises 240 paragraphs and 1190
question-answer pairs from SQuAD v1.1 translated into ten languages by
professional translators. Comment: ACL 202