    Multilingual Education and Interference: Written Ungrammatical Tag-switching Among Pre-service Teachers of English Language

    This research aims to (1) show the ungrammaticality of the written tag-switching examples produced by pre-service teachers (Bachelor III students), which is a concern since these subjects are future English teachers, and (2) track tokens of interference among Kirundi, French, Kiswahili and English in those examples as a result of the Burundian multilingual education system. The study relies on observation and testing, as suggested respectively by Cohen et al. (2006) and Hughes (2003). The researcher's unstructured observation involved reviewing the observational data before suggesting any explanation for the phenomena being observed. The test served, on the one hand, to measure the pre-service teachers' achievement of the course objectives and, on the other, to diagnose their strengths and weaknesses. The subjects were thirty-six (36) students who chose tag-switching in a Sociolinguistics exam whose question was framed as follows: “Among the different code switching types, choose one and exemplify it with three examples.” The Kuder-Richardson formula 20 (KR-20) and the Standard Error of Measurement (SEM) provided helpful information for making decisions about individuals on the basis of their performance in a test such as the one given during this research (Hughes, ibid:224). The findings reveal a mismatch between the subjects' level of study and the written tag-switching examples they gave: after careful correction, the examples proved ungrammatical, with wrong tense use at the tag level and wrong choices of tense, aspect and mood (in Kirundi, French or Kiswahili) in the part preceding the tag. These erroneous tag-switching examples are attributed to the multilingual education system operating in Burundi.
Keywords: Educational multilingualism, interference, tag-switching and ungrammaticality
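
The abstract above names KR-20 and SEM without stating them. As a minimal sketch, assuming dichotomously scored items (1 = correct, 0 = incorrect) and the population variance of total scores (some textbooks use the sample variance instead), the two statistics could be computed as below. The function names and the score matrix are hypothetical illustrations, not data or code from the study.

```python
from math import sqrt


def total_score_variance(responses):
    """Population variance of each test taker's total score."""
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / len(totals)
    return sum((t - mean_total) ** 2 for t in totals) / len(totals)


def kr20(responses):
    """Kuder-Richardson formula 20 for a matrix of 0/1 item scores.

    responses: list of rows, one row per test taker, one column per item.
    """
    n = len(responses)
    k = len(responses[0])
    var_total = total_score_variance(responses)
    if k < 2 or var_total == 0:
        raise ValueError("KR-20 needs at least 2 items and non-zero score variance")
    # Sum of p * q over items, where p is the proportion answering correctly.
    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / var_total)


def standard_error_of_measurement(responses):
    """SEM = standard deviation of total scores * sqrt(1 - reliability)."""
    sd = sqrt(total_score_variance(responses))
    return sd * sqrt(1 - kr20(responses))


# Hypothetical scores: 4 test takers, 3 items.
example = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
reliability = kr20(example)
sem = standard_error_of_measurement(example)
```

A higher KR-20 value indicates more internally consistent items, and a smaller SEM indicates that an individual's observed score is a more precise estimate, which is why the two are used together when making decisions about individual test takers.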

    Data Augmentation Techniques for Machine Translation of Code-Switched Texts: A Comparative Study

    Code-switching (CSW) text generation has been receiving increasing attention as a solution to address data scarcity. In light of this growing interest, we need more comprehensive studies comparing different augmentation approaches. In this work, we compare three popular approaches: lexical replacements, linguistic theories, and back-translation (BT), in the context of Egyptian Arabic-English CSW. We assess the effectiveness of the approaches on machine translation and the quality of augmentations through human evaluation. We show that BT and CSW predictive-based lexical replacement, being trained on CSW parallel data, perform best on both tasks. Linguistic theories and random lexical replacement prove to be effective in the absence of CSW parallel data, where both approaches achieve similar results. Comment: Findings of EMNLP 202

    From Frequency to Meaning: Vector Space Models of Semantics

    Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term-document, word-context, and pair-pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field.
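
To make the first of the three matrix types concrete, the following is a minimal sketch of a term-document matrix built from raw word counts, with documents compared by cosine similarity. It assumes whitespace tokenization and unweighted counts; the function names and the three toy documents are hypothetical, and this is not the specific open source project the survey examines.

```python
from collections import Counter
from math import sqrt


def term_document_matrix(docs):
    """Return (vocab, matrix) with one row per term and one column per document."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted({term for c in counts for term in c})
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix


def document_vector(matrix, j):
    """Column j of the matrix: the count vector for document j."""
    return [row[j] for row in matrix]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus.
docs = ["the cat sat", "the cat ran", "dogs bark loudly"]
vocab, matrix = term_document_matrix(docs)
sim_01 = cosine(document_vector(matrix, 0), document_vector(matrix, 1))
sim_02 = cosine(document_vector(matrix, 0), document_vector(matrix, 2))
```

Documents that share terms end up with nearby column vectors, so the first two toy documents score higher than the first and third, which share no words.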

    The use of L1 as a writing strategy in L2 writing tasks

    Numerous studies have investigated how students draw on their L1 to acquire L2 writing proficiency. However, there is still no consensus on how the degree of L1 use and the various writing strategies in L2 writing relate to student proficiency levels and to writing genres or tasks. The present study explored these issues over the course of 14 weeks with nine Korean university students of three different proficiency levels performing six writing tasks in two genres. The data were collected from the students' think-aloud protocols and retrospective interviews. The think-aloud and interview data were analyzed to examine the students' use of L1 during L2 writing. The think-aloud protocols were also coded by function to identify the purposes for which each language was used. The results showed that lower-level students used their L1 more than the advanced students, but all students used L1 to different degrees depending on the task. In other words, the students reacted differently according to task familiarity and the relative ease or difficulty of the task. The study also found no consistent relationship between language proficiency and the types of writing strategies the students used in L2 composition. On the other hand, although the types of writing strategies the students employed were similar, students of different proficiency levels applied L1 strategies to their writing in different ways. The findings showed that L1 use in L2 writing can play an encouraging role for both ideational and compensatory purposes, suggesting that the strategic use of L1 can contribute to improvement in L2 composition. The paper argues that writing instruction should focus more on how to use writing strategies as well as on which writing strategies to use.

    Weakly supervised deep learning for the detection of domain generation algorithms

    Domain generation algorithms (DGAs) have become commonplace in malware that seeks to establish command and control communication between an infected machine and the botmaster. DGAs dynamically and consistently generate large volumes of malicious domain names, only a few of which are registered by the botmaster, within a short time window around their generation time, and subsequently resolved when the malware on the infected machine tries to access them. Deep neural networks that can classify domain names as benign or malicious are of great interest in the real-time defense against DGAs. In contrast with traditional machine learning models, deep networks do not rely on human-engineered features. Instead, they can learn features automatically from data, provided that they are supplied with sufficiently large amounts of suitable training data. Obtaining cleanly labeled ground truth data is difficult and time-consuming. Heuristically labeled data could potentially provide a source of training data for weakly supervised training of DGA detectors. We propose a set of heuristics for automatically labeling domain names monitored in real traffic, and then train and evaluate classifiers with the proposed heuristically labeled dataset. We show through experiments on a dataset with 50 million domain names that such heuristically labeled data is very useful in practice to improve the predictive accuracy of deep learning-based DGA classifiers, and that these deep neural networks significantly outperform a random forest classifier with human-engineered features.

    Code-Switching with Word Senses for Pretraining in Neural Machine Translation

    Lexical ambiguity is a significant and pervasive challenge in Neural Machine Translation (NMT), with many state-of-the-art (SOTA) NMT systems struggling to handle polysemous words (Campolungo et al., 2022). The same holds for the NMT pretraining paradigm of denoising synthetic "code-switched" text (Pan et al., 2021; Iyer et al., 2023), where word senses are ignored in the noising stage -- leading to harmful sense biases in the pretraining data that are subsequently inherited by the resulting models. In this work, we introduce Word Sense Pretraining for Neural Machine Translation (WSP-NMT) - an end-to-end approach for pretraining multilingual NMT models leveraging word sense-specific information from Knowledge Bases. Our experiments show significant improvements in overall translation quality. Then, we show the robustness of our approach to scale to various challenging data and resource-scarce scenarios and, finally, report fine-grained accuracy improvements on the DiBiMT disambiguation benchmark. Our studies yield interesting and novel insights into the merits and challenges of integrating word sense information and structured knowledge in multilingual pretraining for NMT. Comment: EMNLP (Findings) 2023 Long Paper

    Language Switching On English Compositions Of Latino Students In Alaska And Puerto Rico

    Thesis (Ph.D.) University of Alaska Fairbanks, 2007. The main objective of the research described in this dissertation was to explore how English second language (ESL) writers used their first language (L1) when composing in their second language (L2). This task was undertaken by identifying participants according to their L2 (English) proficiency level, Latino ethnic subgroup, and generational status. Another objective of this study was to better understand the writer's perspective regarding first language use in L2 writing, referred to as language-switching (L-S) in this study. Eight high school Latinos were recruited in Fairbanks, Alaska, and a group of twenty-three college-level participants in Mayaguez, Puerto Rico. Participants were asked to complete a self-report questionnaire, provide a writing sample, and participate in a guided focus group discussion. Findings indicated that participants with low L2 proficiency were more likely to switch languages at the lexical level than participants at an intermediate or advanced level of English proficiency. Switching languages from English to Spanish at the lexical level was of no benefit for text coherence. Lack of L2 linguistic competence was a contributing factor for switching to the L1, as participants compensated for L2 difficulties with their L1 knowledge at the morphological, syntactical, and semantic levels. A qualitative analysis of the focus group data suggests that thinking in the L1 is a common strategy for ESL learners, which they perceive to be an advantage for generating ideas faster and deciding what to write. However, participants perceived writing text in the L1 for later content translation to be counterproductive. An important factor that cannot be discounted, and that may have contributed to the language switching frequency among the participants in this study, is the learning context: learning English in the U.S. versus learning English in Puerto Rico.
Additional research is needed to explore the relationship between language switching and learning context. I conclude this dissertation by suggesting pedagogical implications for L2 writing instruction and for the placement of L2 learners in ESL programs.