113 research outputs found

    Autenttisiin teksteihin perustuva tietokoneavusteinen kielen oppiminen: sovelluksia italian kielelle [Computer-assisted language learning based on authentic texts: applications for Italian]

    Computer-Assisted Language Learning (CALL) is one of the sub-disciplines within the area of Second Language Acquisition. Clozes, also called fill-in-the-blank exercises, are widely used in language learning applications. A cloze is an exercise where the learner is asked to provide a fragment that has been removed from the text. For language learning purposes, in addition to open-end clozes where one or more words are removed and the student must fill the gap, another type of cloze is commonly used, namely the multiple-choice cloze. In a multiple-choice cloze, a fragment is removed from the text and the student must choose the correct answer from multiple options. Multiple-choice exercises are a common way of practicing and testing grammatical knowledge. The aim of this work is to identify relevant learning constructs for Italian to be applied to automatic exercise creation based on authentic texts in the Revita Framework. Learning constructs are units that represent language knowledge. Revita is a free-to-use online platform that was designed to provide language learning tools with the aim of revitalizing endangered languages, including several Finno-Ugric languages such as North Saami. Later, non-endangered languages were added. Italian is the first majority language to be added in a principled way. This work paves the way towards adding new languages in the future. Its purpose is threefold: it contributes to the raising of Italian from its beta status towards a full development stage; it formulates best practices for defining support for a new language; and it serves as documentation of what has been done, how, and what remains to be done. Grammars and linguistic resources were consulted to compile an inventory of learning constructs for Italian. 
Analytic and pronominal verbs, verb government with prepositions, and noun phrase agreement were implemented by designing pattern rules that match sequences of tokens with specific parts-of-speech, surfaces and morphological tags. The rules were tested with test sentences that allowed further refining and correction of the rules. The current precision of the 47 rules for analytic and pronominal verbs on 177 test sentences is 100%, and recall is 96.4%. Both precision and recall for the 5 noun phrase agreement rules are 96.0% on the 34 test sentences. Analytic and pronominal verb patterns, as well as noun phrase agreement patterns, were used to generate open-end clozes. Verb government pattern rules were implemented as multiple-choice exercises where one of the four presented options is the correct preposition and the other three are prepositions that do not fit in context. The patterns were designed based on colligations, combinations of tokens (collocations) that are also explained by grammatical constraints. Verb government exercises were generated on a specifically collected corpus of 29,074 words. The corpus included three types of text: biography sections from Wikipedia, Italian news articles and Italian language matriculation exams. The last text type generated the most exercises, at a rate of 19 exercises per 10,000 words, suggesting that the semi-authentic text best matched the level of verb government exercises because of appropriate vocabulary frequency and sentence structure complexity. Four native language experts, either teachers of Italian as L2 or linguists, evaluated the usability of the generated multiple-choice clozes, yielding a usability score of 93.55%. This result suggests that minor adjustments, i.e., the exclusion of target verbs that cause multiple-admissibility, are sufficient to consider verb government patterns usable until the possibility of dealing with multiple-admissible answers is addressed. 
The implementation of some of the most important learning constructs for Italian proved feasible with current NLP tools, although quantitative evaluation of precision and recall of the designed rules is needed to evaluate the generation of exercises on authentic text. This work paves the way towards a full development stage of Italian in Revita and enables further pilot studies with actual learners, which will make it possible to measure learning outcomes in quantitative term
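
    As a rough illustration of the multiple-choice cloze format described in the abstract above (one correct preposition plus three other prepositions), the following Python sketch blanks out a preposition and builds four options. It is not Revita's implementation: the function name `make_preposition_cloze`, the `PREPOSITIONS` list and the example sentence are assumptions made for illustration, and the sketch does not check that the distractors are actually inadmissible in context, which the abstract identifies as an open issue.

```python
import random

# Hypothetical inventory of simple Italian prepositions used as distractor candidates.
PREPOSITIONS = ["di", "a", "da", "in", "con", "su", "per", "tra"]


def make_preposition_cloze(tokens, target_index, rng, n_options=4):
    """Blank out the preposition at target_index.

    Returns (gapped_text, options, answer). Distractors are simply other
    prepositions from the list; no contextual admissibility check is done.
    """
    answer = tokens[target_index]
    if answer not in PREPOSITIONS:
        raise ValueError(f"token {answer!r} is not a known preposition")
    candidates = [p for p in PREPOSITIONS if p != answer]
    options = rng.sample(candidates, n_options - 1) + [answer]
    rng.shuffle(options)
    gapped = " ".join(
        "____" if i == target_index else tok for i, tok in enumerate(tokens)
    )
    return gapped, options, answer


if __name__ == "__main__":
    # "pensare a": the verb governs the preposition "a" in this example sentence.
    sentence = "Penso a te ogni giorno".split()
    gapped, options, answer = make_preposition_cloze(sentence, 1, random.Random(0))
    print(gapped)
    print(options)
```

    In a rule-based pipeline such as the one the abstract describes, `target_index` would come from a pattern match over part-of-speech and morphological tags rather than being supplied by hand.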

    The Transferability of Reading Strategies between L1 (Arabic) and L2 (English)

    Relationships between learners' languages were usually studied in the form of tracing transfer of linguistic items from one language to the other. This study explored Libyan university students' transferability of reading strategies from the first language (Arabic) to the foreign language (English) and vice versa. In a foreign language environment, textbooks are usually the only medium for practising reading in that language. Reading textbooks prescribed in Basic Education and Secondary Education were explored to highlight the reading strategies the students practised and to answer the following research questions: (1) What reading strategies are presented in first language (L1) reading textbooks and in English as foreign language (L2) reading textbooks? (2) Are there any differences in the reading strategies introduced in L1 reading textbooks and L2 reading textbooks? Results of comparing the strategies addressed in L1 and L2 reading textbooks indicated that some of the strategies were presented in one language's reading textbooks rather than in those of the other language. Based on textbook analyses, two Cloze tests (one in Arabic and the other in English) were developed and administered to first year university students in three colleges in North West Libya. These tests were used to define good and poor readers and used as a basis for providing a reading environment in which they might use their reading strategies. In each college, and after defining good and poor achievers in the Cloze test, two subjects from each group were interviewed. These interviewees were selected through stratified sampling and random sampling, respectively. The first group of interview questions investigated the reading strategies used during the Cloze tests while the second group sought to examine the reading strategies mentioned by the subjects in suggested reading situations based on the data collected from the textbooks. 
This procedure was carried out to answer the following research questions: What reading strategies does a representative sample of first year university students use in L1 reading and in L2 reading? Do the participants transfer any reading strategies (presented in the textbooks) from L1 to L2 or vice versa? If yes, what L1 reading strategies do good and poor readers transfer to L2 reading comprehension? And what L2 reading strategies do good and poor readers transfer to L1 reading comprehension? Results from the interviews indicated that good and poor readers alike transferred certain reading strategies between the two languages (Arabic and English). These strategies were mainly local, i.e. relevant to single words and sentences. However, some strategies were transferred only by good readers. These strategies were holistic, i.e. they required awareness and account of the discourse. These results indicate that transferability is affected not only by readers' ability but also by the kind of strategy they use, i.e. whether it is local or universal. Moreover, it can be concluded that textbooks are not the only source of learning reading strategies. This study suggests there may be a far wider potential than within one country where more than one language is learned for reviewing reading strategies, implicit or intentional, in L2 textbooks and the extent to which learners are able to respond to them

    Unsupervised relation extraction for e-learning applications

    In this modern era many educational institutes and business organisations are adopting the e-Learning approach as it provides an effective method for educating and testing their students and staff. The continuous development in the area of information technology and increasing use of the internet has resulted in a huge global market and rapid growth for e-Learning. Multiple Choice Tests (MCTs) are a popular form of assessment and are quite frequently used by many e-Learning applications as they are well adapted to assessing factual, conceptual and procedural information. In this thesis, we present an alternative to the lengthy and time-consuming activity of developing MCTs by proposing a Natural Language Processing (NLP) based approach that relies on semantic relations extracted using Information Extraction to automatically generate MCTs. Information Extraction (IE) is an NLP field used to recognise the most important entities present in a text, and the relations between those concepts, regardless of their surface realisations. In IE, text is processed at a semantic level that allows the partial representation of the meaning of a sentence to be produced. IE has two major subtasks: Named Entity Recognition (NER) and Relation Extraction (RE). In this work, we present two unsupervised RE approaches (surface-based and dependency-based). The aim of both approaches is to identify the most important semantic relations in a document without assigning explicit labels to them in order to ensure broad coverage, unrestricted to predefined types of relations. In the surface-based approach, we examined different surface pattern types, each implementing different assumptions about the linguistic expression of semantic relations between named entities while in the dependency-based approach we explored how dependency relations based on dependency trees can be helpful in extracting relations between named entities. 
Our findings indicate that the presented approaches are capable of achieving high precision rates. Our experiments make use of traditional, manually compiled corpora along with similar corpora automatically collected from the Web. We found that an automatically collected web corpus is still unable to ensure the same level of topic relevance as attained in manually compiled traditional corpora. Comparison between the surface-based and the dependency-based approaches revealed that the dependency-based approach performs better. Our research enabled us to automatically generate questions regarding the important concepts present in a domain by relying on unsupervised relation extraction approaches, as extracted semantic relations allow us to identify key information in a sentence. The extracted patterns (semantic relations) are then automatically transformed into questions. In the surface-based approach, questions are automatically generated from sentences matched by the extracted surface-based semantic pattern, which relies on a certain set of rules. Conversely, in the dependency-based approach, questions are automatically generated by traversing the dependency tree of the extracted sentence matched by the dependency-based semantic patterns. The MCQ systems produced from these surface-based and dependency-based semantic patterns were extrinsically evaluated by two domain experts in terms of question and distractor readability, usefulness of semantic relations, relevance, acceptability of questions and distractors and overall MCQ usability. The evaluation results revealed that the MCQ system based on dependency-based semantic relations performed better than the surface-based one. A major outcome of this work is an integrated system for MCQ generation that has been evaluated by potential end users. EThOS - Electronic Theses Online Service, GB, United Kingdo

    Prediction during native and non-native language comprehension: the role of mediating factors

    Psycholinguistic evidence suggests that people predict upcoming words during language comprehension. While many studies have addressed what information people predict, less is known about the role of factors that potentially mediate predictive processing. This thesis examines predictions of semantic information and word form information. It investigates whether predictive processing is mediated by availability of cognitive resources and time to generate predictions, and compares predictive processing in native (L1) speakers and non-native (L2) speakers. This thesis presents two major lines of work. Two eye-tracking studies investigate prediction of semantic and word form information using a visual world paradigm. In two further ERP studies, we address the interplay of semantic and word form information in a paradigm which combines both possibilities. Experiments 1 and 2 were an eye-tracking study conducted on L1 and L2 speakers of English. The study demonstrated that L1 and L2 speakers predict semantic information, but their predictive eye movements are delayed when they are under a cognitive load. The effects of cognitive load on predictive eye movements suggest a role of cognitive resources in language prediction in both L1 and L2 speakers. Experiments 3 and 4 were another eye-tracking study conducted on L1 and L2 speakers. The study showed that L1 speakers predict word form information, but L2 speakers do not. Experiments 5 and 6 were an ERP study, which investigated the interplay of prediction of semantic and word form information in L1 English speakers. Consistent with the two sets of eye-tracking experiments, L1 speakers predicted both semantic and word form information, but word form was only predicted when sentences were presented at a slower rate, while semantic information was predicted at standard and slow presentation rates. Experiments 7 and 8 used the same method as Experiments 5 and 6, but were conducted on L2 English speakers. 
L2 speakers comprehended sentences incrementally, but there was no clear evidence that they predicted semantic information or word form information. Experiments 5–8 suggest that prediction of word form information is mediated both by nativeness of the target language and by reading rates. To conclude, both L1 and L2 speakers make predictions, but prediction of semantic information occurs only when there are enough cognitive resources available. Prediction of word form can occur in L1 speakers, but it occurs only when there is enough time available. There is no evidence that L2 speakers predict word form, suggesting a role of nativeness of the target language. The findings are consistent with the production-based prediction model of language prediction, in that prediction of word form is less likely to occur compared to prediction of semantic information. Furthermore, the findings are also consistent with the claim that not everyone makes predictions, and predictions do not always occur. The thesis concludes that prediction is additional processing for the comprehension system, and is not always implicated in the comprehension system

    Learner Modelling for Individualised Reading in a Second Language

    Extensive reading is an effective language learning technique that involves fast reading of large quantities of easy and interesting second language (L2) text. However, graded readers used by beginner learners are expensive and often dull. The alternative is text written for native speakers (authentic text), which is generally too difficult for beginners. The aim of this research is to overcome this problem by developing a computer-assisted approach that enables learners of all abilities to perform effective extensive reading using freely-available text on the web. This thesis describes the research, development and evaluation of a complex software system called FERN that combines learner modelling and iCALL with narrow reading of electronic text. The system incorporates four key components: (1) automatic glossing of difficult words in texts, (2) an individualised search engine for locating interesting texts of appropriate difficulty, (3) supplementary exercises for introducing key vocabulary and reviewing difficult words and (4) reliable monitoring of reading and reporting of progress. FERN was optimised for English speakers learning Spanish, but is easily adapted for learners of other languages. The suitability of the FERN system was evaluated through corpus analysis, machine translation analysis and a year-long study with a second-year university Spanish class. The machine translation analysis combined with the classroom study demonstrated that the word and phrase error rate generated in FERN is low enough to validate the use of machine translation to automatically generate glosses, but is high enough that a translation dictionary is required as a backup. The classroom study demonstrated that when aided by glosses students can read at over 100 words per minute if they know 95% of the words, compared to the 98% word knowledge required for effective unaided extensive reading. 
A corpus analysis demonstrated that beginner learners of Spanish can do effective narrow reading of news articles using FERN after learning only 200–300 high-frequency word families, in addition to familiarity with English-Spanish cognates and proper nouns. FERN also reliably monitors reading speeds and word counts, and provides motivating progress reports, which enable teachers to set concrete reading goals that dramatically increase the quantity that students read, as demonstrated in the user study
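
    The 95% and 98% word-knowledge figures quoted in the abstract above can be made concrete with a small coverage calculation. The Python sketch below is not FERN's learner model: the function names, the tokenisation by regular expression, the matching on word forms rather than word families, and the "too difficult" label are all assumptions made for illustration. Only the two thresholds come from the abstract.

```python
import re


def known_word_coverage(text, known_words):
    """Return the fraction of running word tokens in text that the learner knows.

    Assumption: tokens are lowercased runs of letters, and knowledge is tracked
    per word form (the abstract speaks of word families instead).
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for tok in tokens if tok in known_words)
    return known / len(tokens)


def reading_mode(coverage, glossed_threshold=0.95, unaided_threshold=0.98):
    """Map coverage to a reading mode using the thresholds cited in the abstract."""
    if coverage >= unaided_threshold:
        return "unaided"
    if coverage >= glossed_threshold:
        return "glossed"
    return "too difficult"  # hypothetical label for texts below both thresholds


if __name__ == "__main__":
    known = {"el", "gato", "come"}
    coverage = known_word_coverage("El gato come pescado", known)
    print(coverage, reading_mode(coverage))
```

    A learner model in this spirit would update `known_words` as the learner reads and use the resulting coverage to rank candidate texts by difficulty.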

    Pushed-output instruction for vocabulary learning: Exploring differences in learning gains and lexical profiling

    Previous research has shown that vocabulary can be learned through pushed-output activities. However, the few previous studies on the topic have mainly focused on the acquisition of nouns. Little is known about the acquisition of other parts of speech or about other components of lexical mastery achieved through pushed-output activities. This thesis examines the effectiveness of spoken pushed-output instruction on learning the multiple meaning senses of single-word verbs and phrasal verbs by presenting two classroom intervention studies. Study 1 explored differences between the effectiveness of spoken pushed-output instruction and traditional vocabulary-focused instruction for learning polysemous single-word verbs and phrasal verbs. A between-subjects design was used, which included three conditions: no instruction, traditional vocabulary instruction and spoken pushed-output instruction. Both receptive and productive knowledge were investigated. The data were analysed using two approaches: (1) examining the receptive and productive vocabulary gains after instruction and (2) looking beyond the vocabulary gains by examining the lexical profile of the spoken production after instruction (i.e., overall text length, mean length of utterances, lexical diversity, lexical density and lexical sophistication). The findings indicated that with spoken pushed-output instruction, learners significantly improved not only in learning the multiple meaning senses of the target items but also in producing these meaning senses more fluently in longer, more lexically diverse, lexically denser and more lexically sophisticated stretches of language. The results also indicated that single-word verbs could be learned at a similar rate to that of phrasal verbs. The results also showed that, except for the receptive gains of the first meaning sense, which had an advantage over the other meaning senses, no other differences among the three meaning senses emerged. 
This study demonstrated the advantage of spoken pushed-output instruction, justifying its use in the classroom. However, there are many different types of spoken pushed-output activities that may be implemented, making it logical to ask which are the most effective. Study 2 explored the effects of three different spoken pushed-output activities on learning polysemous single-word verbs and phrasal verbs: sentence reconstruction, listen-and-retell meaning, and picture description. The results indicated that all three activities resulted in similar recall scores but differed in their effectiveness for meaning recognition. The sentence reconstruction activity was found to be the most effective activity at the recognition level (as shown by the scores of the receptive test). The results also indicated that under similar instruction conditions, phrasal verbs are likely to be learned receptively and productively at a similar rate to single-word verbs. The results also showed that the first meaning sense was more easily recognised; however, no differences emerged in either the recall scores or the mean length of utterances scores. Overall, the findings presented in the thesis support the use of spoken pushed-output instruction in the classroom for teaching single words and formulaic sequences. Further, the findings support the idea that, if the type and amount of instruction are controlled to be the same for single-word verbs and phrasal verbs, the learnability of these two types of items may be the same. While the findings cannot be easily generalised to other types of formulaic sequences, they do encourage further research on the teaching of formulaic sequences