7 research outputs found

    Constructing a Collocation Learning System from the Wikipedia Corpus

    The importance of collocations for success in language learning is widely recognized. Concordancers, originally designed for linguists, are among the most popular tools for students to obtain, organize, and study collocations derived from corpora. This paper describes the design and development of a collocation learning system that is built from Wikipedia text and provides language learners with an easy-to-use interface for looking up collocations of any word that occurs in Wikipedia. The use of this corpus exposes learners to contemporary, content-related text, and enables them to search for semantically related words for a given topic. The system organizes collocations by syntactic pattern, sorts them by frequency, and links them to their original context. The paper includes a practical user guide to illustrate how to use the system as a language aid to facilitate academic writing.

    F-Lingo: Integrating lexical feature identification into MOOC platforms for learning professional and academic English

    F-Lingo is a Chrome extension that works on top of the FutureLearn MOOC platform to support content-based language learning of domain-specific terminology for professional and academic purposes.

    Acehnese lexical and grammatical collocations of the North Aceh dialect

    This research dealt with collocations used in the North Aceh dialect. It analyzed the part-of-speech categories into which collocations of the North Aceh dialect can be grouped. This research focused on the grammatical collocations and lexical collocations used in the Blang Mee village of Bireuen District, Aceh, Indonesia. This descriptive qualitative case study examined the villagers’ use of Acehnese collocations. Six Acehnese speakers were selected as the language informants. They were fifty years old or above and had never traveled or lived outside of Blang Mee. Data were extracted from interviews with these speakers, who were asked to discuss general topics in Acehnese. The analysis was performed using the framework of collocation categories by Benson et al. The results of the analysis showed both lexical and grammatical collocations used by speakers in North Aceh. Lexical collocations were found in the forms of verb (denoting creation or activation) + noun combinations, verb (meaning eradication or nullification) + noun combinations, adjective + noun combinations, noun + verb combinations, noun + noun combinations, adverb + adjective combinations, and verb + adverb combinations. Grammatical collocations came in the following combinations: noun + preposition combinations, adjective + preposition combinations, preposition + noun combinations, and collocational verb patterns. The collocations used by the North Aceh dialect speakers indicate the uniqueness of their dialect among the other dialects spoken by the Acehnese.

    Automatically augmenting academic text for language learning: PhD abstract corpora with the British Library

    This chapter describes the automated FLAX language system (flax.nzdl.org) that extracts salient linguistic features from academic text and presents them in an interface designed for L2 students who are learning academic writing. Typical lexico-grammatical features of any word or phrase, collocations and lexical bundles are automatically identified and extracted from a corpus; learners can explore them by searching and browsing, and inspect them along with contextual information. This chapter uses a single running example, the PhD abstracts corpus of 9.8 million words, derived from the open access Electronic Theses Online Service (EThOS) at the British Library, but the approach is fully automated and can be applied to any collection of English writing. Implications for reusing open access publications for non-commercial educational and research purposes are presented for discussion. Design considerations for developing teaching and learning applications that focus on the rhetorical and lexico-grammatical patterns found in the abstract genre are also discussed.

    A new paradigm for open data-driven language learning systems design in higher education

    This doctoral thesis presents three studies in collaboration with the open source FLAX project (Flexible Language Acquisition flax.nzdl.org). This research makes an original contribution to the fields of language education and educational technology by mobilising knowledge from computer science, corpus linguistics and open education, and proposes a new paradigm for open data-driven language learning systems design in higher education. Furthermore, the research presented in this thesis uncovers and engages with an infrastructure of open educational practices (OEP) that push at the parameters of policy for the reuse of open access research and pedagogic content in the design, development, distribution, adoption and evaluation of data-driven language learning systems. Study 1 employs automated content analysis to mine the concept of open educational systems and practices from qualitative reflections spanning 2012-2019 with stakeholders from an on-going multi-site design-based research study with the FLAX project. Design considerations are presented for remixing domain-specific open access content for academic English language provision across formal and non-formal higher education contexts. Primary stakeholders in this ongoing research collaboration include the following: knowledge organisations – libraries and archives including the British Library and the Oxford Text Archive, universities in collaboration with Massive Open Online Course (MOOC) providers; an interdisciplinary team of researchers; and knowledge users in formal higher education – English for Academic Purposes (EAP) practitioners. Themes arising from the qualitative dataset point to affordances as well as barriers with the adoption of open policies and practices for remixing open access content for data-driven language learning applications in higher education against the backdrop of different business models and cultural practices present within participating knowledge organisations. 
Study 2 presents a data-driven experiment in non-formal higher education by triangulating user query system log data with learner participant data from surveys (N=174) on the interface designs and usability of an automated open source digital library scheme, FLAX. Text and data mining approaches (TDM) common to natural language processing (NLP) were applied to pedagogical English language corpora, derived from the content of two MOOCs, (Harvard University with edX, and the University of London with Coursera), and one networked course (Harvard Law School with the Berkman Klein Center for Internet and Society), which were then linked to external open resources (e.g. Wikipedia, the FLAX Learning Collocations system, WordNet), so that learners could employ the information discovery techniques (e.g. searching and browsing) that they have become accustomed to using through search engines (e.g. Google, Bing) for discovering and learning the domain-specific language features of their interests. Findings indicate a positive user experience with interfaces that include advanced affordances for course content browse, search and retrieval that transcend the MOOC platform and Learning Management System (LMS) standard. Further survey questions derived from an open education research bank from the Hewlett Foundation are reused in this study and presented against a larger dataset from the Hewlett Foundation (N=1921) on motivations for the uptake of open educational resources. Study 3 presents a data-driven experiment in formal higher education from the legal English field to measure quantitatively the usefulness and effectiveness of employing the open Law Collections in FLAX in the teaching of legal English at the University of Murcia in Spain. Informants were divided into an experimental and a control group and were asked to write an essay on a given set of legal English topics, defined by the subject instructor as part of their final assessment. 
The experimental group consulted only the FLAX English Common Law MOOC collection as its single source of information to draft their essays, and the control group used any information source available from the Internet to draft their essays. Findings from an analysis of the two learner corpora of essays indicate that members of the experimental group appear to have acquired the specialised terminology of the area better than those in the control group, as attested by the higher term average obtained by the texts in the FLAX-based corpus (56.5), compared with the non-FLAX-based text collection, which scored 13.73 points lower.