
    A Corpus-driven Approach toward Teaching Vocabulary and Reading to English Language Learners in U.S.-based K-12 Context through a Mobile App

    To reduce the burden on teachers of deciding which vocabulary instruction should focus on, a recent line of research argues that pedagogically prepared word lists may offer the most efficient order for learning vocabulary, with an optimized context for instruction, in each of four K-12 content areas (math, science, social studies, and language arts), by providing English Language Learners (ELLs) with the most frequent words in each area. Educators and school experts have acknowledged the need to develop new materials, including computer-enhanced texts, and effective strategies aimed at improving ELLs’ mastery of the academic and STEM-related lexicon. Not all words in a language are equal in their role in comprehending the language and expressing ideas or thoughts. For this study, I used a corpus-driven approach operationalized through text analysis. I compiled two corpora, the Teacher’s U.S. Corpus (TUSC) and the Science and Math Academic Corpus for Kids (SMACK), with a focus on word lemmas rather than the inflectional and derivational variants of word families. To create the corpora, I collected and analyzed a total of 122 textbooks commonly used in the states of Florida and California. Collecting, scanning, and converting the textbooks took more than two years, from October 2014 to March 2017. In total, this school corpus contains 10,519,639 running words and 16,344 lemmas, saved across 16,315 Word-document pages. From the corpora, I developed six word lists: three frequency-based word lists (high-, mid-, and low-frequency), an academic word list, a STEM-related word list, and an essential word list (EWL). I then used the word lists as the database for a mobile app, Vocabulary in Reading Study (VIRS), available for iOS and Android on the App Store and Google Play, alongside a website (www.myvirs.com). I also developed a new frequency-based K-12 dictionary that targets the vocabulary needs of ELLs in the K-12 context; it categorizes words into high-, mid-, and low-frequency groups, with two separate sections for academic and STEM words, and contains 16,500 lemmas with their derivational and inflectional forms.
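
    To make the frequency-banding step concrete, here is a minimal Python sketch of how lemmas can be ranked by corpus frequency and split into high-, mid-, and low-frequency lists. The function name and band cutoffs are illustrative assumptions, not the actual thresholds used to build TUSC, SMACK, or the EWL.

```python
# Minimal sketch: derive frequency-banded word lists from a lemmatized corpus.
# The cutoffs are illustrative, not the study's actual band boundaries.
from collections import Counter

def build_frequency_bands(lemmas, high_cutoff=2000, mid_cutoff=9000):
    """Rank lemmas by corpus frequency and split them into three bands."""
    counts = Counter(lemmas)
    ranked = [lemma for lemma, _ in counts.most_common()]
    return {
        "high": ranked[:high_cutoff],
        "mid": ranked[high_cutoff:mid_cutoff],
        "low": ranked[mid_cutoff:],
    }

# In the study, the input would be the lemmatized running text of the
# ~10.5M-token school corpus; a toy list stands in here.
corpus_lemmas = ["cell", "divide", "cell", "energy", "the", "the", "of"]
print(build_frequency_bands(corpus_lemmas, high_cutoff=2, mid_cutoff=4))
```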

    Analysis of the Correlation Between the Lexical Profile and Coh-Metrix 3.0 Text Easability and Readability Indices of the Korean CSAT From 1994–2022

    The Korean College Scholastic Ability Test (CSAT) is a highly competitive standardized assessment that graduating high-school seniors complete in the hope of earning a score that will improve their chances of admission to a university of their choice. The CSAT contains an English Section that scholars and educators alike have described as far too difficult for the official English language curriculum to serve as sufficient preparation. The test’s lack of construct validity has prompted calls to revise it to better reflect the school curriculum, so that it can serve the evaluative purpose for which it is intended. In recent years, automated text evaluation with the software Coh-Metrix 3.0 has allowed scholars to quantify dimensions of the CSAT English Section’s text, such as cohesion and syntactic complexity, that contribute to its reading difficulty. Older research, conducted before this software entered the field, used word frequency counts in large corpora such as the British National Corpus (BNC) as a measure of word familiarity, on the reasoning that difficulty rises directly with lexical rarity: as low-frequency words displace high-frequency ones in a text, the text’s word-knowledge burden grows accordingly. Since the introduction of automated, software-based tools like Coh-Metrix 3.0 and the Lexical Complexity Analyzer (LCA), these corpus-based research methods have largely fallen by the wayside. In this paper, I maintain that despite its lower sophistication, corpus-based lexical analysis can still produce uniquely meaningful findings because of the manual control it affords the researcher in calibrating the parameters of the text base and, most importantly, in selecting the ranges of word-family frequency best tailored to a text, rather than having the ranges or functions of frequency assigned automatically by software. This study reports correlations between the outputs of these two methodologies that both inform the validity of Coh-Metrix 3.0’s use in CSAT studies and quantify the strength of word frequency’s role in causing the excessive difficulty of the CSAT English Section.
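
    As an illustration of the correlational step, the sketch below computes a Pearson correlation between a per-year lexical-profile measure (e.g., the percentage of tokens falling outside the most frequent BNC word families) and a Coh-Metrix 3.0 index. All values are invented stand-ins, not the study’s data.

```python
# Minimal sketch of correlating a corpus-based lexical profile with a
# Coh-Metrix index across test years. All numbers are hypothetical.
from scipy.stats import pearsonr

low_freq_coverage = [12.1, 13.4, 11.8, 15.2, 14.7]  # % tokens beyond the top BNC bands, per year (made up)
coh_metrix_index = [48.3, 44.1, 50.2, 39.8, 41.5]   # a Coh-Metrix 3.0 easability score, per year (made up)

r, p = pearsonr(low_freq_coverage, coh_metrix_index)
print(f"Pearson r = {r:.3f}, p = {p:.3f}")
```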

    The Rank-Frequency Analysis for the Functional Style Corpora in the Ukrainian Language

    We use rank-frequency analysis to estimate the Kernel Vocabulary size within specific functional-style corpora of Ukrainian. The extrapolation of high-rank behaviour is used to estimate the total vocabulary size.
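
    A minimal sketch of the underlying rank-frequency step: fit a Zipf-style power law f(r) ≈ C/r^a in log-log space. The paper’s kernel-vocabulary estimation is more involved, and the toy corpus here is purely illustrative.

```python
# Minimal sketch: rank-frequency distribution with a Zipf power-law fit.
import numpy as np
from collections import Counter

tokens = "the cat sat on the mat and the dog sat on the log".split()
freqs = sorted(Counter(tokens).values(), reverse=True)
ranks = np.arange(1, len(freqs) + 1)

# Fit log f = log C - a * log r; polyfit returns (slope, intercept),
# where slope = -a.
slope, log_c = np.polyfit(np.log(ranks), np.log(freqs), 1)
print(f"Zipf exponent a ~ {-slope:.2f}, C ~ {np.exp(log_c):.2f}")
```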

    Animacy in early New Zealand English

    The literature suggests that animacy effects in present-day spoken New Zealand English (NZE) differ from animacy effects in other varieties of English. We seek to determine whether such differences have a history in earlier NZE writing. We revisit two grammatical phenomena, progressives and genitives, that are well known to be sensitive to animacy effects, and we study them in corpora sampling 19th- and early 20th-century written NZE; for reference purposes, we also study parallel samples of 19th- and early 20th-century British English and American English. We indeed find significant regional differences between early New Zealand writing and the other varieties in the effect that animacy has on the frequency and probabilities of these grammatical phenomena.
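
    For readers unfamiliar with how such animacy effects are quantified, the sketch below tests an animacy effect on genitive choice with a simple 2x2 contingency table. The counts are invented, and the study’s actual analysis of the historical corpora is considerably richer than this.

```python
# Minimal sketch: possessor animacy x genitive variant, with invented counts.
from scipy.stats import chi2_contingency

#        s-genitive  of-genitive
table = [
    [180, 70],   # animate possessor ("the settler's house")
    [40, 310],   # inanimate possessor ("the roof of the house")
]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.1f}, p = {p:.4g}")
```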

    A Pattern Matching Method for Finding Noun and Proper Noun Translations from Noisy Parallel Corpora

    We present a pattern matching method for compiling a bilingual lexicon of nouns and proper nouns from unaligned, noisy parallel texts of Asian/Indo-European language pairs. Tagging information from only one of the two languages is used. Word frequency and position information for high- and low-frequency words are represented in two different vector forms for pattern matching. New anchor-point finding and noise elimination techniques are introduced. We obtained 73.1% precision. We also show how the results can be used in the compilation of domain-specific noun phrases.
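
    A minimal sketch of the position-vector idea: each word is represented by a binary vector marking the text segments in which it occurs, and candidate translation pairs are scored by vector similarity. The paper’s separate representations for high- and low-frequency words, anchor-point finding, and noise elimination are omitted, and all data here are toy examples.

```python
# Minimal sketch: match words across a parallel text by comparing
# binary segment-occurrence vectors. Data below are toy examples.
import numpy as np

def segment_vector(word, segments):
    """1 where the word occurs in a segment, else 0."""
    return np.array([1 if word in seg else 0 for seg in segments])

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)

# Toy tokenized segments; real input would be noisy, unaligned text
# in an Asian/Indo-European language pair.
src_segments = [["government", "report"], ["trade"], ["government"]]
tgt_segments = [["gouvernement", "rapport"], ["commerce"], ["gouvernement"]]

sim = cosine(segment_vector("government", src_segments),
             segment_vector("gouvernement", tgt_segments))
print(f"similarity = {sim:.2f}")
```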