
    Transductive Learning with String Kernels for Cross-Domain Text Classification

    For many text classification tasks, there is a major problem posed by the lack of labeled data in a target domain. Although classifiers for a target domain can be trained on labeled text data from a related source domain, the accuracy of such classifiers is usually lower in the cross-domain setting. Recently, string kernels have obtained state-of-the-art results in various text classification tasks such as native language identification or automatic essay scoring. Moreover, classifiers based on string kernels have been found to be robust to the distribution gap between different domains. In this paper, we formally describe an algorithm composed of two simple yet effective transductive learning approaches to further improve the results of string kernels in cross-domain settings. By adapting string kernels to the test set without using the ground-truth test labels, we report significantly better accuracy rates in cross-domain English polarity classification. Comment: Accepted at ICONIP 2018. arXiv admin note: substantial text overlap with arXiv:1808.0840

    Supervised Identification of Writer's Native Language Based on Their English Word Usage

    In this paper, we investigate the possibility of constructing an automated tool for detecting a writer's first language based on a document written in their second language. Since English is the contemporary lingua franca, commonly used by non-native speakers, we have chosen it as the second language to study. We examine English texts from computer science, a field related to mathematics. More generally, we wanted to study texts from a domain that operates with formal rules. We were able to achieve a high classification rate, about 90%, using a relatively simple model (n-grams with logistic regression). We trained the model to distinguish twelve nationality groups/first languages based on our dataset. The classification mechanism was implemented using logistic regression with L1 regularisation, which performed well with a sparse document-term data table. The experiment showed that vocabulary alone can be used to detect the first language with high accuracy.

    Explorations in Sights and Sounds


    Editorial


    Second language reading of adolescent ELLs: a study of response to retrospective miscue analysis, error coding methodology and transfer of L1 decoding skills in L2 reading

    Thesis (Ed.D.)--Boston University. It is well documented that ELLs face significant challenges as they develop literacy skills in their second language (NCES, 2007, 2011). This population is diverse and growing rapidly in Massachusetts and across the nation (Massachusetts Department of Elementary and Secondary Education, 2013; NCELA, 2011; Orosco, De Schonewise, De Onis, Klingner, & Hoover, 2008). Yet, this population is often left out of reading studies because of the range of variables they present (Klingner, 2010). This research focuses on the effects of a reading approach on adolescent ELLs, the power of coding systems to capture ELLs' reading errors and how exposure to a second writing system develops metalinguistic skills. In the first study of this dissertation, I examine the effects of an approach called Retrospective Miscue Analysis (RMA; Goodman & Marek, 1996) on six subjects in a school setting, using an n-of-one design to evaluate changes in their reading attributable to RMA. RMA has been researched with diverse learners in case studies; however, data had not been collected to demonstrate whether it could change subjects' fluency or reading comprehension in addition to their attitudes about reading and themselves as readers. My results suggest that students had positive feelings about RMA and believed that they had learned new ways to read, but the results do not point to immediate changes in their decoding accuracy, reading comprehension or fluency with RMA. This approach may have latent effects on overall reading performance by increasing motivation and self-confidence, but it did not appear to have immediate effects on my subjects' reading performance. The second study of this dissertation provides a methodological exploration of two coding systems. The first coding system, Reading Miscue Inventory (RMI; Goodman, Watson, & Burke, 2005), originated in miscue analysis research. The second coding system was developed by Cheng and Caldwell-Harris (to appear) to code oral reading errors Chinese readers made when reading Chinese, and it was also used by the researchers to code native English speakers' oral reading errors. Interview data from RMA was used as an additional lens for understanding the power of coding systems to reveal information about reading miscues, or oral reading errors. The results indicate that RMI needs revision for use with English language learners (ELLs), especially in the Meaning Construction category, but RMI also reminds us to consider miscues within the context of connected text. Cheng and Caldwell-Harris' system, on the other hand, appears to accurately illuminate general relationships between a target word and a reader's error but is limited to word-level analysis of oral reading errors. The third study of this dissertation examined patterns of oral reading errors according to ELLs' first language (L1) background to explore how L1 reading experiences affect the metalinguistic skills second language (L2) readers bring to reading in their L2. Statistical analysis of real word versus nonword oral reading errors subjects made revealed distinct patterns in L2 readers who had learned to read in Chinese versus Cyrillic writing systems. I argue that this difference in errors made by Chinese and Cyrillic readers supports Koda's (2009) Transfer Facilitation Model, which states that metalinguistic awareness reflects the systematic differences in writing systems readers become accustomed to. This difference in errors also appears to contradict predictions that transfer is less operable across unalike orthographies. I also explore Koda's (2009) hypothesis that experience reading an L2 should lead to changes in metalinguistic skills over time. My findings suggest that experienced L2 readers' decoding skills may not change, or may take significant time to change, with exposure to a second writing system.

    Electrophysiological dynamics of Chinese phonology during visual word recognition in Chinese-English bilinguals

    Silent word reading leads to the activation of orthographic (spelling), semantic (meaning), and phonological (sound) information. For bilinguals, native language information can also be activated automatically when they read words in their second language. For example, when Chinese-English bilinguals read words in their second language (English), the phonology of the Chinese translations is automatically activated. Chinese phonology, however, consists of consonants and vowels (segmental) and tonal information. To what extent these two aspects of Chinese phonology are activated is as yet unclear. Here, we used behavioural measures, event-related potentials and oscillatory EEG to investigate Chinese segmental and tonal activation during word recognition. Evidence of Chinese segmental activation was found when bilinguals read English words (faster responses, reduced N400, gamma-band power reduction) and when they read Chinese words (increased LPC, gamma-band power reduction). In contrast, evidence for Chinese tonal activation was only found when bilinguals read Chinese words (gamma-band power increase). Together, our converging behavioural and electrophysiological evidence indicates that Chinese segmental information is activated during English word reading, whereas both segmental and tonal information are activated during Chinese word reading. Importantly, gamma-band oscillations are modulated differently by tonal and segmental activation, suggesting independent processing of Chinese tones and segments.