265 research outputs found

    RSpell: Retrieval-augmented Framework for Domain Adaptive Chinese Spelling Check

    Chinese Spelling Check (CSC) refers to the detection and correction of spelling errors in Chinese texts. In practical application scenarios, it is important for CSC models to be able to correct errors across different domains. In this paper, we propose a retrieval-augmented spelling check framework called RSpell, which searches for corresponding domain terms and incorporates them into CSC models. Specifically, we employ pinyin fuzzy matching to search for terms, which are combined with the input and fed into the CSC model. Then, we introduce an adaptive process control mechanism to dynamically adjust the impact of external knowledge on the model. Additionally, we develop an iterative strategy for the RSpell framework to enhance reasoning capabilities. We conducted experiments on CSC datasets in three domains: law, medicine, and official document writing. The results show that RSpell achieves state-of-the-art performance in both zero-shot and fine-tuning scenarios, demonstrating the effectiveness of the retrieval-augmented CSC framework. Our code is available at https://github.com/47777777/Rspell
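The retrieval step described in this abstract can be illustrated with a minimal sketch. It is not the RSpell implementation: the toy pronunciation table `TOY_PINYIN`, the helper names, the use of `difflib.SequenceMatcher` as the fuzzy-matching score, and the 0.8 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical toy pinyin table (assumption); a real system would use a
# full pronunciation lexicon rather than a hand-written dictionary.
TOY_PINYIN = {
    "诉": "su", "讼": "song", "素": "su", "松": "song",
    "合": "he", "同": "tong",
}


def to_pinyin(text):
    """Map each character to its toy pinyin; unknown characters map to themselves."""
    return [TOY_PINYIN.get(ch, ch) for ch in text]


def pinyin_similarity(a, b):
    """Similarity in [0, 1] between two syllable sequences."""
    return SequenceMatcher(None, " ".join(a), " ".join(b)).ratio()


def retrieve_terms(sentence, domain_terms, threshold=0.8):
    """Return domain terms whose pinyin fuzzily matches some window of the sentence."""
    syllables = to_pinyin(sentence)
    hits = []
    for term in domain_terms:
        term_py = to_pinyin(term)
        n = len(term_py)
        # Slide a window of the term's length over the sentence.
        for i in range(len(syllables) - n + 1):
            if pinyin_similarity(syllables[i:i + n], term_py) >= threshold:
                hits.append(term)
                break
    return hits
```

In this sketch, a misspelled input such as "素松" shares its pinyin with the legal term "诉讼", so that term would be retrieved and could then be passed to a CSC model alongside the input.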

    A Frustratingly Easy Plug-and-Play Detection-and-Reasoning Module for Chinese Spelling Check

    In recent years, Chinese Spelling Check (CSC) has been greatly improved by designing task-specific pre-training methods or introducing auxiliary tasks, which mostly solve this task in an end-to-end fashion. In this paper, we propose to decompose the CSC workflow into detection, reasoning, and searching subtasks so that the rich external knowledge about the Chinese language can be leveraged more directly and efficiently. Specifically, we design a plug-and-play detection-and-reasoning module that is compatible with existing SOTA non-autoregressive CSC models to further boost their performance. We find that the detection-and-reasoning module trained for one model can also benefit other models. We also study the primary interpretability provided by the task decomposition. Extensive experiments and detailed analyses demonstrate the effectiveness and competitiveness of the proposed module. Comment: Accepted for publication in Findings of EMNLP 202

    Emergent Literacy and Early Reading Skills in Chinese-Mandarin: Evidence from Kindergarten and First-Grade Children

    The development of emergent literacy, a precursor to formal reading, has been linked to subsequent conventional literacy skills in Chinese children. The factors important for acquiring Chinese reading skills, such as phonological and morphological awareness, have primarily been studied in primary school children rather than preschoolers. The complete picture of factors contributing to early reading skills in Mandarin-speaking Chinese preschool children remains unclear. Objectives: The aim of this study was to explore emergent literacy and early reading skills in preschool and early school-aged children and investigate the connections between them to address gaps in existing literature. Methodology: A cross-sectional design was used to collect data from a sample of 66 children, including 35 in their second year of kindergarten and 31 first-grade children. Assessments were conducted on phonological awareness (syllable deletion), morphological awareness (lexical compounding, homophone judgment, and homophone generation), orthographic awareness (character judgment), vocabulary, and rapid automatized naming (RAN) of numbers. Reading outcomes were measured by character naming and word recognition. Results: The MANOVA findings showed a significant grade group effect on all measures, except for RAN accuracy. Specifically, first-grade children outperformed second-year kindergarten children in syllable deletion, lexical compounding, homophone generation, homophone judgment, character judgment, and vocabulary. Additionally, first-grade children named numbers faster than kindergarten children in RAN. The correlation and regression analyses suggest that advanced emergent literacy skills in children improve word reading, but the associations between emergent literacy and reading vary by grade level.
Syllable deletion and lexical compounding are particularly important for kindergarten children at the initial stage of learning to read, while character judgment plays a prominent role in the reading development of primary school children. Homophone judgment develops early and expands progressively as children gain reading experience during their primary school years. The significance of homophone generation is minimal at preschool and early school ages. RAN response time may provide more informative insights than RAN accuracy, and the link between RAN and reading skills appears to weaken once children begin schooling. Additionally, maternal education level was a significant covariate associated with character naming in preschool children. Implications: Findings carry implications for Chinese educators and parents. Incorporating metalinguistic awareness into classroom instruction can support children’s early reading development. Moreover, parents are encouraged to foster a literacy-rich home environment through experiences like interactive reading and character recognition, especially for preschool children without formal literacy instruction. Further longitudinal research is recommended to predict early reading skills in a larger sample of Mandarin-speaking preschoolers and establish age-specific educational goals.

    Underlying Skills of Oral and Silent Reading Fluency in Chinese: Perspective of Visual Rapid Processing

    Reading fluency is a critical skill for improving the quality of daily life and working efficiency. The majority of previous studies focused on oral reading fluency rather than silent reading fluency, which is a much more dominant reading mode that is used in middle and high school and for leisure reading. It is still unclear whether oral and silent reading fluency involve the same underlying skills. To address this issue, the present study examined the relationship between visual rapid processing and Chinese reading fluency in different modes. Fifty-eight undergraduate students took part in the experiment. The phantom contour paradigm and the visual 1-back task were adopted to measure visual rapid temporal and simultaneous processing, respectively. These two tasks reflected the temporal and spatial dimensions of visual rapid processing separately. We recorded the temporal threshold in the phantom contour task, as well as reaction time and accuracy in the visual 1-back task. Reading fluency was measured at both the single-character and sentence levels. Fluent reading of single characters was assessed with a paper-and-pencil lexical decision task, and a sentence verification task was developed to examine reading fluency at the sentence level. The reading fluency test at each level was conducted twice (i.e., oral reading and silent reading). Reading speed and accuracy were recorded. The correlation analysis showed that the temporal threshold in the phantom contour task did not correlate with the scores of the reading fluency tests. Although the reaction time in the visual 1-back task correlated with the reading speed of both oral and silent reading fluency, the comparison of the correlation coefficients revealed a closer relationship between visual rapid simultaneous processing and silent reading. Furthermore, visual rapid simultaneous processing exhibited a significant contribution to reading fluency in silent mode but not in oral reading mode.
These findings suggest that the mechanisms underlying oral and silent reading fluency differ as early as the stage of basic visual coding. The current results might also reveal a potential modulation, by the language characteristics of Chinese, of the relationship between visual rapid processing and reading fluency.

    An Investigation on Cognitive-Linguistic Skills of English-Chinese Bilingual Learners with and without Dyslexia in Singapore

    This thesis investigates dyslexia and the cognitive-linguistic skills, namely phonological awareness, orthographic knowledge, morphological awareness and rapid naming, of bilingual learners in Singapore whose first language is English and second language is Chinese. The two main research aims are to investigate whether English-Chinese bilingual learners with dyslexia diagnosed only in English are weaker than their typical counterparts in reading and all cognitive-linguistic skills in both languages or either language, and to investigate which cognitive-linguistic skills are strong predictors of reading in each language. Results show that the bilingual learners with dyslexia performed significantly worse than their typical counterparts in reading and all cognitive-linguistic skills in both languages, although their dyslexia was diagnosed only in English. Results also show that all English cognitive-linguistic skills were predictive of English word reading, with morphological awareness and orthographic knowledge playing unique predictive roles after rapid naming and phonological awareness were controlled. However, only rapid naming and morphological awareness were found to be predictive of Chinese word reading. The results suggest that dyslexia may manifest differently in the reading and cognitive-linguistic skills of English and Chinese in English-Chinese bilingual learners, based on the two different predictive models with different empirically and theoretically supported orders of cognitive-linguistic skills as predictors for reading development in the two languages. The difference in the unique contributions of the four cognitive-linguistic skills underlying reading development in the two languages may suggest that the difference lies in language structure and instruction. Keywords: dyslexia, bilingualism, English reading, Chinese reading, cognitive-linguistic skill

    Pertanika Journal of Social Sciences & Humanities


    Error Checking for Chinese Query by Mining Web Log

    Erroneous input queries are a common phenomenon in search engines. This paper uses the web log as the training set for query error checking. Queries are analyzed and checked with an n-gram language model trained on the web log. Some features, including the query words and their number, are introduced into the model. In addition, a data smoothing algorithm is used to address the data sparseness problem, which improves the overall accuracy of the n-gram model. The experimental results show that the approach is effective.
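The idea of scoring queries with a smoothed n-gram model can be sketched as follows. The abstract does not name the smoothing algorithm or the n-gram order, so this sketch assumes a bigram model with add-one (Laplace) smoothing, whitespace-segmented queries, and a hand-picked threshold; the function names are hypothetical.

```python
import math
from collections import Counter


def train_bigram(queries):
    """Count bigrams and their contexts from a log of segmented queries."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for q in queries:
        tokens = ["<s>"] + q.split() + ["</s>"]
        vocab.update(tokens[1:])
        contexts.update(tokens[:-1])            # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, len(vocab)


def avg_log_prob(query, model):
    """Average add-one-smoothed bigram log-probability of a query."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + query.split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        # Add-one smoothing keeps unseen bigrams from getting zero probability.
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
    return total / (len(tokens) - 1)


def is_suspicious(query, model, threshold=-1.5):
    """Flag a query whose average log-probability falls below the threshold (assumed value)."""
    return avg_log_prob(query, model) < threshold


# Toy log standing in for a real web log (assumption).
log = ["天气 预报", "天气 预报", "天气 预报 北京", "北京 地图"]
model = train_bigram(log)
```

With this toy log, a frequent query such as "天气 预报" scores higher than the reordered "预报 天气", which falls below the assumed threshold and is flagged for checking.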

    Acquisition of Word Spellings and Meanings during Reading in Nonnative Chinese Speakers

    This dissertation explored the acquisition of word spellings (orthographic learning) and word meanings (incidental word learning) during reading in adult nonnative Chinese speakers. Two studies were designed for this dissertation. In Study One, 45 Chinese as a foreign language (CFL) learners at intermediate and advanced proficiency levels completed a character learning experiment in a self-teaching paradigm. Results indicate that CFL learners were able to use the phonetic regularity and semantic transparency of radicals to learn the spellings and pronunciations of new characters after limited exposure to the characters in a story context. In Study Two, 72 CFL learners at novice, intermediate, and advanced proficiency levels were asked to choose the meanings of unfamiliar words presented either in isolation or in sentence context. Results show that CFL learners were better able to infer word meanings in context than in isolation, and that this lexical inference ability improved with increasing Chinese proficiency. The findings of this dissertation reveal the underlying mechanisms of orthographic learning and incidental word learning and yield implications for the instruction of Chinese as a foreign language to adult learners.