13 research outputs found

    MixEdit: Revisiting Data Augmentation and Beyond for Grammatical Error Correction

    Data augmentation through generating pseudo data has proven effective in mitigating data scarcity in Grammatical Error Correction (GEC). Various augmentation strategies have been widely explored, most of which are motivated by two heuristics, i.e., increasing the distribution similarity and the diversity of pseudo data. However, the underlying mechanism responsible for the effectiveness of these strategies remains poorly understood. In this paper, we aim to clarify how data augmentation improves GEC models. To this end, we introduce two interpretable and computationally efficient measures: Affinity and Diversity. Our findings indicate that a GEC data augmentation strategy characterized by high Affinity and appropriate Diversity can better improve the performance of GEC models. Based on this observation, we propose MixEdit, a data augmentation approach that strategically and dynamically augments realistic data, without requiring extra monolingual corpora. To verify the correctness of our findings and the effectiveness of the proposed MixEdit, we conduct experiments on mainstream English and Chinese GEC datasets. The results show that MixEdit substantially improves GEC models and is complementary to traditional data augmentation methods.
    Comment: Accepted to Findings of EMNLP 202

    Focus Is What You Need For Chinese Grammatical Error Correction

    Chinese Grammatical Error Correction (CGEC) aims to automatically detect and correct grammatical errors contained in Chinese text. For a long time, researchers have regarded CGEC as a task with a certain degree of uncertainty, that is, an ungrammatical sentence may often have multiple references. However, we argue that even though this is a very reasonable hypothesis, it is too demanding for the capabilities of current mainstream models. In this paper, we first discover that multiple references do not actually bring positive gains to model training. On the contrary, it is beneficial to the CGEC model if the model can pay attention to small but essential data during the training process. Furthermore, we propose a simple yet effective training strategy called OneTarget to improve the focus ability of CGEC models and thus improve CGEC performance. Extensive experiments and detailed analyses demonstrate the correctness of our discovery and the effectiveness of our proposed method.
    Comment: Submitted to ICASSP2023 (currently under review)

    A Frustratingly Easy Plug-and-Play Detection-and-Reasoning Module for Chinese Spelling Check

    In recent years, Chinese Spelling Check (CSC) has been greatly improved by designing task-specific pre-training methods or introducing auxiliary tasks, which mostly solve this task in an end-to-end fashion. In this paper, we propose to decompose the CSC workflow into detection, reasoning, and searching subtasks so that the rich external knowledge about the Chinese language can be leveraged more directly and efficiently. Specifically, we design a plug-and-play detection-and-reasoning module that is compatible with existing SOTA non-autoregressive CSC models to further boost their performance. We find that the detection-and-reasoning module trained for one model can also benefit other models. We also study the primary interpretability provided by the task decomposition. Extensive experiments and detailed analyses demonstrate the effectiveness and competitiveness of the proposed module.
    Comment: Accepted for publication in Findings of EMNLP 202

    Apilactobacillus kunkeei Alleviated Toxicity of Acetamiprid in Honeybee

    Colony collapse disorder now extensively affects honeybees, and insecticides, including acetamiprid, are considered critical factors. We speculated that supplementation with lactic acid bacteria (LAB), which are prevalent probiotics, could alleviate acetamiprid-induced health injuries in honeybees. Apilactobacillus kunkeei was isolated from beebread; it significantly increased the survival of honeybees under acetamiprid exposure (from 84% to 92%). Information on the intestinal bacteria of honeybees was acquired by 16S rRNA pyrosequencing. The results showed that supplementation with A. kunkeei significantly increased survival and decreased pollen consumption by honeybees under acetamiprid exposure. Under acetamiprid exposure, some opportunistic and pathogenic bacteria invaded the intestinal regions. Subsequently, the community richness and diversity of the symbiotic microbiota decreased, and the community structure of the intestinal bacteria changed and differentiated. However, with the supplementation of A. kunkeei, the community richness and diversity of the symbiotic microbiota showed an upward trend, and the community structure was stabilized. Our results showed that A. kunkeei alleviated acetamiprid-induced symbiotic microbiota dysregulation and mortality in honeybees. This demonstrates the importance of symbiotic microbiota in honeybees and supports the application of Apilactobacillus kunkeei as a probiotic in beekeeping.

    Comparative effectiveness research on patients with acute ischemic stroke using Markov decision processes

    Background: Several methodological issues with non-randomized comparative clinical studies have been raised, one of which is whether the methods used can adequately identify uncertainties that evolve dynamically with time in real-world systems. The objective of this study is to compare the effectiveness of different combinations of Traditional Chinese Medicine (TCM) treatments and combinations of TCM and Western medicine interventions in patients with acute ischemic stroke (AIS) by using Markov decision process (MDP) theory. MDP theory appears to be a promising new method for use in comparative effectiveness research.
    Methods: The electronic health records (EHR) of patients with AIS hospitalized at the 2nd Affiliated Hospital of Guangzhou University of Chinese Medicine between May 2005 and July 2008 were collected. Each record was partitioned into two "state-action-reward" stages divided by three time points: the first, third, and last day of hospital stay. We used the well-developed optimality technique in MDP theory with the finite horizon criterion to make the dynamic comparison of different treatment combinations.
    Results: A total of 1504 records with a primary diagnosis of AIS were identified. Only states with information on 10 or more patients were included, which gave 960 records to be enrolled in the MDP model. Optimal combinations were obtained for 30 types of patient condition.
    Conclusion: MDP theory makes it possible to dynamically compare the effectiveness of different combinations of treatments. However, the optimal interventions obtained by the MDP theory here require further validation in clinical practice. Further exploratory studies with MDP theory in other areas in which complex interventions are common would be worthwhile.
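The finite-horizon optimality technique mentioned in the Methods can be illustrated with a minimal backward-induction sketch. Everything concrete here is an assumption made for illustration and is not taken from the study: the two state names, the two action names, the transition probabilities, and the rewards are all hypothetical toy values. Only the two-stage horizon mirrors the abstract.

```python
from typing import Dict, List, Tuple

State = str
Action = str
# transitions[state][action] -> list of (probability, next_state, reward)
Transitions = Dict[State, Dict[Action, List[Tuple[float, State, float]]]]


def backward_induction(
    transitions: Transitions, horizon: int
) -> Tuple[Dict[State, float], List[Dict[State, Action]]]:
    """Solve a finite-horizon MDP by working backwards from the last stage.

    Returns the optimal expected total reward per starting state and one
    policy (state -> action) per stage, ordered from first stage to last.
    """
    states = list(transitions)
    values = {s: 0.0 for s in states}  # value after the final stage is zero
    policy: List[Dict[State, Action]] = []
    for _ in range(horizon):
        new_values: Dict[State, float] = {}
        stage_policy: Dict[State, Action] = {}
        for s in states:
            best_action, best_value = None, float("-inf")
            for a, outcomes in transitions[s].items():
                # Expected immediate reward plus value of the remaining stages.
                q = sum(p * (r + values[s2]) for p, s2, r in outcomes)
                if q > best_value:
                    best_action, best_value = a, q
            new_values[s] = best_value
            stage_policy[s] = best_action
        values = new_values
        policy.insert(0, stage_policy)  # earlier stages go to the front
    return values, policy


# Hypothetical toy model: two patient states, two treatment options.
TOY: Transitions = {
    "severe": {
        "treatment_A": [(0.6, "mild", 1.0), (0.4, "severe", 0.0)],
        "treatment_B": [(0.3, "mild", 1.0), (0.7, "severe", 0.0)],
    },
    "mild": {
        "treatment_A": [(0.9, "mild", 1.0), (0.1, "severe", 0.0)],
        "treatment_B": [(0.8, "mild", 1.0), (0.2, "severe", 0.0)],
    },
}

# Two stages, matching the two "state-action-reward" stages per record.
values, policy = backward_induction(TOY, horizon=2)
```

Here `policy[0]` gives the recommended action per state at the first stage and `policy[1]` at the second. In a real analysis the transition probabilities and rewards would have to be estimated from the records rather than set by hand.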