MixEdit: Revisiting Data Augmentation and Beyond for Grammatical Error Correction
Data Augmentation through generating pseudo data has been proven effective in
mitigating the challenge of data scarcity in the field of Grammatical Error
Correction (GEC). Various augmentation strategies have been widely explored,
most of which are motivated by two heuristics, i.e., increasing the
distribution similarity and diversity of pseudo data. However, the underlying
mechanism responsible for the effectiveness of these strategies remains poorly
understood. In this paper, we aim to clarify how data augmentation improves GEC
models. To this end, we introduce two interpretable and computationally
efficient measures: Affinity and Diversity. Our findings indicate that an
excellent GEC data augmentation strategy characterized by high Affinity and
appropriate Diversity can better improve the performance of GEC models. Based
on this observation, we propose MixEdit, a data augmentation approach that
strategically and dynamically augments realistic data, without requiring extra
monolingual corpora. To verify the correctness of our findings and the
effectiveness of the proposed MixEdit, we conduct experiments on mainstream
English and Chinese GEC datasets. The results show that MixEdit substantially
improves GEC models and is complementary to traditional data augmentation
methods. Comment: Accepted to Findings of EMNLP 202
Focus Is What You Need For Chinese Grammatical Error Correction
Chinese Grammatical Error Correction (CGEC) aims to automatically detect and
correct grammatical errors contained in Chinese text. Researchers have long
regarded CGEC as a task with a certain degree of uncertainty; that is, an
ungrammatical sentence may often have multiple references. However, we argue
that although this is a very reasonable hypothesis, it is too demanding for
the capabilities of today's mainstream models. In this paper, we
first discover that multiple references do not actually bring positive gains to
model training. On the contrary, it is beneficial to the CGEC model if the
model can pay attention to small but essential data during the training
process. Furthermore, we propose a simple yet effective training strategy
called OneTarget to improve the focus ability of the CGEC models and thus
improve the CGEC performance. Extensive experiments and detailed analyses
demonstrate the correctness of our discovery and the effectiveness of our
proposed method. Comment: Submitted to ICASSP 2023 (currently under review)
A Frustratingly Easy Plug-and-Play Detection-and-Reasoning Module for Chinese Spelling Check
In recent years, Chinese Spelling Check (CSC) has been greatly improved by
designing task-specific pre-training methods or introducing auxiliary tasks,
which mostly solve this task in an end-to-end fashion. In this paper, we
propose to decompose the CSC workflow into detection, reasoning, and searching
subtasks so that the rich external knowledge about the Chinese language can be
leveraged more directly and efficiently. Specifically, we design a
plug-and-play detection-and-reasoning module that is compatible with existing
SOTA non-autoregressive CSC models to further boost their performance. We find
that the detection-and-reasoning module trained for one model can also benefit
other models. We also study the primary interpretability provided by the task
decomposition. Extensive experiments and detailed analyses demonstrate the
effectiveness and competitiveness of the proposed module. Comment: Accepted for publication in Findings of EMNLP 202
Apilactobacillus kunkeei Alleviated Toxicity of Acetamiprid in Honeybee
Colony collapse disorder now affects honeybees extensively, and insecticides, including acetamiprid, are considered critical factors. Because lactic acid bacteria (LAB) are prevalent probiotics, we speculated that LAB supplementation could alleviate acetamiprid-induced health injuries in honeybees. Apilactobacillus kunkeei was isolated from beebread; it significantly increased the survival of honeybees under acetamiprid exposure (from 84% to 92%). Information on the intestinal bacteria of honeybees was acquired by 16S rRNA pyrosequencing. The results showed that supplementation with A. kunkeei significantly increased survival and decreased pollen consumption by honeybees under acetamiprid exposure. Under acetamiprid exposure, some opportunistic and pathogenic bacteria invaded the intestinal regions, and the community richness and diversity of the symbiotic microbiota subsequently decreased. The community structure of the intestinal bacteria was changed and differentiated. With A. kunkeei supplementation, however, the community richness and diversity of the symbiotic microbiota showed an upward trend, and the community structure was stabilized. Our results showed that A. kunkeei alleviated acetamiprid-induced symbiotic microbiota dysregulation and mortality in honeybees. This demonstrates the importance of symbiotic microbiota in honeybees and supports the application of Apilactobacillus kunkeei as a probiotic in beekeeping.
Comparative effectiveness research on patients with acute ischemic stroke using Markov decision processes
Background: Several methodological issues with non-randomized comparative clinical studies have been raised, one of which is whether the methods used can adequately identify uncertainties that evolve dynamically with time in real-world systems. The objective of this study is to compare the effectiveness of different combinations of Traditional Chinese Medicine (TCM) treatments and combinations of TCM and Western medicine interventions in patients with acute ischemic stroke (AIS) by using Markov decision process (MDP) theory. MDP theory appears to be a promising new method for use in comparative effectiveness research. Methods: The electronic health records (EHR) of patients with AIS hospitalized at the 2nd Affiliated Hospital of Guangzhou University of Chinese Medicine between May 2005 and July 2008 were collected. Each record was partitioned into two "state-action-reward" stages divided by three time points: the first, third, and last day of hospital stay. We used the well-developed optimality technique in MDP theory with the finite horizon criterion to dynamically compare different treatment combinations. Results: A total of 1504 records with a primary diagnosis of AIS were identified. Only states with information on 10 or more patients were included, which gave 960 records to be enrolled in the MDP model. Optimal combinations were obtained for 30 types of patient condition. Conclusion: MDP theory makes it possible to dynamically compare the effectiveness of different combinations of treatments. However, the optimal interventions obtained by the MDP theory here require further validation in clinical practice. Further exploratory studies with MDP theory in other areas in which complex interventions are common would be worthwhile.
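As a rough illustration of the finite-horizon optimality technique mentioned in the abstract above, the following sketch runs backward induction over two "state-action-reward" stages. It is not the study's model: the state names, the action labels "A" and "B", and every probability and reward are hypothetical values chosen only to make the example run.

```python
from typing import Dict, List, Tuple

State = str
Action = str
# For one stage: (state, action) -> list of (next_state, probability, reward).
Stage = Dict[Tuple[State, Action], List[Tuple[State, float, float]]]


def backward_induction(stages: List[Stage], terminal_value: Dict[State, float]):
    """Return (first-stage state values, per-stage optimal policies)."""
    value = dict(terminal_value)
    policy: List[Dict[State, Action]] = []
    # Work from the last stage back to the first.
    for stage in reversed(stages):
        q: Dict[State, Dict[Action, float]] = {}
        for (s, a), outcomes in stage.items():
            # Expected immediate reward plus value of the state reached.
            q.setdefault(s, {})[a] = sum(
                p * (r + value.get(s2, 0.0)) for s2, p, r in outcomes
            )
        stage_policy = {s: max(acts, key=acts.get) for s, acts in q.items()}
        value = {s: q[s][stage_policy[s]] for s in q}
        policy.insert(0, stage_policy)
    return value, policy


# Hypothetical numbers: stage 1 covers day 1 to day 3, stage 2 day 3 to discharge.
stage1: Stage = {
    ("severe", "A"): [("mild", 0.6, 0.0), ("severe", 0.4, 0.0)],
    ("severe", "B"): [("mild", 0.3, 0.0), ("severe", 0.7, 0.0)],
}
stage2: Stage = {
    ("severe", "A"): [("improved", 0.5, 1.0), ("not_improved", 0.5, 0.0)],
    ("severe", "B"): [("improved", 0.7, 1.0), ("not_improved", 0.3, 0.0)],
    ("mild", "A"): [("improved", 0.9, 1.0), ("not_improved", 0.1, 0.0)],
    ("mild", "B"): [("improved", 0.8, 1.0), ("not_improved", 0.2, 0.0)],
}

values, policy = backward_induction([stage1, stage2], terminal_value={})
```

Here `policy[0]` and `policy[1]` hold the preferred action per state for the first and second stage, and `values` holds the expected total reward from each first-stage state. In practice the transition probabilities and rewards would have to be estimated from the records, which this sketch does not attempt.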