
    RSpell: Retrieval-augmented Framework for Domain Adaptive Chinese Spelling Check

    Chinese Spelling Check (CSC) refers to the detection and correction of spelling errors in Chinese texts. In practical application scenarios, CSC models need to be able to correct errors across different domains. In this paper, we propose a retrieval-augmented spelling check framework called RSpell, which retrieves the corresponding domain terms and incorporates them into CSC models. Specifically, we employ pinyin fuzzy matching to search for terms, which are combined with the input and fed into the CSC model. Then, we introduce an adaptive process control mechanism to dynamically adjust the impact of external knowledge on the model. Additionally, we develop an iterative strategy for the RSpell framework to enhance reasoning capabilities. We conducted experiments on CSC datasets in three domains: law, medicine, and official document writing. The results show that RSpell achieves state-of-the-art performance in both zero-shot and fine-tuning scenarios, demonstrating the effectiveness of the retrieval-augmented CSC framework. Our code is available at https://github.com/47777777/Rspell
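
The retrieval step described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tiny `PINYIN` table, the similarity threshold, the use of `difflib.SequenceMatcher` as the fuzzy matcher, and the `[SEP]` concatenation format are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical toy pinyin table (an assumption); a real system would
# need a complete pronunciation dictionary.
PINYIN = {
    "高": "gao", "血": "xue", "压": "ya", "雪": "xue", "鸭": "ya",
    "糖": "tang", "尿": "niao", "病": "bing",
}


def to_pinyin(text):
    """Map each character to its pinyin; unknown characters map to themselves."""
    return " ".join(PINYIN.get(ch, ch) for ch in text)


def retrieve_terms(sentence, domain_terms, threshold=0.9):
    """Return domain terms whose pinyin closely matches a same-length window."""
    hits = []
    for term in domain_terms:
        term_py = to_pinyin(term)
        n = len(term)
        best = 0.0
        # Slide a window of the term's length over the sentence.
        for i in range(len(sentence) - n + 1):
            window_py = to_pinyin(sentence[i:i + n])
            score = SequenceMatcher(None, window_py, term_py).ratio()
            best = max(best, score)
        if best >= threshold:
            hits.append(term)
    return hits


def build_model_input(sentence, domain_terms):
    """Prepend retrieved terms to the input (the [SEP] format is assumed)."""
    terms = retrieve_terms(sentence, domain_terms)
    if not terms:
        return sentence
    return " ".join(terms) + " [SEP] " + sentence


if __name__ == "__main__":
    # "高雪鸭" is a homophone misspelling of "高血压" (hypertension).
    print(build_model_input("患者有高雪鸭", ["高血压", "糖尿病"]))
```

Matching on pinyin rather than on characters lets a homophone misspelling still retrieve the intended domain term, which is then available to the correction model as context.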

    Stress characteristics and stress reversal mechanism of white birch (Betula platyphylla) disks under different drying conditions

    Drying stress is the main cause of drying cracks in wood disks, which limits the processing and utilization of this valuable material. For this study, white birch disks with one trunk and a thickness of 30 mm were dried under three different drying conditions: 1) a very slow drying process preventing the generation of a radial moisture content (MC) gradient, 2) a drying process with slowly increasing temperature leading to a radial MC gradient, with a higher MC in the heartwood, and 3) the same heat drying process, but with the wood disks partly covered with a thin plastic film prior to drying, leading to a reversed radial MC gradient, i.e., a higher MC in the sapwood. For each drying condition, the tangential elastic strain in the wood disks was investigated at mean MCs of 26%, 18% and 10% as a function of the radial distance from the pith in order to predict the drying stress. Furthermore, the stress characteristics and stress reversal mechanisms in wood disks are discussed in this paper with the help of stress analysis sketches.

    MS-nowcasting: Operational Precipitation Nowcasting with Convolutional LSTMs at Microsoft Weather

    We present the encoder-forecaster convolutional long short-term memory (LSTM) deep-learning model that powers Microsoft Weather's operational precipitation nowcasting product. This model takes as input a sequence of weather radar mosaics and deterministically predicts future radar reflectivity at lead times up to 6 hours. By stacking a large input receptive field along the feature dimension and conditioning the model's forecaster with predictions from the physics-based High Resolution Rapid Refresh (HRRR) model, we are able to outperform optical flow and HRRR baselines by 20-25% on multiple metrics averaged over all lead times. Comment: Minor updates to reflect final submission to NeurIPS workshop

    Preliminary expression profile of cytokines in brain tissue of BALB/c mice with Angiostrongylus cantonensis infection

    BACKGROUND: Angiostrongylus cantonensis (A. cantonensis) infection can result in increased risk of eosinophilic meningitis. The accumulation of eosinophils and the associated inflammation play an important role in brain tissue injury during this pathological process. However, the mechanisms underlying the transcriptomic responses during brain tissue injury caused by A. cantonensis infection are yet to be elucidated. This study aimed to identify genomic and transcriptomic factors influencing the accumulation of eosinophils and inflammation in the brains of mice infected with A. cantonensis. METHODS: An infected mouse model was prepared following our laboratory's experimental procedure, and mouse brain RNA libraries were then constructed for deep sequencing with the Illumina Genome Analyzer. The raw data were processed with a bioinformatics pipeline including RefSeq gene expression analysis using Cufflinks, annotation and classification of RNAs, lncRNA prediction, and co-expression network analysis. The analysis of RefSeq data provides a measure of the presence and prevalence of transcripts from known and previously unknown genes. RESULTS: This study showed that Cys-Cys (CC) type chemokines such as CCL2, CCL8, CCL1, CCL24, CCL11, CCL7, CCL12 and CCL5 were elevated significantly at the late phase of infection. The up-regulation of CCL2 indicated that A. cantonensis worms had migrated into the mouse brain at an early infection phase. CCL2 could be induced in brain injury during migration, and CCL2 might play a major role in the neuropathic pain caused by A. cantonensis infection. The up-regulated expression of IL-4, IL-5, IL-10, and IL-13 showed Th2 cell predominance in immunopathological reactions at the late infection phase in response to infection by A. cantonensis. These different cytokines can modulate and inhibit each other and function as a network with the specific potential to drive brain eosinophilic inflammation. The increase of ATF-3 expression at 21 dpi suggested injury of neuronal cells at the late phase of infection. A total of 1217 new potential lncRNAs were candidates of interest for further research. CONCLUSIONS: These cytokine networks play an important role in the development of central nervous system inflammation caused by A. cantonensis infection. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13071-015-0939-6) contains supplementary material, which is available to authorized users.

    Association between ambient air pollution and hospital admissions, length of hospital stay and hospital cost for patients with cardiovascular diseases and comorbid diabetes mellitus: Base on 1,969,755 cases in Beijing, China, 2014–2019

    Background: Evidence on the effects of air pollutants on hospital admissions, hospital cost and length of stay (LOS) among patients with comorbidities remains limited in China, particularly for patients with cardiovascular diseases and comorbid diabetes mellitus (CVD-DM). Methods: We collected daily data on CVD-DM patients from 242 hospitals in Beijing between 2014 and 2019. A generalized additive model was employed to quantify the associations between air pollutants and admissions, LOS, and hospital cost for CVD-DM patients. We further evaluated the attributable risk posed by air pollutants to CVD-DM patients, using both Chinese and WHO air quality guidelines as reference. Results: Each 10 μg/m3 increase in particles with an aerodynamic diameter < 2.5 μm (PM2.5), particles with an aerodynamic diameter < 10 μm (PM10), sulfur dioxide (SO2), nitrogen dioxide (NO2), carbon monoxide (CO) and ozone (O3) corresponded to a 0.64% (95% CI: 0.57 to 0.71), 0.52% (95% CI: 0.46 to 0.57), 0.93% (95% CI: 0.67 to 1.20), 0.98% (95% CI: 0.81 to 1.16), 1.66% (95% CI: 1.18 to 2.14) and 0.53% (95% CI: 0.45 to 0.61) increase in CVD-DM admissions, respectively. Among the six pollutants, the particulate pollutants (PM2.5 and PM10) exhibited adverse effects on LOS and hospital cost on most lag days. For every 10 μg/m3 increase in PM2.5 and PM10, the absolute increase in LOS was 62.08 days (95% CI: 28.93 to 95.23) and 51.77 days (95% CI: 22.88 to 80.66), respectively, and the absolute increase in hospital cost was 105.04 Chinese Yuan (CNY) (95% CI: 49.27 to 160.81) and 81.76 CNY (95% CI: 42.01 to 121.51), respectively. With the WHO 2021 air quality guideline as the reference, PM2.5 had the maximum attributable fraction of 3.34% (95% CI: 2.94% to 3.75%), corresponding to 65,845 (95% CI: 57,953 to 73,812) avoidable CVD-DM patients. Conclusion: PM2.5 and PM10 are positively associated with hospital admissions, hospital cost and LOS for patients with CVD-DM. Policy changes to reduce exposure to air pollutants may reduce CVD-DM admissions and yield substantial savings in health care spending and LOS.
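
The percentage figures in this abstract can be related to a model coefficient with a short sketch. It assumes a log-link count model, which is a common choice for generalized additive models of daily admissions but is not stated in the abstract. The coefficient `beta_pm25` is hypothetical, back-calculated from the reported 0.64%. The last function only checks that the reported attributable fraction agrees with the case counts given in the title and abstract.

```python
import math


def percent_increase(beta, delta=10.0):
    """Percent change in expected admissions for a `delta`-unit rise.

    Assumes a log link, so a coefficient `beta` per unit of concentration
    multiplies the expected count by exp(beta * delta).
    """
    return (math.exp(beta * delta) - 1.0) * 100.0


def fraction_of_cases(avoidable_cases, total_cases):
    """Avoidable cases expressed as a percentage of all cases."""
    return avoidable_cases / total_cases * 100.0


# Hypothetical coefficient chosen so that a 10 ug/m3 rise reproduces
# the 0.64% increase reported for PM2.5.
beta_pm25 = math.log(1.0064) / 10.0

if __name__ == "__main__":
    print(f"{percent_increase(beta_pm25):.2f}% per 10 ug/m3")
    # 65,845 avoidable patients out of the 1,969,755 cases in the title.
    print(f"{fraction_of_cases(65_845, 1_969_755):.2f}% attributable")
```

The second figure rounds to 3.34%, which matches the attributable fraction reported for PM2.5.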

    Causal effect of PM1 on morbidity of cause-specific respiratory diseases based on a negative control exposure

    Background: Extensive studies have linked PM2.5 and PM10 with respiratory diseases (RD). However, little is known about the causal association between PM1 and morbidity of RD. We aimed to assess the causal effects of PM1 on cause-specific RD. Methods: Hospital admission data for RD between 2014 and 2019 were obtained in Beijing, China. Negative control exposure and extreme gradient boosting with SHapley Additive exPlanation were used to explore the causality and contribution between PM1 and RD. Stratified analyses by gender, age, and season were conducted. Results: A total of 1,183,591 admissions for RD were recorded. Each interquartile range (28 μg/m3) increase in PM1 concentration corresponded to a 3.08% [95% confidence interval (CI): 1.66%–4.52%] increase in morbidity of total RD. The corresponding increases were 4.47% (95% CI: 2.46%–6.52%) and 0.15% (95% CI: −1.44% to 1.78%) for COPD and asthma, respectively. Significantly positive causal associations were observed for PM1 with total RD and COPD. Females and the elderly had higher effects on total RD, COPD, and asthma only in the warm months (Z = 3.03, P = 0.002; Z = 4.01, P < 0.001; Z = 3.92, P < 0.001; Z = 2.11, P = 0.035; Z = 2.44, P = 0.015). The contribution of PM1 ranked first, second and second for total RD, COPD, and asthma, respectively, among air pollutants. Conclusion: PM1 was causally associated with increased morbidity of total RD and COPD, but not causally associated with asthma. Females and the elderly were more vulnerable to PM1-associated effects on RD.

    The Genomes of Oryza sativa: A History of Duplications

    We report improved whole-genome shotgun sequences for the genomes of indica and japonica rice, both with multimegabase contiguity, or almost 1,000-fold improvement over the drafts of 2002. Tested against a nonredundant collection of 19,079 full-length cDNAs, 97.7% of the genes are aligned, without fragmentation, to the mapped super-scaffolds of one or the other genome. We introduce a gene identification procedure for plants that does not rely on similarity to known genes to remove erroneous predictions resulting from transposable elements. Using the available EST data to adjust for residual errors in the predictions, the estimated gene count is at least 38,000–40,000. Only 2%–3% of the genes are unique to any one subspecies, comparable to the amount of sequence that might still be missing. Despite this lack of variation in gene content, there is enormous variation in the intergenic regions. At least a quarter of the two sequences could not be aligned, and where they could be aligned, single nucleotide polymorphism (SNP) rates varied from as little as 3.0 SNP/kb in the coding regions to 27.6 SNP/kb in the transposable elements. A more inclusive new approach for analyzing duplication history is introduced here. It reveals an ancient whole-genome duplication, a recent segmental duplication on Chromosomes 11 and 12, and massive ongoing individual gene duplications. We find 18 distinct pairs of duplicated segments that cover 65.7% of the genome; 17 of these pairs date back to a common time before the divergence of the grasses. More important, ongoing individual gene duplications provide a never-ending source of raw material for gene genesis and are major contributors to the differences between members of the grass family

    Finishing the euchromatic sequence of the human genome

    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.
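
To put the accuracy figures in this abstract in perspective, the following sketch does the arithmetic implied by the quoted numbers. The Phred-style quality score and the mean gap-free stretch are not stated in the abstract. They are derived here only as illustrations, and the mean stretch assumes the 341 gaps split the sequence into 342 pieces.

```python
import math

SEQUENCE_LENGTH = 2_850_000_000   # nucleotides in Build 35, from the abstract
GAPS = 341                        # remaining gaps, from the abstract
BASES_PER_ERROR = 100_000         # about 1 error event per 100,000 bases


def expected_errors(length, bases_per_error):
    """Approximate number of error events across the whole sequence."""
    return length // bases_per_error


def phred_quality(bases_per_error):
    """Phred-style score Q = -10 * log10(error probability)."""
    return 10.0 * math.log10(bases_per_error)


def mean_stretch(length, gaps):
    """Mean length of a gap-free stretch, assuming gaps + 1 pieces."""
    return length / (gaps + 1)


if __name__ == "__main__":
    print(expected_errors(SEQUENCE_LENGTH, BASES_PER_ERROR), "expected error events")
    print(f"Q{phred_quality(BASES_PER_ERROR):.0f} equivalent quality")
    print(f"{mean_stretch(SEQUENCE_LENGTH, GAPS) / 1e6:.2f} Mb mean gap-free stretch")
```

Under these assumptions, the stated error rate implies on the order of 28,500 error events across the 2.85 billion nucleotides.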