130 research outputs found

    Irreducible Curriculum for Language Model Pretraining

    Automatic data selection and curriculum design for training large language models is challenging, with only a few existing methods showing improvements over standard training. Furthermore, current schemes focus on domain-level selection, overlooking the more fine-grained contributions of each individual training point. It is difficult to apply traditional datapoint selection methods to large language models: most online batch selection methods require two forward or backward passes, which introduces considerable extra cost with large-scale models. To mitigate these obstacles, we propose irreducible curriculum, a curriculum learning algorithm for language model pretraining that prioritizes samples with higher learnability. Specifically, to avoid prohibitive extra computation overhead, we simulate the sample loss along the main model's training trajectory using a small-scale proxy model. Our experiments on the RedPajama-1B dataset demonstrate a consistent improvement in validation perplexity across all 7 domains compared to the random uniform baseline and the anti-curriculum strategy. Our method also reduces the sharpness of the network and achieves better 5-shot accuracy on MMLU benchmarks.

    DoGE: Domain Reweighting with Generalization Estimation

    The coverage and composition of the pretraining data corpus significantly impact the generalization ability of large language models. Conventionally, the pretraining corpus is composed of various source domains (e.g. CommonCrawl, Wikipedia, Github etc.) sampled according to certain sampling probabilities (domain weights). However, current methods lack a principled way to optimize domain weights for the ultimate goal of generalization. We propose DOmain reweighting with Generalization Estimation (DoGE), where we reweight the sampling probability of each domain based on its contribution to the final generalization objective, as assessed by a gradient-based generalization estimation function. First, we train a small-scale proxy model with a min-max optimization to obtain the reweighted domain weights. At each step, the domain weights are updated by mirror descent to maximize the overall generalization gain. Finally, we use the obtained domain weights to train a larger-scale, full-size language model. On the SlimPajama-6B dataset, with a universal generalization objective, DoGE achieves better average perplexity and zero-shot reasoning accuracy. On out-of-domain generalization tasks, DoGE reduces perplexity on the target domain by a large margin. We further apply a parameter-selection scheme which improves the efficiency of generalization estimation.
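
The mirror-descent update of domain weights mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the common entropic form of mirror descent on the probability simplex (the exponentiated-gradient update); the abstract does not spell out the exact rule, so the function name `mirror_step`, the step size `lr`, and the per-domain `gains` values are all hypothetical and are not taken from the DoGE paper.

```python
import math

def mirror_step(weights, gains, lr=1.0):
    """One entropic mirror-descent (exponentiated-gradient) step on the simplex.

    weights: current domain weights, strictly positive and summing to 1.
    gains:   hypothetical per-domain generalization gain scores (assumed input).
    lr:      step size (assumed hyperparameter).
    """
    # Update in log space, then subtract the max for numerical stability.
    logits = [math.log(w) + lr * g for w, g in zip(weights, gains)]
    m = max(logits)
    unnorm = [math.exp(x - m) for x in logits]
    total = sum(unnorm)
    # Renormalise so the result is again a probability vector.
    return [u / total for u in unnorm]

# Three illustrative domains starting from uniform weights.
weights = [1 / 3, 1 / 3, 1 / 3]
gains = [0.2, -0.1, 0.5]  # made-up scores, for illustration only
new_weights = mirror_step(weights, gains, lr=1.0)
```

Under this update, domains with larger estimated gain receive more sampling probability, while the weights stay positive and sum to one without an explicit projection step.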

    Roles of circulating soluble interleukin (IL)-6 receptor and IL-6 receptor expression on CD4+ T cells in patients with chronic hepatitis B

    Objectives: The objective of this study was to investigate the potential clinical roles of circulating soluble interleukin (IL)-6 receptor (sIL-6R) and IL-6R expression on CD4+ T cells (CD4+ IL-6R+ T cells) in chronic hepatitis B (CHB) patients. Methods: One hundred and thirty-three subjects, including 72 CHB patients, 27 asymptomatic carriers, eight acute hepatitis B (AHB) patients, and 26 healthy donors, were included in this study. Plasma IL-6 and sIL-6R levels were measured by enzyme-linked immunosorbent assay (ELISA); the frequency of CD4+ IL-6R+ T cells was detected by flow cytometry analysis. Results: Our data showed a significant increase in plasma sIL-6R levels and the frequency of CD4+ IL-6R+ T cells in peripheral blood in CHB patients compared to asymptomatic carriers and healthy controls (both p<0.05). The elevated prevalence of CD4+ IL-6R+ T cells was positively associated with increased serum alanine aminotransferase levels in CHB patients (r = 0.316, p = 0.007), but was not correlated with serum hepatitis B virus (HBV) DNA load. Moreover, CHB patients with an HBV DNA load >1.0×10^6 copies/ml had a lower level of plasma sIL-6R than those with an HBV DNA load <1.0×10^6 copies/ml. Conclusions: Circulating sIL-6R and CD4+ IL-6R+ T cells were increased in CHB patients. Elevated plasma sIL-6R is probably associated with HBV elimination, and CD4+ IL-6R+ T cells in peripheral blood might contribute to the pathogenesis of liver injury in CHB patients.

    Elevated IL-6 Receptor Expression on CD4+ T Cells contributes to the increased Th17 Responses in patients with Chronic Hepatitis B

    Background: Increased numbers of interleukin-17-producing CD4+ T cells (Th17) have been found in association with hepatitis B virus (HBV)-induced liver injury. However, the mechanism underlying the increase of Th17 responses in patients with HBV infection remains unclear. In this study, we investigate the possible regulatory mechanisms of increased Th17 responses in patients with chronic hepatitis B (CHB). Methods: Th17 response and IL-6R expression on CD4+ T cells in peripheral blood samples were determined by flow cytometry. Cytokines TGF-β, IL-1β, IL-6 and IL-17 in plasma and/or supernatant samples were determined by ELISA, and the IL-17 and IL-6R mRNA levels were quantified by quantitative real-time reverse transcription polymerase chain reaction. Results: The data indicated that the frequency of peripheral Th17 cells is significantly correlated with the percentage of CD4+ T cells expressing IL-6R in CHB patients. CD4+ T cells from patients with CHB, but not those from healthy donors, produced higher levels of IL-17 and had more IL-6R expression upon stimulation with the HBV core antigen (HBcAg) in vitro. The PMA/ionomycin- and HBcAg-stimulated up-regulation of IL-17 production by CD4+ T cells could be reversed by a neutralizing antibody against IL-6R. Conclusion: We showed that enhancement of IL-6R expression on CD4+ T cells upon HBV infection contributes to the increased Th17 response in patients with CHB.

    Dissection of a novel major stable QTL on chromosome 7D for grain hardness and its breeding value estimation in bread wheat

    Grain hardness (Gh) is important for wheat processing and end-product quality. Puroindoline polymorphism explains over 60% of Gh variation, and novel genetic factors remain to be exploited. In this study, a total of 153 quantitative trait loci (QTLs), clustered into 12 genomic intervals (C1-C12), for 13 quality-related traits were identified using a recombinant inbred line population derived from the cross of Zhongkemai138 (ZKM138) and Chuanmai44 (CM44). Among them, C7 (harboring eight QTLs for different quality-related traits) and C8 (mainly harboring QGh.cib-5D.1 for Gh) were attributed to the well-known genes Rht-D1 and Pina, respectively, indicating that the correlation of the traits involved was supported by pleiotropic or linked genes. Notably, a novel major stable QTL for Gh, QGh.cib-7D, was detected in C12, with the ZKM138-derived allele increasing grain hardness; it was simultaneously mapped by the BSE-Seq method. The geographic pattern and transmissibility of this locus revealed that the increasing-Gh allele is present at high frequency (85.79%) in 373 worldwide wheat varieties and showed 99.31% transmissibility in 144 ZKM138-derivatives, indicating no negative effect on yield performance and that indirect passive selection for it has happened during the actual breeding process. Thus, the contribution of this new Gh-related locus was highlighted with a view to improving the efficiency and accuracy of soft/hard material selection in the molecular marker-assisted process. Further, TraesCS7D02G099400, TraesCS7D02G098000, and TraesCS7D02G099500 were initially deduced to be the most likely candidate genes of QGh.cib-7D. Collectively, this study provided valuable information for elucidating the genetic architecture of Gh for wheat quality improvement.

    Deep Learning-Enabled Fully Automated Pipeline System for Segmentation and Classification of Single-Mass Breast Lesions Using Contrast-Enhanced Mammography: A Prospective, Multicentre Study

    Background: Breast cancer is the leading cause of cancer-related deaths in women. However, accurate diagnosis of breast cancer using medical images heavily relies on the experience of radiologists. This study aimed to develop an artificial intelligence model that diagnosed single-mass breast lesions on contrast-enhanced mammography (CEM) for assisting the diagnostic workflow. Methods: A total of 1912 women with single-mass breast lesions on CEM images before biopsy or surgery were included from June 2017 to October 2022 at three centres in China. Samples were divided into training and validation sets, internal testing set, pooled external testing set, and prospective testing set. A fully automated pipeline system (FAPS) using RefineNet and the Xception + Pyramid pooling module (PPM) was developed to perform the segmentation and classification of breast lesions. The performances of six radiologists and adjustments in Breast Imaging Reporting and Data System (BI-RADS) category 4 under the FAPS-assisted strategy were explored in pooled external and prospective testing sets. The segmentation performance was assessed using the Dice similarity coefficient (DSC), and the classification was assessed using heatmaps, area under the receiver operating characteristic curve (AUC), sensitivity, and specificity. The radiologists’ reading time was recorded for comparison with the FAPS. This trial is registered with China Clinical Trial Registration Centre (ChiCTR2200063444). Findings: The FAPS-based segmentation task achieved DSCs of 0.888 ± 0.101, 0.820 ± 0.148 and 0.837 ± 0.132 in the internal, pooled external and prospective testing sets, respectively. For the classification task, the FAPS achieved AUCs of 0.947 (95% confidence interval [CI]: 0.916–0.978), 0.940 (95% CI: 0.894–0.987) and 0.891 (95% CI: 0.816–0.945). It outperformed radiologists in terms of classification efficiency based on single lesions (6 s vs 3 min). Moreover, the FAPS-assisted strategy improved the performance of radiologists. BI-RADS category 4 in 12.4% and 13.3% of patients was adjusted in two testing sets with the assistance of FAPS, which may play an important guiding role in the selection of clinical management strategies. Interpretation: The FAPS based on CEM demonstrated the potential for the segmentation and classification of breast lesions, and had good generalisation ability and clinical applicability. Funding: This study was supported by the Taishan Scholar Foundation of Shandong Province of China (tsqn202211378), National Natural Science Foundation of China (82001775), Natural Science Foundation of Shandong Province of China (ZR2021MH120), and Special Fund for Breast Disease Research of Shandong Medical Association (YXH2021ZX055).
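
The Dice similarity coefficient used above to score segmentation can be illustrated with a short sketch based on its standard definition, DSC = 2|A ∩ B| / (|A| + |B|), for a predicted mask A and a reference mask B. This is not the study's evaluation code: the function name `dice_coefficient`, the flat-list mask representation, and the example masks are assumptions made purely for illustration.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient between two flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked as lesion in both masks.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    # Total lesion pixels across the two masks.
    size_sum = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if size_sum == 0 else 2.0 * intersection / size_sum

# Toy 6-pixel masks (made-up data): 2 overlapping pixels, 3 pixels in each mask.
pred = [0, 1, 1, 1, 0, 0]
truth = [0, 0, 1, 1, 1, 0]
score = dice_coefficient(pred, truth)  # 2 * 2 / (3 + 3)
```

A score of 1 means the two masks coincide exactly and 0 means they share no lesion pixels, so values such as the 0.82 to 0.89 reported above indicate substantial overlap.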

    Magnetic topological insulator MnBi6Te10 with zero-field ferromagnetic state and gapped Dirac surface states

    Magnetic topological insulators (TIs) with nontrivial topological electronic structure and broken time-reversal symmetry exhibit various exotic topological quantum phenomena. The realization of such exotic phenomena at high temperature is one of the central topics in this area. We reveal that MnBi6Te10 is a magnetic TI with an antiferromagnetic ground state below 10.8 K whose nontrivial topology is manifested by Dirac-like surface states. The ferromagnetic axion insulator state with Z4 = 2 emerges once the spins are polarized at a field as low as 0.1 T, accompanied by saturated anomalous Hall resistivity up to 10 K. Such a ferromagnetic state is preserved even when the external field is reduced to zero at 2 K. Theoretical calculations indicate that few-layer ferromagnetic MnBi6Te10 is also topologically nontrivial with a non-zero Chern number. Angle-resolved photoemission spectroscopy experiments further reveal three types of Dirac surface states arising from different terminations on the cleavage surfaces, one of which has insulating behavior with an energy gap of ~28 meV at the Dirac point. These outstanding features suggest that MnBi6Te10 is a promising system to realize various topological quantum effects at zero field and high temperature. Comment: 18 pages, 4 figures and 1 table

    Directional perfect absorption using deep subwavelength low-permittivity films

    We experimentally demonstrate single-beam directional perfect absorption (to within experimental accuracy) of p-polarized light in the near-infrared using unpatterned, deep subwavelength films of indium tin oxide (ITO) on Ag. The experimental perfect absorption occurs slightly above the epsilon-near-zero (ENZ) frequency of ITO, where the permittivity is less than 1 in magnitude. Remarkably, we obtain perfect absorption for films whose thickness is as low as ~1/50th of the operating free-space wavelength and whose single-pass attenuation is only ~5%. We further derive simple analytical conditions for perfect absorption in the subwavelength-film regime that reveal the constraints that the thin-layer permittivity must satisfy if perfect absorption is to be achieved. Then, to gain physical insight into the perfect absorption properties, we analyze the eigenmodes of the layered structure by computing both the real-frequency/complex-wavenumber and the complex-frequency/real-wavenumber modal dispersion diagrams. These analyses allow us to attribute the experimental perfect absorption condition to the crossover between bound and leaky behavior of one eigenmode of the layered structure. Both modal methods show that perfect absorption occurs at a frequency slightly higher than the ENZ frequency, in agreement with experimental results, and both methods predict a second perfect absorption condition at higher frequencies, attributed to another crossover between bound and leaky behavior of the same eigenmode. Our results greatly expand the list of materials that can be considered for use as ultrathin perfect absorbers and provide a methodology for the design of absorbing systems at any desired frequency.