
    Irreducible Curriculum for Language Model Pretraining

    Full text link
    Automatic data selection and curriculum design for training large language models is challenging, and only a few existing methods improve over standard training. Furthermore, current schemes focus on domain-level selection, overlooking the finer-grained contributions of individual training points. Traditional datapoint selection methods are difficult to apply to large language models: most online batch selection methods require additional forward or backward passes, which introduces considerable extra cost at large model scales. To mitigate these obstacles, we propose the irreducible curriculum, a curriculum learning algorithm for language model pretraining that prioritizes samples with higher learnability. Specifically, to avoid prohibitive extra computation, we simulate the sample loss along the main model's training trajectory using a small-scale proxy model. Our experiments on the RedPajama-1B dataset demonstrate a consistent improvement in validation perplexity across all 7 domains compared to the random uniform baseline and the anti-curriculum strategy. Our method also reduces the sharpness of the network and yields better 5-shot accuracy on the MMLU benchmark.
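    The selection rule the abstract describes can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes per-sample losses from the main model and from the small proxy model are already available, and approximates "learnability" as the gap between the two (a large gap means the sample is neither mastered nor irreducibly noisy). The function name and scoring details are hypothetical.

```python
def select_high_learnability(main_losses, proxy_losses, k):
    """Rank candidate samples by learnability and keep the top-k indices.

    Learnability score = main model's current loss minus the proxy model's
    simulated (irreducible) loss: a large gap marks a sample the main model
    can still learn; a small gap marks a sample that is already mastered
    or inherently noisy.
    """
    scores = [m - p for m, p in zip(main_losses, proxy_losses)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

# Example: sample 2 has high main loss but low irreducible loss,
# so it is the most learnable and is selected first.
main = [2.0, 1.1, 3.0, 0.9]
proxy = [1.9, 1.0, 0.5, 0.8]
print(select_high_learnability(main, proxy, 2))  # -> [2, 0]
```

    Because the proxy model simulates the irreducible loss once along the training trajectory, the per-batch cost of this ranking is just a subtraction and a sort, avoiding the double forward/backward passes of online batch selection.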

    DoGE: Domain Reweighting with Generalization Estimation

    Full text link
    The coverage and composition of the pretraining data corpus significantly impact the generalization ability of large language models. Conventionally, the pretraining corpus is composed of various source domains (e.g. CommonCrawl, Wikipedia, Github, etc.) according to certain sampling probabilities (domain weights). However, current methods lack a principled way to optimize domain weights toward the ultimate goal of generalization. We propose DOmain reweighting with Generalization Estimation (DoGE), which reweights the sampling probability of each domain based on its contribution to the final generalization objective, as assessed by a gradient-based generalization estimation function. First, we train a small-scale proxy model with a min-max optimization to obtain the reweighted domain weights. At each step, the domain weights are updated by mirror descent to maximize the overall generalization gain. Finally, we use the obtained domain weights to train a full-size language model. On the SlimPajama-6B dataset, with a universal generalization objective, DoGE achieves better average perplexity and zero-shot reasoning accuracy. On out-of-domain generalization tasks, DoGE reduces perplexity on the target domain by a large margin. We further apply a parameter-selection scheme that improves the efficiency of generalization estimation.
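    A mirror-descent update on a probability simplex, as used for the domain weights here, takes the familiar exponentiated-gradient form. The sketch below is a generic illustration under that assumption, not DoGE's actual update: the per-domain generalization gains, the learning rate, and the function name are all placeholders standing in for the paper's gradient-based estimates.

```python
import math

def mirror_descent_update(weights, gains, lr=0.1):
    """One exponentiated-gradient (mirror descent on the simplex) step.

    Domains with a larger estimated generalization gain receive
    proportionally more sampling probability; renormalization keeps
    the weights a valid distribution.
    """
    unnorm = [w * math.exp(lr * g) for w, g in zip(weights, gains)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Example: starting from uniform weights over 4 domains, a positive gain
# for domain 0 and a negative gain for domain 3 shift mass accordingly.
w = mirror_descent_update([0.25, 0.25, 0.25, 0.25], [1.0, 0.0, 0.0, -1.0], lr=0.5)
```

    The multiplicative form guarantees the weights stay positive and sum to one after each step, which is why mirror descent with the entropy regularizer is the natural choice for optimizing a sampling distribution.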

    Roles of circulating soluble interleukin (IL)-6 receptor and IL-6 receptor expression on CD4+ T cells in patients with chronic hepatitis B

    Get PDF
    Objectives: The objective of this study was to investigate the potential clinical roles of circulating soluble interleukin (IL)-6 receptor (sIL-6R) and IL-6R expression on CD4+ T cells (CD4+ IL-6R+ T cells) in chronic hepatitis B (CHB) patients. Methods: One hundred and thirty-three subjects, including 72 CHB patients, 27 asymptomatic carriers, eight acute hepatitis B (AHB) patients, and 26 healthy donors, were included in this study. Plasma IL-6 and sIL-6R levels were measured by enzyme-linked immunosorbent assay (ELISA); the frequency of CD4+ IL-6R+ T cells was detected by flow cytometry analysis. Results: Our data showed a significant increase in plasma sIL-6R levels and the frequency of CD4+ IL-6R+ T cells in peripheral blood in CHB patients compared to asymptomatic carriers and healthy controls (both p<0.05). The elevated prevalence of CD4+ IL-6R+ T cells was positively associated with increased serum alanine aminotransferase levels in CHB patients (r = 0.316, p = 0.007), but was not correlated with serum hepatitis B virus (HBV) DNA load. Moreover, CHB patients with an HBV DNA load >1.0×10^6 copies/ml had a lower level of plasma sIL-6R than those with an HBV DNA load <1.0×10^6 copies/ml. Conclusions: Circulating sIL-6R and CD4+ IL-6R+ T cells were increased in CHB patients. Elevated plasma sIL-6R is probably associated with HBV elimination, and CD4+ IL-6R+ T cells in peripheral blood might contribute to the pathogenesis of liver injury in CHB patients.

    Elevated IL-6 Receptor Expression on CD4+ T Cells contributes to the increased Th17 Responses in patients with Chronic Hepatitis B

    Get PDF
    Background: Increased numbers of interleukin-17-producing CD4+ T cells (Th17) have been found in association with hepatitis B virus (HBV)-induced liver injury. However, the mechanism underlying the increased Th17 responses in patients with HBV infection remains unclear. In this study, we investigate the possible regulatory mechanisms of increased Th17 responses in patients with chronic hepatitis B (CHB). Methods: The Th17 response and IL-6R expression on CD4+ T cells in peripheral blood samples were determined by flow cytometry. The cytokines TGF-β, IL-1β, IL-6, and IL-17 in plasma and/or supernatant samples were determined by ELISA, and IL-17 and IL-6R mRNA levels were quantified by quantitative real-time reverse transcription polymerase chain reaction. Results: These data indicated that the frequency of peripheral Th17 cells is significantly correlated with the percentage of CD4+ T cells expressing IL-6R in CHB patients. CD4+ T cells from patients with CHB, but not those from healthy donors, produced higher levels of IL-17 and expressed more IL-6R upon stimulation with the HBV core antigen (HBcAg) in vitro. The PMA/ionomycin- and HBcAg-stimulated up-regulation of IL-17 production by CD4+ T cells could be reversed by a neutralizing antibody against IL-6R. Conclusion: We showed that enhanced IL-6R expression on CD4+ T cells upon HBV infection contributes to the increased Th17 response in patients with CHB.

    Dissection of a novel major stable QTL on chromosome 7D for grain hardness and its breeding value estimation in bread wheat

    Get PDF
    Grain hardness (Gh) is important for wheat processing and end-product quality. Puroindoline polymorphism explains over 60% of Gh variation, and novel genetic factors remain to be exploited. In this study, a total of 153 quantitative trait loci (QTLs), clustered into 12 genomic intervals (C1-C12), were identified for 13 quality-related traits using a recombinant inbred line population derived from the cross of Zhongkemai138 (ZKM138) and Chuanmai44 (CM44). Among them, C7 (harboring eight QTLs for different quality-related traits) and C8 (mainly harboring QGh.cib-5D.1 for Gh) were attributed to the well-known genes Rht-D1 and Pina, respectively, indicating that the correlation of the involved traits was supported by pleiotropic or linked genes. Notably, a novel major stable QTL for Gh, QGh.cib-7D, was detected in C12, with the ZKM138-derived allele increasing grain hardness; it was simultaneously mapped by the BSE-Seq method. The geographic pattern and transmissibility of this locus revealed that the hardness-increasing allele is present at high frequency (85.79%) among 373 worldwide wheat varieties and showed 99.31% transmissibility in 144 ZKM138 derivatives, indicating a non-negative effect on yield performance and that indirect passive selection for it has occurred during the actual breeding process. The contribution of this new Gh-related locus is thus highlighted for improving the efficiency and accuracy of soft/hard material selection in the molecular marker-assisted breeding process. Further, TraesCS7D02G099400, TraesCS7D02G098000, and TraesCS7D02G099500 were initially deduced to be the most likely candidate genes of QGh.cib-7D. Collectively, this study provides valuable information for elucidating the genetic architecture of Gh for wheat quality improvement.

    Magnetic topological insulator MnBi6Te10 with zero-field ferromagnetic state and gapped Dirac surface states

    Full text link
    Magnetic topological insulators (TIs) with nontrivial topological electronic structure and broken time-reversal symmetry exhibit various exotic topological quantum phenomena. The realization of such exotic phenomena at high temperature is one of the central topics in this area. We reveal that MnBi6Te10 is a magnetic TI with an antiferromagnetic ground state below 10.8 K, whose nontrivial topology is manifested by Dirac-like surface states. The ferromagnetic axion insulator state with Z4 = 2 emerges once the spins are polarized at a field as low as 0.1 T, accompanied by saturated anomalous Hall resistivity up to 10 K. This ferromagnetic state is preserved even as the external field is reduced to zero at 2 K. Theoretical calculations indicate that few-layer ferromagnetic MnBi6Te10 is also topologically nontrivial, with a non-zero Chern number. Angle-resolved photoemission spectroscopy experiments further reveal three types of Dirac surface states arising from different terminations on the cleavage surfaces, one of which shows insulating behavior with an energy gap of ~28 meV at the Dirac point. These outstanding features suggest that MnBi6Te10 is a promising system for realizing various topological quantum effects at zero field and high temperature. Comment: 18 pages, 4 figures, and 1 table.

    Directional perfect absorption using deep subwavelength low-permittivity films

    Get PDF
    We experimentally demonstrate single-beam directional perfect absorption (to within experimental accuracy) of p-polarized light in the near-infrared using unpatterned, deep subwavelength films of indium tin oxide (ITO) on Ag. The experimental perfect absorption occurs slightly above the epsilon-near-zero (ENZ) frequency of ITO, where the permittivity is less than 1 in magnitude. Remarkably, we obtain perfect absorption for films whose thickness is as low as ~1/50th of the operating free-space wavelength and whose single-pass attenuation is only ~5%. We further derive simple analytical conditions for perfect absorption in the subwavelength-film regime that reveal the constraints the thin-layer permittivity must satisfy if perfect absorption is to be achieved. Then, to gain physical insight into the perfect absorption properties, we analyze the eigenmodes of the layered structure by computing both the real-frequency/complex-wavenumber and the complex-frequency/real-wavenumber modal dispersion diagrams. These analyses allow us to attribute the experimental perfect absorption condition to the crossover between bound and leaky behavior of one eigenmode of the layered structure. Both modal methods show that perfect absorption occurs at a frequency slightly larger than the ENZ frequency, in agreement with experimental results, and both methods predict a second perfect absorption condition at higher frequencies, attributed to another crossover between bound and leaky behavior of the same eigenmode. Our results greatly expand the list of materials that can be considered for use as ultrathin perfect absorbers and provide a methodology for the design of absorbing systems at any desired frequency.

    MEDITRON-70B: Scaling Medical Pretraining for Large Language Models

    Full text link
    Large language models (LLMs) can potentially democratize access to medical knowledge. While many efforts have been made to harness and improve LLMs' medical knowledge and reasoning capacities, the resulting models are either closed-source (e.g., PaLM, GPT-4) or limited in scale (<= 13B parameters), which restricts their abilities. In this work, we improve access to large-scale medical LLMs by releasing MEDITRON: a suite of open-source LLMs with 7B and 70B parameters adapted to the medical domain. MEDITRON builds on Llama-2 (through our adaptation of Nvidia's Megatron-LM distributed trainer) and extends pretraining on a comprehensively curated medical corpus, including selected PubMed articles, abstracts, and internationally recognized medical guidelines. Evaluations using four major medical benchmarks show significant performance gains over several state-of-the-art baselines before and after task-specific finetuning. Overall, MEDITRON achieves a 6% absolute performance gain over the best public baseline in its parameter class and 3% over the strongest baseline we finetuned from Llama-2. Compared to closed-source LLMs, MEDITRON-70B outperforms GPT-3.5 and Med-PaLM and is within 5% of GPT-4 and 10% of Med-PaLM-2. We release our code for curating the medical pretraining corpus and the MEDITRON model weights to drive open-source development of more capable medical LLMs.