
    Irreducible Curriculum for Language Model Pretraining

    Full text link
    Automatic data selection and curriculum design for training large language models is challenging, and only a few existing methods improve over standard training. Furthermore, current schemes focus on domain-level selection, overlooking the finer-grained contributions of individual training points. Traditional datapoint selection methods are difficult to apply to large language models: most online batch selection methods require additional forward or backward passes, which introduces considerable extra cost at large model scales. To mitigate these obstacles, we propose the irreducible curriculum, a curriculum learning algorithm for language model pretraining that prioritizes samples with higher learnability. Specifically, to avoid prohibitive extra computation, we simulate the sample loss along the main model's training trajectory using a small-scale proxy model. Our experiments on the RedPajama-1B dataset demonstrate a consistent improvement in validation perplexity across all 7 domains compared to the random uniform baseline and the anti-curriculum strategy. Our method also reduces the sharpness of the network and yields better 5-shot accuracy on the MMLU benchmark.
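    The selection rule the abstract describes can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes per-sample losses from the main model and from the small proxy model are already available, and approximates "learnability" as the gap between the two (a large gap means the sample is neither mastered nor irreducibly noisy). The function name and scoring details are hypothetical.

```python
def select_high_learnability(main_losses, proxy_losses, k):
    """Rank candidate samples by learnability and keep the top-k indices.

    Learnability score = main model's current loss minus the proxy model's
    simulated (irreducible) loss: a large gap marks a sample the main model
    can still learn; a small gap marks a sample that is already mastered
    or inherently noisy.
    """
    scores = [m - p for m, p in zip(main_losses, proxy_losses)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

# Example: sample 2 has high main loss but low irreducible loss,
# so it is the most learnable and is selected first.
main = [2.0, 1.1, 3.0, 0.9]
proxy = [1.9, 1.0, 0.5, 0.8]
print(select_high_learnability(main, proxy, 2))  # -> [2, 0]
```

    Because the proxy model simulates the irreducible loss once along the training trajectory, the per-batch cost of this ranking is just a subtraction and a sort, avoiding the double forward/backward passes of online batch selection.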

    DoGE: Domain Reweighting with Generalization Estimation

    Full text link
    The coverage and composition of the pretraining data corpus significantly impact the generalization ability of large language models. Conventionally, the pretraining corpus is composed of various source domains (e.g. CommonCrawl, Wikipedia, Github, etc.) according to certain sampling probabilities (domain weights). However, current methods lack a principled way to optimize domain weights toward the ultimate goal of generalization. We propose DOmain reweighting with Generalization Estimation (DoGE), which reweights the sampling probability of each domain based on its contribution to the final generalization objective, as assessed by a gradient-based generalization estimation function. First, we train a small-scale proxy model with a min-max optimization to obtain the reweighted domain weights. At each step, the domain weights are updated by mirror descent to maximize the overall generalization gain. Finally, we use the obtained domain weights to train a full-size language model. On the SlimPajama-6B dataset, with a universal generalization objective, DoGE achieves better average perplexity and zero-shot reasoning accuracy. On out-of-domain generalization tasks, DoGE reduces perplexity on the target domain by a large margin. We further apply a parameter-selection scheme that improves the efficiency of generalization estimation.
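    A mirror-descent update on a probability simplex, as used for the domain weights here, takes the familiar exponentiated-gradient form. The sketch below is a generic illustration under that assumption, not DoGE's actual update: the per-domain generalization gains, the learning rate, and the function name are all placeholders standing in for the paper's gradient-based estimates.

```python
import math

def mirror_descent_update(weights, gains, lr=0.1):
    """One exponentiated-gradient (mirror descent on the simplex) step.

    Domains with a larger estimated generalization gain receive
    proportionally more sampling probability; renormalization keeps
    the weights a valid distribution.
    """
    unnorm = [w * math.exp(lr * g) for w, g in zip(weights, gains)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Example: starting from uniform weights over 4 domains, a positive gain
# for domain 0 and a negative gain for domain 3 shift mass accordingly.
w = mirror_descent_update([0.25, 0.25, 0.25, 0.25], [1.0, 0.0, 0.0, -1.0], lr=0.5)
```

    The multiplicative form guarantees the weights stay positive and sum to one after each step, which is why mirror descent with the entropy regularizer is the natural choice for optimizing a sampling distribution.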

    Roles of circulating soluble interleukin (IL)-6 receptor and IL-6 receptor expression on CD4+ T cells in patients with chronic hepatitis B

    Get PDF
    Objectives: The objective of this study was to investigate the potential clinical roles of circulating soluble interleukin (IL)-6 receptor (sIL-6R) and IL-6R expression on CD4+ T cells (CD4+ IL-6R+ T cells) in chronic hepatitis B (CHB) patients. Methods: One hundred and thirty-three subjects, including 72 CHB patients, 27 asymptomatic carriers, eight acute hepatitis B (AHB) patients, and 26 healthy donors, were included in this study. Plasma IL-6 and sIL-6R levels were measured by enzyme-linked immunosorbent assay (ELISA); the frequency of CD4+ IL-6R+ T cells was detected by flow cytometry analysis. Results: Our data showed a significant increase in plasma sIL-6R levels and the frequency of CD4+ IL-6R+ T cells in peripheral blood in CHB patients compared to asymptomatic carriers and healthy controls (both p<0.05). The elevated prevalence of CD4+ IL-6R+ T cells was positively associated with increased serum alanine aminotransferase levels in CHB patients (r = 0.316, p = 0.007), but was not correlated with serum hepatitis B virus (HBV) DNA load. Moreover, CHB patients with an HBV DNA load >1.0×10^6 copies/ml had a lower level of plasma sIL-6R than those with an HBV DNA load <1.0×10^6 copies/ml. Conclusions: Circulating sIL-6R and CD4+ IL-6R+ T cells were increased in CHB patients. Elevated plasma sIL-6R is probably associated with HBV elimination, and CD4+ IL-6R+ T cells in peripheral blood might contribute to the pathogenesis of liver injury in CHB patients.

    Elevated IL-6 Receptor Expression on CD4+ T Cells contributes to the increased Th17 Responses in patients with Chronic Hepatitis B

    Get PDF
    Background: Increased numbers of interleukin-17-producing CD4+ T cells (Th17) have been found in association with hepatitis B virus (HBV)-induced liver injury. However, the mechanism underlying the increased Th17 responses in patients with HBV infection remains unclear. In this study, we investigate the possible regulatory mechanisms of increased Th17 responses in patients with chronic hepatitis B (CHB). Methods: The Th17 response and IL-6R expression on CD4+ T cells in peripheral blood samples were determined by flow cytometry. The cytokines TGF-β, IL-1β, IL-6, and IL-17 in plasma and/or supernatant samples were determined by ELISA, and IL-17 and IL-6R mRNA levels were quantified by quantitative real-time reverse transcription polymerase chain reaction. Results: These data indicated that the frequency of peripheral Th17 cells is significantly correlated with the percentage of CD4+ T cells expressing IL-6R in CHB patients. CD4+ T cells from patients with CHB, but not those from healthy donors, produced higher levels of IL-17 and expressed more IL-6R upon stimulation with the HBV core antigen (HBcAg) in vitro. The PMA/ionomycin- and HBcAg-stimulated up-regulation of IL-17 production by CD4+ T cells could be reversed by a neutralizing antibody against IL-6R. Conclusion: We showed that enhanced IL-6R expression on CD4+ T cells upon HBV infection contributes to the increased Th17 response in patients with CHB.

    Dissection of a novel major stable QTL on chromosome 7D for grain hardness and its breeding value estimation in bread wheat

    Get PDF
    Grain hardness (Gh) is important for wheat processing and end-product quality. Puroindoline polymorphism explains over 60% of Gh variation, and novel genetic factors remain to be exploited. In this study, a total of 153 quantitative trait loci (QTLs), clustered into 12 genomic intervals (C1-C12), were identified for 13 quality-related traits using a recombinant inbred line population derived from the cross of Zhongkemai138 (ZKM138) and Chuanmai44 (CM44). Among them, C7 (harboring eight QTLs for different quality-related traits) and C8 (mainly harboring QGh.cib-5D.1 for Gh) were attributed to the well-known genes Rht-D1 and Pina, respectively, indicating that the correlation of the involved traits was supported by pleiotropic or linked genes. Notably, a novel major stable QTL for Gh, QGh.cib-7D, was detected in C12, with the ZKM138-derived allele increasing grain hardness; it was simultaneously mapped by the BSE-Seq method. The geographic pattern and transmissibility of this locus revealed that the hardness-increasing allele is present at high frequency (85.79%) among 373 worldwide wheat varieties and showed 99.31% transmissibility in 144 ZKM138 derivatives, indicating a non-negative effect on yield performance and that indirect passive selection for it has occurred during the actual breeding process. The contribution of this new Gh-related locus is thus highlighted for improving the efficiency and accuracy of soft/hard material selection in the molecular marker-assisted breeding process. Further, TraesCS7D02G099400, TraesCS7D02G098000, and TraesCS7D02G099500 were initially deduced to be the most likely candidate genes of QGh.cib-7D. Collectively, this study provides valuable information for elucidating the genetic architecture of Gh for wheat quality improvement.

    Magnetic topological insulator MnBi6Te10 with zero-field ferromagnetic state and gapped Dirac surface states

    Full text link
    Magnetic topological insulators (TIs) with nontrivial topological electronic structure and broken time-reversal symmetry exhibit various exotic topological quantum phenomena. The realization of such exotic phenomena at high temperature is one of the central topics in this area. We reveal that MnBi6Te10 is a magnetic TI with an antiferromagnetic ground state below 10.8 K, whose nontrivial topology is manifested by Dirac-like surface states. The ferromagnetic axion insulator state with Z4 = 2 emerges once the spins are polarized at a field as low as 0.1 T, accompanied by saturated anomalous Hall resistivity up to 10 K. This ferromagnetic state is preserved even as the external field is reduced to zero at 2 K. Theoretical calculations indicate that few-layer ferromagnetic MnBi6Te10 is also topologically nontrivial, with a non-zero Chern number. Angle-resolved photoemission spectroscopy experiments further reveal three types of Dirac surface states arising from different terminations on the cleavage surfaces, one of which shows insulating behavior with an energy gap of ~28 meV at the Dirac point. These outstanding features suggest that MnBi6Te10 is a promising system for realizing various topological quantum effects at zero field and high temperature. Comment: 18 pages, 4 figures, and 1 table.

    Directional perfect absorption using deep subwavelength low-permittivity films

    Get PDF
    We experimentally demonstrate single-beam directional perfect absorption (to within experimental accuracy) of p-polarized light in the near-infrared using unpatterned, deep subwavelength films of indium tin oxide (ITO) on Ag. The experimental perfect absorption occurs slightly above the epsilon-near-zero (ENZ) frequency of ITO, where the permittivity is less than 1 in magnitude. Remarkably, we obtain perfect absorption for films whose thickness is as low as ~1/50th of the operating free-space wavelength and whose single-pass attenuation is only ~5%. We further derive simple analytical conditions for perfect absorption in the subwavelength-film regime that reveal the constraints the thin-layer permittivity must satisfy if perfect absorption is to be achieved. Then, to gain physical insight into the perfect absorption properties, we analyze the eigenmodes of the layered structure by computing both the real-frequency/complex-wavenumber and the complex-frequency/real-wavenumber modal dispersion diagrams. These analyses allow us to attribute the experimental perfect absorption condition to the crossover between bound and leaky behavior of one eigenmode of the layered structure. Both modal methods show that perfect absorption occurs at a frequency slightly larger than the ENZ frequency, in agreement with experimental results, and both methods predict a second perfect absorption condition at higher frequencies, attributed to another crossover between bound and leaky behavior of the same eigenmode. Our results greatly expand the list of materials that can be considered for use as ultrathin perfect absorbers and provide a methodology for the design of absorbing systems at any desired frequency.

    MEDITRON-70B: Scaling Medical Pretraining for Large Language Models

    Full text link
    Large language models (LLMs) can potentially democratize access to medical knowledge. While many efforts have been made to harness and improve LLMs' medical knowledge and reasoning capacities, the resulting models are either closed-source (e.g., PaLM, GPT-4) or limited in scale (<= 13B parameters), which restricts their abilities. In this work, we improve access to large-scale medical LLMs by releasing MEDITRON: a suite of open-source LLMs with 7B and 70B parameters adapted to the medical domain. MEDITRON builds on Llama-2 (through our adaptation of Nvidia's Megatron-LM distributed trainer) and extends pretraining on a comprehensively curated medical corpus, including selected PubMed articles, abstracts, and internationally recognized medical guidelines. Evaluations using four major medical benchmarks show significant performance gains over several state-of-the-art baselines before and after task-specific finetuning. Overall, MEDITRON achieves a 6% absolute performance gain over the best public baseline in its parameter class and 3% over the strongest baseline we finetuned from Llama-2. Compared to closed-source LLMs, MEDITRON-70B outperforms GPT-3.5 and Med-PaLM and is within 5% of GPT-4 and 10% of Med-PaLM-2. We release our code for curating the medical pretraining corpus and the MEDITRON model weights to drive open-source development of more capable medical LLMs.