
    Skywork: A More Open Bilingual Foundation Model

    In this technical report, we present Skywork-13B, a family of large language models (LLMs) trained on a corpus of over 3.2 trillion tokens drawn from both English and Chinese texts. This bilingual foundation model is the most extensively trained and openly published LLM of comparable size to date. We introduce a two-stage training methodology using a segmented corpus, targeting general-purpose training and then domain-specific enhancement training, respectively. We show that our model not only excels on popular benchmarks, but also achieves state-of-the-art performance in Chinese language modeling on diverse domains. Furthermore, we propose a novel leakage detection method, demonstrating that test data contamination is a pressing issue warranting further investigation by the LLM community. To spur future research, we release Skywork-13B along with checkpoints obtained during intermediate stages of the training process. We are also releasing part of our SkyPile corpus, a collection of over 150 billion tokens of web text, which is the largest high-quality open Chinese pre-training corpus to date. We hope Skywork-13B and our open corpus will serve as a valuable open-source resource to democratize access to high-quality LLMs.

    Rare coding variants in PLCG2, ABI3, and TREM2 implicate microglial-mediated innate immunity in Alzheimer's disease

    We identified rare coding variants associated with Alzheimer’s disease (AD) in a 3-stage case-control study of 85,133 subjects. In stage 1, 34,174 samples were genotyped using a whole-exome microarray. In stage 2, we tested associated variants (P<1×10⁻⁴) in 35,962 independent samples using de novo genotyping and imputed genotypes. In stage 3, an additional 14,997 samples were used to test the most significant stage 2 associations (P<5×10⁻⁸) using imputed genotypes. We observed 3 novel genome-wide significant (GWS) AD-associated non-synonymous variants: a protective variant in PLCG2 (rs72824905/p.P522R, P=5.38×10⁻¹⁰, OR=0.68, MAFcases=0.0059, MAFcontrols=0.0093), a risk variant in ABI3 (rs616338/p.S209F, P=4.56×10⁻¹⁰, OR=1.43, MAFcases=0.011, MAFcontrols=0.008), and a novel GWS variant in TREM2 (rs143332484/p.R62H, P=1.55×10⁻¹⁴, OR=1.67, MAFcases=0.0143, MAFcontrols=0.0089), a known AD susceptibility gene. These protein-coding changes are in genes highly expressed in microglia and highlight an immune-related protein-protein interaction network enriched for previously identified AD risk genes. These genetic findings provide additional evidence that the microglia-mediated innate immune response contributes directly to AD development.
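
As a rough sanity check on the direction of the three effects, a crude allelic odds ratio can be computed from the minor allele frequencies quoted above. This is only a sketch: the function name `allelic_odds_ratio` is hypothetical, and the calculation ignores covariates and study structure, so it is not expected to reproduce the reported ORs, which come from the study's own association analysis.

```python
def allelic_odds_ratio(maf_cases: float, maf_controls: float) -> float:
    """Crude allelic odds ratio from minor allele frequencies.

    Odds of carrying the minor allele in cases divided by the same
    odds in controls. No covariate adjustment (an assumption of this sketch).
    """
    odds_cases = maf_cases / (1.0 - maf_cases)
    odds_controls = maf_controls / (1.0 - maf_controls)
    return odds_cases / odds_controls


# (MAF in cases, MAF in controls, OR reported in the abstract)
variants = {
    "PLCG2 p.P522R": (0.0059, 0.0093, 0.68),
    "ABI3 p.S209F": (0.011, 0.008, 1.43),
    "TREM2 p.R62H": (0.0143, 0.0089, 1.67),
}

for name, (maf_cases, maf_controls, reported_or) in variants.items():
    crude = allelic_odds_ratio(maf_cases, maf_controls)
    print(f"{name}: crude OR = {crude:.2f}, reported OR = {reported_or}")
```

The crude values fall below 1 for PLCG2 and above 1 for ABI3 and TREM2, which agrees with the protective and risk labels in the abstract.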

    A novel Alzheimer disease locus located near the gene encoding tau protein

    This is the author accepted manuscript. The final version is available from the publisher via the DOI in this record. APOE ε4, the most significant genetic risk factor for Alzheimer disease (AD), may mask effects of other loci. We re-analyzed genome-wide association study (GWAS) data from the International Genomics of Alzheimer's Project (IGAP) Consortium in APOE ε4+ (10 352 cases and 9207 controls) and APOE ε4- (7184 cases and 26 968 controls) subgroups as well as in the total sample testing for interaction between a single-nucleotide polymorphism (SNP) and APOE ε4 status. Suggestive associations (P<1 × 10⁻⁴) in stage 1 were evaluated in an independent sample (stage 2) containing 4203 subjects (APOE ε4+: 1250 cases and 536 controls; APOE ε4-: 718 cases and 1699 controls). Among APOE ε4- subjects, novel genome-wide significant (GWS) association was observed with 17 SNPs (all between KANSL1 and LRRC37A on chromosome 17 near MAPT) in a meta-analysis of the stage 1 and stage 2 data sets (best SNP, rs2732703, P=5.8 × 10⁻⁹). Conditional analysis revealed that rs2732703 accounted for association signals in the entire 100-kilobase region that includes MAPT. Except for previously identified AD loci showing stronger association in APOE ε4+ subjects (CR1 and CLU) or APOE ε4- subjects (MS4A6A/MS4A4A/MS4A6E), no other SNPs were significantly associated with AD in a specific APOE genotype subgroup. In addition, the finding in the stage 1 sample that AD risk is significantly influenced by the interaction of APOE with rs1595014 in TMEM106B (P=1.6 × 10⁻⁷) is noteworthy, because TMEM106B variants have previously been associated with risk of frontotemporal dementia.
    Expression quantitative trait locus analysis revealed that rs113986870, one of the GWS SNPs near rs2732703, is significantly associated with four KANSL1 probes that target transcription of the first translated exon and an untranslated exon in hippocampus (P≤1.3 × 10⁻⁸), frontal cortex (P≤1.3 × 10⁻⁹) and temporal cortex (P≤1.2 × 10⁻¹¹). Rs113986870 is also strongly associated with a MAPT probe that targets transcription of alternatively spliced exon 3 in frontal cortex (P=9.2 × 10⁻⁶) and temporal cortex (P=2.6 × 10⁻⁶). Our APOE-stratified GWAS is the first to show GWS association for AD with SNPs in the chromosome 17q21.31 region. Replication of this finding in independent samples is needed to verify that SNPs in this region have significantly stronger effects on AD risk in persons lacking APOE ε4 compared with persons carrying this allele, and if this is found to hold, further examination of this region and studies aimed at deciphering the mechanism(s) are warranted.

    Application of grey-related decision-making methods to the evaluation of road performance for high-grade highway base materials

    Paper presented at the 23rd Annual Southern African Transport Conference, 12-15 July 2004, "Getting recognition for the importance of transport", CSIR International Convention Centre, Pretoria, South Africa. This paper introduces grey-related decision-making methods for evaluating road base materials, with the aim of providing a more objective and scientific basis for selecting high-grade highway base materials and improving designs.

    Robust Estimation in Linear Mixed-Effects Models Using the Multivariate t-Distribution

    In this paper we consider only confidence intervals and tests based on the normal approximation and concentrate on methods for the fixed effects fi

    Long-Term Skid Resistance Evaluation of GAC-16 Based on Accelerated Pavement Testing

    In this paper, four antiskidding surface test sections were paved to investigate the long-term skid resistance of the improved dense-graded asphalt concrete in Guangdong Province (GAC) using diabase fine aggregate instead of limestone. The four test sections were tested with accelerated loading equipment (MLS11, mobile load simulator). The reduction law of the long-term skid resistance of GAC-16 was analyzed based on the accelerated pavement testing results. Prediction models of the GAC-16 skid resistance were also established and verified. Evaluation indexes of the long-term skid resistance of the asphalt pavement were introduced, and the antiskidding durability of the different sections was evaluated. Results show that the initial British pendulum number (BPN) and mean texture depth (MTD) of the asphalt pavement cannot completely characterize its long-term skid resistance. With increasing loading cycles, the BPN and MTD of GAC-16 decrease rapidly during the early stage and then gradually stabilize. The relation between the skid resistance indexes and the number of accelerated loading cycles was analyzed by nonlinear least-squares fitting. The attenuation of the BPN and MTD of GAC-16 with loading cycles followed exponential and logarithmic models, respectively. The long-term antiskidding performance of the asphalt pavement could be accurately characterized using four evaluation indexes: the stable BPN, the number of loading cycles needed to reach a stable BPN, the initial MTD value, and the MTD reduction rate. Compared with limestone fine aggregate, diabase fine aggregate can improve the long-term skid resistance of the asphalt mixtures.
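
The abstract does not give the exact model equations, so the following is only a sketch of how exponential and logarithmic decay curves can be fitted by least squares. It assumes the forms BPN = a + b·exp(-c·N) and MTD = a + b·ln(N), with N the number of loading cycles in arbitrary units. The function names and all data values are hypothetical and synthetic; none of them come from the paper.

```python
import math


def linear_least_squares(xs, ys):
    """Fit y = a + b*x by ordinary least squares; return (a, b, sse)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse


def fit_logarithmic(cycles, values):
    """Fit y = a + b*ln(N); the model is linear in a and b."""
    a, b, _ = linear_least_squares([math.log(n) for n in cycles], values)
    return a, b


def fit_exponential(cycles, values, candidate_rates):
    """Fit y = a + b*exp(-c*N).

    For a fixed decay rate c the model is linear in a and b, so each
    candidate c is tried and the one with the smallest squared error kept.
    """
    best = None
    for c in candidate_rates:
        a, b, sse = linear_least_squares(
            [math.exp(-c * n) for n in cycles], values
        )
        if best is None or sse < best[3]:
            best = (a, b, c, sse)
    return best[:3]


# Synthetic, noise-free illustration data (NOT measurements from the paper).
cycles = [1, 2, 4, 8, 16, 32, 64]
bpn = [55.0 + 20.0 * math.exp(-0.1 * n) for n in cycles]
mtd = [1.2 - 0.08 * math.log(n) for n in cycles]

rates = [0.01 * k for k in range(1, 101)]
bpn_a, bpn_b, bpn_c = fit_exponential(cycles, bpn, rates)
mtd_a, mtd_b = fit_logarithmic(cycles, mtd)

print(f"BPN ~ {bpn_a:.2f} + {bpn_b:.2f} * exp(-{bpn_c:.2f} * N)")
print(f"MTD ~ {mtd_a:.2f} + {mtd_b:.3f} * ln(N)")
```

In this parameterization the fitted intercept `a` of the exponential model plays the role of the stable BPN, and the slope `b` of the logarithmic model is one possible measure of the MTD reduction rate.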

    Expenditure projections for community home-based care services for older adults with functional decline in China

    Introduction: Difficulty in identifying the functional status of older adults creates an imbalance between the supply of and demand for community home-based care. Using a multi-level functional classification system to guide care cost measurement may optimize care resources and meet diverse eldercare demands. Methods: The Markov model was used to project the size of the older population in different functional decline (FD) statuses. The project cost and man-hour costing methods were combined to forecast the cost of community home-based care for older adults with FD. Results: The projected cost of eldercare increased from 1668.623 billion yuan in 2020 to 2836.754 billion yuan in 2035. By 2035, the total cost of community home-based care for those in the pathological FD statuses "viability disorder," "acute disease," "somatic functional disorder," and "sub-disorder" was projected to be 1094.591 billion, 433.855 billion, 1256.236 billion, and 52.072 billion yuan, respectively, which is 1.24, 1.58, 1.78, and 0.49 times higher than the results obtained with the man-hour costing method. Family caregiving costs are about three times those of professional caregivers. Conclusion: The escalating cost of providing graded care for older adults, particularly by family caregivers, presents significant evidence of the need to optimize resource allocation and develop a robust human resources plan for community home-based care.
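
The general idea of projecting a population across functional states with a Markov model, then attaching a unit cost to each state, can be sketched as follows. Everything here is an assumption made for illustration: the three states, the transition probabilities, the unit costs, and the function names are hypothetical and do not come from the study, which uses its own multi-level functional classification.

```python
# Hypothetical states; the study's classification is more detailed.
STATES = ["no decline", "functional decline", "deceased"]

# TRANSITION[i][j]: assumed yearly probability of moving from state i to j.
# Each row sums to 1; "deceased" is an absorbing state.
TRANSITION = [
    [0.90, 0.08, 0.02],
    [0.05, 0.85, 0.10],
    [0.00, 0.00, 1.00],
]

# Assumed annual care cost per person in each state (arbitrary currency units).
ANNUAL_COST = [0.0, 12000.0, 0.0]


def step(population, transition):
    """Advance the state distribution by one year."""
    size = len(population)
    return [
        sum(population[i] * transition[i][j] for i in range(size))
        for j in range(size)
    ]


def project(population, transition, years):
    """Apply the yearly transition repeatedly and return the final distribution."""
    state = list(population)
    for _ in range(years):
        state = step(state, transition)
    return state


def care_cost(population, unit_costs):
    """Total annual cost: people in each state times that state's unit cost."""
    return sum(people * cost for people, cost in zip(population, unit_costs))


initial = [1000.0, 200.0, 0.0]
after_one_year = step(initial, TRANSITION)
after_fifteen_years = project(initial, TRANSITION, 15)

print(dict(zip(STATES, after_one_year)))
print(f"Annual care cost after 1 year: {care_cost(after_one_year, ANNUAL_COST):.0f}")
print(f"Annual care cost after 15 years: {care_cost(after_fifteen_years, ANNUAL_COST):.0f}")
```

Because every row of the transition matrix sums to 1, the total head count is preserved at each step, which is a convenient check when swapping in different probabilities.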