64 research outputs found

    Dynamic Masking Rate Schedules for MLM Pretraining

    Most works on transformers trained with the Masked Language Modeling (MLM) objective use the original BERT model's fixed masking rate of 15%. Our work instead dynamically schedules the masking rate throughout training. We found that linearly decreasing the masking rate from 30% to 15% over the course of pretraining improves average GLUE accuracy by 0.46% in BERT-base, compared to a standard 15% fixed rate. Further analyses demonstrate that the gains from scheduling come from being exposed to both high and low masking rate regimes. Our results demonstrate that masking rate scheduling is a simple way to improve the quality of masked language models and achieve up to a 1.89x speedup in pretraining.
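
    The linear schedule described in this abstract can be illustrated with a minimal sketch. The function names, the `[MASK]` placeholder, and the per-token Bernoulli masking are assumptions made for illustration, not the authors' implementation; BERT's 80/10/10 replacement rule is deliberately left out.

```python
import random


def linear_masking_rate(step, total_steps, start_rate=0.30, end_rate=0.15):
    """Linearly interpolate the masking rate from start_rate to end_rate.

    Hypothetical helper: the 30% and 15% defaults come from the abstract,
    everything else is an assumption.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so out-of-range steps stay at the endpoints.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return start_rate + (end_rate - start_rate) * progress


def mask_tokens(tokens, rate, rng, mask_token="[MASK]"):
    """Replace each token with mask_token independently with probability rate."""
    return [mask_token if rng.random() < rate else tok for tok in tokens]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    sentence = "the quick brown fox jumps over the lazy dog".split()
    total = 1000
    for step in (0, 500, 1000):
        rate = linear_masking_rate(step, total)
        print(step, round(rate, 3), mask_tokens(sentence, rate, rng))
```

    In a real pretraining loop the rate would be recomputed each step (or each epoch) and passed to whatever data collator performs the masking.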

    Multiple, distinct intercontinental lineages but isolation of Australian populations in a cosmopolitan lichen-forming fungal taxon, Psora decipiens (Psoraceae, Ascomycota)

    Multiple drivers shape the spatial distribution of species, including dispersal capacity, niche incumbency, climate variability, orographic barriers, and plate tectonics. However, biogeographic patterns of fungi commonly do not fit conventional expectations based on studies of animals and plants. Fungi, in general, are known to occur across exceedingly broad, intercontinental distributions, including some important components of biological soil crust communities (BSCs). However, molecular data often reveal unexpected biogeographic patterns in lichenized fungal species that are assumed to have cosmopolitan distributions. The lichen-forming fungal species Psora decipiens is found on all continents except Antarctica and occurs in BSCs across diverse habitats, ranging from hot, arid deserts to alpine habitats. In order to better understand factors that shape population structure in cosmopolitan lichen-forming fungal species, we investigated biogeographic patterns in the cosmopolitan taxon P. decipiens, along with the closely related taxa P. crenata and P. saviczii. We generated a multi-locus sequence dataset based on a worldwide sampling of these taxa in order to reconstruct evolutionary relationships and explore phylogeographic patterns. Neither P. crenata nor P. decipiens was recovered as monophyletic, and P. saviczii specimens were recovered as a monophyletic clade closely related to a number of lineages comprising specimens representing P. decipiens. Striking phylogeographic patterns were observed for P. crenata, with populations from distinct geographic regions belonging to well-separated, monophyletic lineages. South African populations of P. crenata were further divided into well-supported sub-clades. While well-supported phylogenetic substructure was also observed for the nominal taxon P. decipiens, nearly all lineages comprised specimens collected from intercontinental populations. However, all Australian specimens representing P. decipiens were recovered within a single well-supported monophyletic clade consisting solely of Australian samples. Our study supports up to 10 candidate species-level lineages in P. decipiens, based on genealogical concordance and coalescent-based species delimitation analyses. Our results support the general pattern of the biogeographic isolation of lichen-forming fungal populations in Australia, even in cases where closely related congeners have documented intercontinental distributions. Our study has important implications for understanding factors influencing diversification and distributions of lichens associated with BSCs. This research was funded, in part, by a start-up grant from BYU College of Life Sciences to SL; MarW’s and MatW’s work was done within the European Soil Crust Project SCIN (Büdel et al., 2014) funded by the ERA-Net BiodivERsA program, with the national funder The Swedish Research Council for Environment, Agricultural Sciences and Spatial Planning (FORMAS).

    Creative Thinking and Modelling for the Decision Support in Water Management


    Identification of genetic variants associated with Huntington's disease progression: a genome-wide association study

    Background: Huntington's disease is caused by a CAG repeat expansion in the huntingtin gene, HTT. Age at onset has been used as a quantitative phenotype in genetic analysis looking for Huntington's disease modifiers, but is hard to define and not always available. Therefore, we aimed to generate a novel measure of disease progression and to identify genetic markers associated with this progression measure. Methods: We generated a progression score on the basis of principal component analysis of prospectively acquired longitudinal changes in motor, cognitive, and imaging measures in the 218 individuals in the TRACK-HD cohort of Huntington's disease gene mutation carriers (data collected 2008–11). We generated a parallel progression score using data from 1773 previously genotyped participants from the European Huntington's Disease Network REGISTRY study of Huntington's disease mutation carriers (data collected 2003–13). We did genome-wide association analyses of progression in 216 TRACK-HD participants and 1773 REGISTRY participants, then meta-analysed the results. Findings: Longitudinal motor, cognitive, and imaging scores were correlated with each other in TRACK-HD participants, justifying use of a single, cross-domain measure of disease progression in both studies. The TRACK-HD and REGISTRY progression measures were correlated with each other (r=0·674), and with age at onset (TRACK-HD, r=0·315; REGISTRY, r=0·234). The meta-analysis of progression in TRACK-HD and REGISTRY gave a genome-wide significant signal (p=1·12 × 10⁻¹⁰) on chromosome 5 spanning three genes: MSH3, DHFR, and MTRNR2L2. The genes in this locus were associated with progression in TRACK-HD (MSH3, p=2·94 × 10⁻⁸; DHFR, p=8·37 × 10⁻⁷; MTRNR2L2, p=2·15 × 10⁻⁹) and to a lesser extent in REGISTRY (MSH3, p=9·36 × 10⁻⁴; DHFR, p=8·45 × 10⁻⁴; MTRNR2L2, p=1·20 × 10⁻³). The lead single nucleotide polymorphism (SNP) in TRACK-HD (rs557874766) was genome-wide significant in the meta-analysis (p=1·58 × 10⁻⁸), and encodes an amino acid change (Pro67Ala) in MSH3. In TRACK-HD, each copy of the minor allele at this SNP was associated with a 0·4 units per year (95% CI 0·16–0·66) reduction in the rate of change of the Unified Huntington's Disease Rating Scale (UHDRS) Total Motor Score, and a reduction of 0·12 units per year (95% CI 0·06–0·18) in the rate of change of UHDRS Total Functional Capacity score. These associations remained significant after adjusting for age of onset. Interpretation: The multidomain progression measure in TRACK-HD was associated with a functional variant that was genome-wide significant in our meta-analysis. The association in only 216 participants implies that the progression measure is a sensitive reflection of disease burden, that the effect size at this locus is large, or both. Knockout of Msh3 reduces somatic expansion in Huntington's disease mouse models, suggesting this mechanism as an area for future therapeutic investigation.