155 research outputs found

    BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting

    The BLOOM model is a large publicly available multilingual language model, but its pretraining was limited to 46 languages. To extend the benefits of BLOOM to other languages without incurring prohibitively large costs, it is desirable to adapt BLOOM to new languages not seen during pretraining. In this work, we apply existing language adaptation strategies to BLOOM and benchmark its zero-shot prompting performance on eight new languages in a resource-constrained setting. We find language adaptation to be effective at improving zero-shot performance in new languages. Surprisingly, we find that adapter-based finetuning is more effective than continued pretraining for large models. In addition, we discover that prompting performance is not significantly affected by language specifics, such as the writing system. It is primarily determined by the size of the language adaptation data. We also add new languages to BLOOMZ, which is a multitask finetuned version of BLOOM capable of following task instructions zero-shot. We find including a new language in the multitask fine-tuning mixture to be the most effective method to teach BLOOMZ a new language. We conclude that with sufficient training data language adaptation can generalize well to diverse languages. Our code is available at https://github.com/bigscience-workshop/multilingual-modeling

    Risk scorecard to minimize impact of COVID-19 when reopening.

    BACKGROUND: We present a novel approach for exiting coronavirus disease 2019 (COVID-19) lockdowns using a 'risk scorecard' to prioritize activities to resume whilst allowing safe reopening. METHODS: We modelled cases generated in the community/week, incorporating parameters for social distancing, contact tracing and imported cases. We set thresholds for cases and analysed the effect of varying parameters. An online tool to facilitate country-specific use including the modification of parameters (https://sshsphdemos.shinyapps.io/covid_riskbudget/) enables visualization of effects of parameter changes and trade-offs. Local outbreak investigation data from Singapore illustrate this. RESULTS: Setting a threshold of 0.9 mean number of secondary cases arising from a case to keep R < 1. CONCLUSIONS: Countries can utilize a 'risk scorecard' to balance relaxations for travel and domestic activity depending on factors that reduce disease impact, including hospital/ICU capacity, contact tracing, quarantine and vaccination. The tool enabled visualization of the combinations of imported cases and activity levels on the case numbers and the trade-offs required. For vaccination, a reduction factor should be applied both for likelihood of an infected case being present and a close contact getting infected.

    Stepwise classification of cancer samples using clinical and molecular data

    Background: Combining clinical and molecular data types may improve the prediction accuracy of a classifier. However, there is currently a shortage of effective and efficient statistical and bioinformatic tools for true integrative data analysis. Existing integrative classifiers have two main disadvantages. First, coarse combination may cause subtle contributions of one data type to be overshadowed by more obvious contributions of the other. Second, the need to measure both data types for all patients may be both impractical and cost-inefficient. Results: We introduce a novel classification method, a stepwise classifier, which takes advantage of the distinct classification power of clinical data and high-dimensional molecular data. We apply classification algorithms to the two data types independently, starting with the traditional clinical risk factors. We turn to the relatively expensive molecular data only when the uncertainty of the prediction from clinical data exceeds a predefined limit. Experimental results show that our approach is adaptive: the proportion of samples that needs to be re-classified using molecular data depends on how much we expect the predictive accuracy to increase when re-classifying those samples. Conclusions: Our method renders a more cost-efficient classifier that is at least as good as, and sometimes better than, one based on clinical or molecular data alone. Hence our approach is not just a classifier that minimizes a particular loss function. Instead, it aims to be cost-efficient by avoiding molecular tests for a potentially large subgroup of individuals; moreover, for these individuals a test result would be quickly available, which may reduce waiting times for diagnosis and hence lower the patients' distress. Stepwise classification is implemented in the R package stepwiseCM, available at the Bioconductor website.
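
The stepwise decision rule described in this abstract can be sketched roughly as follows. This is an illustration only, not the stepwiseCM implementation: the function name, the uncertainty measure (distance of the clinical probability from 0.5) and the default limit are all assumptions made for the example.

```python
from typing import Callable, Tuple


def stepwise_classify(
    clinical_prob: float,
    molecular_prob_fn: Callable[[], float],
    uncertainty_limit: float = 0.5,
) -> Tuple[int, bool]:
    """Return (predicted_label, used_molecular_data).

    clinical_prob: probability of the positive class from a clinical-data classifier.
    molecular_prob_fn: hypothetical callback that runs the (expensive) molecular
        classifier; it is only called when the clinical prediction is too uncertain.
    uncertainty_limit: assumed threshold on an uncertainty score in [0, 1].
    """
    # Assumed uncertainty score: 0 when the probability is 0 or 1, 1 when it is 0.5.
    uncertainty = 1.0 - 2.0 * abs(clinical_prob - 0.5)

    if uncertainty <= uncertainty_limit:
        # Clinical data are decisive enough; skip the molecular test.
        return int(clinical_prob >= 0.5), False

    # Otherwise fall back to the molecular classifier.
    molecular_prob = molecular_prob_fn()
    return int(molecular_prob >= 0.5), True


# A confident clinical prediction never triggers the molecular test.
print(stepwise_classify(0.95, lambda: 0.10))
# A borderline clinical prediction is re-classified with molecular data.
print(stepwise_classify(0.55, lambda: 0.10))
```

Passing the molecular classifier as a callback keeps the costly measurement lazy, which mirrors the cost argument made in the abstract.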

    Impact of leg lengthening on viscoelastic properties of the deep fascia

    Background: Although the morphological alterations of the deep fascia subjected to leg lengthening have been investigated in cellular and extracellular aspects, the impact of leg lengthening on the viscoelastic properties of the deep fascia remains largely unknown. This study aimed to address the changes in the viscoelastic properties of the deep fascia during leg lengthening using a uniaxial tensile test. Methods: An animal model of leg lengthening was established in New Zealand white rabbits. Distraction was initiated at a rate of 1 mm/day and 2 mm/day in two steps, and proceeded until increases of 10% and 20% in the initial length of the tibia had been achieved. Deep fascia specimens of 30 mm × 10 mm were clamped in an Instron 1122 tensile tester at room temperature with a constant tensile rate of 5 mm/min. After 5 load-unload tensile tests had been performed, the specimens were elongated until rupture. The load-displacement curves were automatically generated. Results: The normal deep fascia showed the typical viscoelastic behaviour of collagenous tissues. Each experimental group of the deep fascia retained these properties after leg lengthening. The curves of the deep fascia lengthened at a rate of 1 mm/day with a 20% increase in tibia length were the closest to those of normal deep fascia. The ultimate tension strength and the strain at rupture of normal deep fascia averaged 2.69 N (8.97 mN/mm²) and 14.11%, respectively. The increases in ultimate tension strength and strain at rupture of the deep fascia after leg lengthening were statistically significant. Conclusion: The deep fascia subjected to leg lengthening exhibits the same viscoelastic properties as collagenous tissue without lengthening, apart from increased strain and strength. Although different lengthening schemes result in varied changes in viscoelastic properties, the properties most comparable to those of normal fascia are obtained with a distraction rate of 1 mm/day and a 20% increase in tibia length.

    The bovine alveolar macrophage DNA methylome is resilient to infection with Mycobacterium bovis

    DNA methylation is pivotal in orchestrating gene expression patterns in various mammalian biological processes. Perturbation of the bovine alveolar macrophage (bAM) transcriptome, due to Mycobacterium bovis (M. bovis) infection, has been well documented; however, the impact of this intracellular pathogen on the bAM epigenome has not been determined. Here, whole genome bisulfite sequencing (WGBS) was used to assess the effect of M. bovis infection on the bAM DNA methylome. The methylomes of bAM infected with M. bovis were compared to those of non-infected bAM 24 hours post-infection (hpi). No differences in DNA methylation (CpG or non-CpG) were observed. Analysis of DNA methylation at proximal promoter regions uncovered >250 genes harbouring intermediately methylated (IM) promoters (average methylation of 33-66%). Gene ontology analysis, focusing on genes with low, intermediate or highly methylated promoters, revealed that genes with IM promoters were enriched for immune-related GO categories; this enrichment was not observed for genes in the high or low methylation groups. Targeted analysis of genes in the IM category confirmed the WGBS observation. This study is the first in cattle examining genome-wide DNA methylation at single nucleotide resolution in an important bovine cellular host-pathogen interaction model, providing evidence for IM promoter methylation in bAM
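
As a small illustration of the promoter grouping mentioned in this abstract, the sketch below bins an average promoter methylation percentage into low, intermediate or high. Only the 33-66% intermediate range comes from the abstract; the function name, the treatment of the boundary values as inclusive, and the assumption that "low" and "high" simply mean below and above that range are choices made for the example.

```python
def promoter_methylation_class(avg_methylation_pct: float) -> str:
    """Bin an average promoter methylation level (in percent) into a category."""
    if not 0.0 <= avg_methylation_pct <= 100.0:
        raise ValueError("methylation must be a percentage between 0 and 100")

    if avg_methylation_pct < 33.0:
        return "low"
    if avg_methylation_pct <= 66.0:
        # Intermediately methylated (IM) range quoted in the abstract: 33-66%.
        return "intermediate"
    return "high"


for pct in (10.0, 50.0, 90.0):
    print(pct, promoter_methylation_class(pct))
```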

    SUMOylation Represses Nanog Expression via Modulating Transcription Factors Oct4 and Sox2

    Nanog is a pivotal transcription factor in embryonic stem (ES) cells and is essential for maintaining the pluripotency and self-renewal of ES cells. SUMOylation has been shown to regulate the function of several stem cell markers, such as Oct4 and Sox2. Nanog is strictly regulated by the Oct4/Sox2 heterodimer. However, the direct effects of SUMOylation on Nanog expression remain unclear. In this study, we report that SUMOylation represses Nanog expression. Depletion of Sumo1 or its conjugating enzyme Ubc9 increased the expression of Nanog, while high SUMOylation reduced its expression. Interestingly, we found that SUMOylation of Oct4 and Sox2 regulated Nanog in an opposing manner. SUMOylation of Oct4 enhanced Nanog expression, while SUMOylated Sox2 inhibited its expression. Moreover, SUMOylation of Oct4 by Pias2 or Sox2 by Pias3 impaired the interaction between Oct4 and Sox2. Taken together, these results indicate that SUMOylation has a negative effect on Nanog expression and provide new insights into the mechanism of SUMO modification involved in ES cell regulation.

    Estimating PM 2.5 concentrations in Xi'an City using a generalized additive model with multi-source monitoring data

    © 2015 Song et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Particulate matter with an aerodynamic diameter <2.5 μm (PM2.5) represents a severe environmental problem and has a negative impact on human health. Xi'an City, with a population of 6.5 million, has among the highest PM2.5 concentrations in China. In 2013, there were 191 days in total on which PM2.5 concentrations in Xi'an City were greater than 100 μg/m³. Recently, a few studies have explored the potential causes of high PM2.5 concentrations using remote sensing data such as the MODIS aerosol optical thickness (AOT) product. Linear regression is a commonly used method to find statistical relationships between PM2.5 concentrations and other pollutants, including CO, NO2, SO2, and O3, which can be indicative of emission sources. The relationships between these variables, however, are usually complicated and non-linear. Therefore, a generalized additive model (GAM) is used to estimate the statistical relationships between potential variables and PM2.5 concentrations. This model contains linear functions of SO2 and CO, univariate smoothing non-linear functions of NO2, O3, AOT and temperature, and bivariate smoothing non-linear functions of location and wind variables. The model can explain 69.50% of PM2.5 concentrations, with R² = 0.691, which improves the result of a stepwise linear regression (R² = 0.582) by 18.73%. The two most significant variables, CO concentration and AOT, represent 20.65% and 19.54% of the deviance, respectively, while the three other gas-phase concentrations, SO2, NO2, and O3, account for 10.88% of the total deviance. These results show that in Xi'an City, traffic and other industrial emissions are the primary sources of PM2.5. Temperature, location, and wind variables are also non-linearly related to PM2.5.
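
The 18.73% figure quoted in this abstract matches the relative gain in R² of the GAM over the stepwise linear regression. The short check below reproduces that arithmetic from the two R² values given in the abstract; the helper name is an assumption made for the example.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


gam_r2 = 0.691        # R² of the generalized additive model (from the abstract)
stepwise_r2 = 0.582   # R² of the stepwise linear regression (from the abstract)

improvement = relative_improvement(gam_r2, stepwise_r2)
print(f"{improvement:.2f}%")  # 18.73%
```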

    Integrated Profiling of MicroRNAs and mRNAs: MicroRNAs Located on Xq27.3 Associate with Clear Cell Renal Cell Carcinoma

    Background: With the advent of second-generation sequencing, the expression of gene transcripts can be digitally measured with high accuracy. The purpose of this study was to systematically profile the expression of both mRNA and miRNA genes in clear cell renal cell carcinoma (ccRCC) using massively parallel sequencing technology. Methodology: The expression of mRNAs and miRNAs was analyzed in tumor tissues and matched normal adjacent tissues obtained from 10 ccRCC patients without distant metastases. In a prevalence screen, some of the most interesting results were validated in a large cohort of ccRCC patients. Principal Findings: A total of 404 miRNAs and 9,799 mRNAs were found to be differentially expressed in the 10 ccRCC patients. We also identified 56 novel miRNA candidates in at least two samples. In addition to confirming that canonical cancer genes and miRNAs (including VEGFA, DUSP9 and ERBB4; miR-210, miR-184 and miR-206) play pivotal roles in ccRCC development, promising novel candidates (such as PNCK and miR-122) without previous annotation in ccRCC carcinogenesis were also discovered in this study. Pathways controlling cell fates (e.g., cell cycle and apoptosis pathways) and cell communication (e.g., focal adhesion and ECM-receptor interaction) were found to be significantly more likely to be disrupted in ccRCC. Additionally, the results of the prevalence screen revealed that the expression of a miRNA gene cluster located on Xq27.3 was consistently downregulated in at least 76.7% of ~50 ccRCC patients. Conclusions: Our study provided a two-dimensional map of the mRNA and miRNA expression profiles of ccRCC using deep sequencing technology. Our results indicate that the phenotypic status of ccRCC is characterized by a loss of normal renal function, downregulation of metabolic genes, and upregulation of many signal transduction genes in key pathways.
Furthermore, it can be concluded that downregulation of miRNA genes clustered on Xq27.3 is associated with ccRCC

    Trends in template/fragment-free protein structure prediction

    Predicting the structure of a protein from its amino acid sequence is a long-standing unsolved problem in computational biology. Its solution would be of both fundamental and practical importance as the gap between the number of known sequences and the number of experimentally solved structures widens rapidly. Currently, the most successful approaches are based on fragment/template reassembly. The lack of progress in template-free structure prediction calls for novel ideas and approaches. This article reviews trends in the development of physical and specific knowledge-based energy functions as well as sampling techniques for fragment-free structure prediction. Recent physical- and knowledge-based studies demonstrated that it is possible to sample and predict highly accurate protein structures without borrowing native fragments from known protein structures. These emerging approaches with fully flexible sampling have the potential to move the field forward.