75 research outputs found

    netSmooth: Network-smoothing based imputation for single cell RNA-seq [version 3; referees: 2 approved]

    Single cell RNA-seq (scRNA-seq) experiments suffer from a range of characteristic technical biases, such as dropouts (zero or near zero counts) and high variance. Current analysis methods rely on imputing missing values by various means of local averaging or regression, often amplifying biases inherent in the data. We present netSmooth, a network-diffusion based method that uses priors for the covariance structure of gene expression profiles on scRNA-seq experiments in order to smooth expression values. We demonstrate that netSmooth improves clustering results of scRNA-seq experiments from distinct cell populations, time-course experiments, and cancer genomics. We provide an R package for our method, available at: https://github.com/BIMSBbioinfo/netSmooth

    genomation: a toolkit to summarize, annotate and visualize genomic intervals

    Summary: Biological insights can be obtained through computational integration of genomics data sets consisting of diverse types of information. The integration is often hampered by a large variety of existing file formats, often containing similar information, and the necessity to use complicated tools to achieve the desired results. We have built an R package, genomation, to expedite the extraction of biological information from high throughput data. The package works with a variety of genomic interval file types and enables easy summarization and annotation of high throughput data sets with given genomic annotations. Availability and implementation: The software is currently distributed under MIT artistic license and freely available at http://bioinformatics.mdc-berlin.de/genomation, and through the Bioconductor framework.

    Transcriptional features of genomic regulatory blocks

    CAGE tag mapping of transcription start sites across different human tissues shows that genomic regulatory blocks have unique features that are the likely cause of their ability to respond to regulatory inputs from very long distances.

    PiGx: reproducible genomics analysis pipelines with GNU Guix

    In bioinformatics, as well as other computationally intensive research fields, there is a need for workflows that can reliably produce consistent output, from known sources, independent of the software environment or configuration settings of the machine on which they are executed. Indeed, this is essential for controlled comparison between different observations and for the wider dissemination of workflows. However, providing this type of reproducibility and traceability is often complicated by the need to accommodate the myriad dependencies included in a larger body of software, each of which generally comes in various versions. Moreover, in many fields (bioinformatics being a prime example), these versions are subject to continual change due to rapidly evolving technologies, further complicating problems related to reproducibility. Here, we propose a principled approach for building analysis pipelines and managing their dependencies with GNU Guix. As a case study to demonstrate the utility of our approach, we present a set of highly reproducible pipelines called PiGx for the analysis of RNA sequencing, chromatin immunoprecipitation sequencing, bisulfite-treated DNA sequencing, and single-cell resolution RNA sequencing. All pipelines process raw experimental data and generate reports containing publication-ready plots and figures, with interactive report elements and standard observables. Users may install these highly reproducible packages and apply them to their own datasets without any special computational expertise beyond the use of the command line. We hope such a toolkit will provide immediate benefit to laboratory workers wishing to process their own datasets or bioinformaticians seeking to automate all, or parts of, their analyses. In the long term, we hope our approach to reproducibility will serve as a blueprint for reproducible workflows in other areas. 
    Our pipelines, along with their corresponding documentation and sample reports, are available at http://bioinformatics.mdc-berlin.de/pigx

    Predicting lethal courses in critically ill COVID-19 patients using a machine learning model trained on patients with non-COVID-19 viral pneumonia

    In a pandemic with a novel disease, disease-specific prognosis models are available only with a delay. To bridge the critical early phase, models built for similar diseases might be applied. To test the accuracy of such a knowledge transfer, we investigated how precise lethal courses in critically ill COVID-19 patients can be predicted by a model trained on critically ill non-COVID-19 viral pneumonia patients. We trained gradient boosted decision tree models on 718 (245 deceased) non-COVID-19 viral pneumonia patients to predict individual ICU mortality and applied it to 1054 (369 deceased) COVID-19 patients. Our model showed a significantly better predictive performance (AUROC 0.86 [95% CI 0.86-0.87]) than the clinical scores APACHE2 (0.63 [95% CI 0.61-0.65]), SAPS2 (0.72 [95% CI 0.71-0.74]) and SOFA (0.76 [95% CI 0.75-0.77]), the COVID-19-specific mortality prediction models of Zhou (0.76 [95% CI 0.73-0.78]) and Wang (laboratory: 0.62 [95% CI 0.59-0.65]; clinical: 0.56 [95% CI 0.55-0.58]) and the 4C COVID-19 Mortality score (0.71 [95% CI 0.70-0.72]). We conclude that lethal courses in critically ill COVID-19 patients can be predicted by a machine learning model trained on non-COVID-19 patients. Our results suggest that in a pandemic with a novel disease, prognosis models built for similar diseases can be applied, even when the diseases differ in time courses and in rates of critical and lethal courses

    RNA polymerase II primes Polycomb-repressed developmental genes throughout terminal neuronal differentiation

    Polycomb repression in mouse embryonic stem cells (ESCs) is tightly associated with promoter co-occupancy of RNA polymerase II (RNAPII) which is thought to prime genes for activation during early development. However, it is unknown whether RNAPII poising is a general feature of Polycomb repression, or is lost during differentiation. Here, we map the genome-wide occupancy of RNAPII and Polycomb from pluripotent ESCs to non-dividing functional dopaminergic neurons. We find that poised RNAPII complexes are ubiquitously present at Polycomb-repressed genes at all stages of neuronal differentiation. We observe both loss and acquisition of RNAPII and Polycomb at specific groups of genes reflecting their silencing or activation. Strikingly, RNAPII remains poised at transcription factor genes which are silenced in neurons through Polycomb repression, and have major roles in specifying other, non-neuronal lineages. We conclude that RNAPII poising is intrinsically associated with Polycomb repression throughout differentiation. Our work suggests that the tight interplay between RNAPII poising and Polycomb repression not only instructs promoter state transitions, but also may enable promoter plasticity in differentiated cells

    Functional interplay of Epstein-Barr virus oncoproteins in a mouse model of B cell lymphomagenesis

    Epstein-Barr virus (EBV) is a B cell transforming virus that causes B cell malignancies under conditions of immune suppression. EBV orchestrates B cell transformation through its latent membrane proteins (LMPs) and Epstein-Barr nuclear antigens (EBNAs). We here identify secondary mutations in mouse B cell lymphomas induced by LMP1, to predict and identify key functions of other EBV genes during transformation. We find aberrant activation of early B cell factor 1 (EBF1) to promote transformation of LMP1-expressing B cells by inhibiting their differentiation to plasma cells. EBV EBNA3A phenocopies EBF1 activities in LMP1-expressing B cells, promoting transformation while inhibiting differentiation. In cells expressing LMP1 together with LMP2A, EBNA3A only promotes lymphomagenesis when the EBNA2 target Myc is also overexpressed. Collectively, our data support a model where proproliferative activities of LMP1, LMP2A, and EBNA2 in combination with EBNA3A-mediated inhibition of terminal plasma cell differentiation critically control EBV-mediated B cell lymphomagenesis

    The conserved histone chaperone LIN-53 is required for normal lifespan and maintenance of muscle integrity in Caenorhabditis elegans

    Whether extension of lifespan provides an extended time without health deteriorations is an important issue for human aging. However, to which degree lifespan and aspects of healthspan regulation might be linked is not well understood. Chromatin factors could be involved in linking both aging aspects, as epigenetic mechanisms bridge regulation of different biological processes. The epigenetic factor LIN-53 (RBBP4/7) associates with different chromatin-regulating complexes to safeguard cell identities in Caenorhabditis elegans as well as mammals, and has a role in preventing memory loss and premature aging in humans. We show that LIN-53 interacts with the nucleosome remodeling and deacetylase (NuRD) complex in C. elegans muscles to ensure functional muscles during postembryonic development and in adults. While mutants for other NuRD members show a normal lifespan, animals lacking LIN-53 die early because LIN-53 depletion affects also the histone deacetylase complex Sin3, which is required for a normal lifespan. To determine why lin-53 and sin-3 mutants die early, we performed transcriptome and metabolomic analysis revealing that levels of the disaccharide trehalose are significantly decreased in both mutants. As trehalose is required for normal lifespan in C. elegans, lin-53 and sin-3 mutants could be rescued by either feeding with trehalose or increasing trehalose levels via the insulin/IGF1 signaling pathway. Overall, our findings suggest that LIN-53 is required for maintaining lifespan and muscle integrity through discrete chromatin regulatory mechanisms. Since both LIN-53 and its mammalian homologs safeguard cell identities, it is conceivable that its implication in lifespan regulation is also evolutionarily conserved

    PHF3 regulates neuronal gene expression through the Pol II CTD reader domain SPOC

    The C-terminal domain (CTD) of the largest subunit of RNA polymerase II (Pol II) is a regulatory hub for transcription and RNA processing. Here, we identify PHD-finger protein 3 (PHF3) as a regulator of transcription and mRNA stability that docks onto Pol II CTD through its SPOC domain. We characterize SPOC as a CTD reader domain that preferentially binds two phosphorylated Serine-2 marks in adjacent CTD repeats. PHF3 drives liquid-liquid phase separation of phosphorylated Pol II, colocalizes with Pol II clusters and tracks with Pol II across the length of genes. PHF3 knock-out or SPOC deletion in human cells results in increased Pol II stalling, reduced elongation rate and an increase in mRNA stability, with marked derepression of neuronal genes. Key neuronal genes are aberrantly expressed in Phf3 knock-out mouse embryonic stem cells, resulting in impaired neuronal differentiation. Our data suggest that PHF3 acts as a prominent effector of neuronal gene regulation by bridging transcription with mRNA decay

    Dynamic regulation of the transcription initiation landscape at single nucleotide resolution during vertebrate embryogenesis

    Spatiotemporal control of gene expression is central to animal development. Core promoters represent a previously unanticipated regulatory level by interacting with cis-regulatory elements and transcription initiation in different physiological and developmental contexts. Here, we provide a first and comprehensive description of the core promoter repertoire and its dynamic use during the development of a vertebrate embryo. By using cap analysis of gene expression (CAGE), we mapped transcription initiation events at single nucleotide resolution across 12 stages of zebrafish development. These CAGE-based transcriptome maps reveal genome-wide rules of core promoter usage, structure, and dynamics, key to understanding the control of gene regulation during vertebrate ontogeny. They revealed the existence of multiple classes of pervasive intra- and intergenic post-transcriptionally processed RNA products and their developmental dynamics. Among these RNAs, we report splice donor site-associated intronic RNA (sRNA) to be specific to genes of the splicing machinery. For the identification of conserved features, we compared the zebrafish data sets to the first CAGE promoter map of Tetraodon and the existing human CAGE data. We show that a number of features, such as promoter type, newly discovered promoter properties such as a specialized purine-rich initiator motif, as well as sRNAs and the genes in which they are detected, are conserved in mammalian and Tetraodon CAGE-defined promoter maps. The zebrafish developmental promoterome represents a powerful resource for studying developmental gene regulation and revealing promoter features shared across vertebrates.