11,170 research outputs found

    Statistical models for RNA-seq data derived from a two-condition 48-replicate experiment

    High-throughput RNA sequencing (RNA-seq) is now the standard method to determine differential gene expression. Identifying differentially expressed genes crucially depends on estimates of read count variability. These estimates are typically based on statistical models such as the negative binomial distribution, which is employed by the tools edgeR, DESeq and cuffdiff. Until now, the validity of these models has usually been tested on either low-replicate RNA-seq data or simulations. A 48-replicate RNA-seq experiment in yeast was performed and the data were tested against theoretical models. The observed gene read counts were consistent with both log-normal and negative binomial distributions, while the mean-variance relation followed the line of constant dispersion parameter of ~0.01. The high-replicate data also allowed for strict quality control and screening of bad replicates, which can drastically affect the gene read-count distribution. RNA-seq data have been submitted to the ENA archive with project ID PRJEB5348. Comment: 15 pages, 6 figures
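As a rough illustration of the "line of constant dispersion" mentioned in this abstract, the sketch below assumes the common negative binomial parameterization in which the variance equals mean + dispersion × mean². The function names are hypothetical and the default of 0.01 is simply the approximate value quoted in the abstract; this is not code from the paper.

```python
def nb_variance(mean: float, dispersion: float = 0.01) -> float:
    """Variance of a negative binomial count, assuming var = mu + phi * mu**2."""
    return mean + dispersion * mean ** 2


def squared_cv(mean: float, dispersion: float = 0.01) -> float:
    """Squared coefficient of variation; tends to the dispersion as the mean grows."""
    return nb_variance(mean, dispersion) / mean ** 2


# Tabulate the assumed mean-variance relation for a few illustrative mean counts.
for mu in (10.0, 100.0, 1000.0, 10000.0):
    print(f"mean={mu:>8.0f}  variance={nb_variance(mu):>12.1f}  CV^2={squared_cv(mu):.4f}")
```

Under this assumption, Poisson noise dominates at low counts (CV² ≈ 1/mean), while for highly expressed genes CV² levels off near the dispersion itself.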

    EBF1-deficient bone marrow stroma elicits persistent changes in HSC potential

    Crosstalk between mesenchymal stromal cells (MSCs) and hematopoietic stem cells (HSCs) is essential for hematopoietic homeostasis and lineage output. Here, we investigate how transcriptional changes in bone marrow (BM) MSCs result in long-lasting effects on HSCs. Single-cell analysis of Cxcl12-abundant reticular (CAR) cells and PDGFRα+Sca1+ (PαS) cells revealed an extensive cellular heterogeneity but uniform expression of the transcription factor gene Ebf1. Conditional deletion of Ebf1 in these MSCs altered their cellular composition, chromatin structure and gene expression profiles, including the reduced expression of adhesion-related genes. Functionally, the stromal-specific Ebf1 inactivation results in impaired adhesion of HSCs, leading to reduced quiescence and diminished myeloid output. Most notably, HSCs residing in the Ebf1-deficient niche underwent changes in their cellular composition and chromatin structure that persist in serial transplantations. Thus, genetic alterations in the BM niche lead to long-term functional changes of HSCs

    Methods for Joint Normalization and Comparison of Hi-C data

    The development of chromatin conformation capture technology has opened new avenues of study into the 3D structure and function of the genome. Chromatin structure is known to influence gene regulation, and differences in structure are now emerging as a mechanism of regulation between, e.g., cell differentiation and disease vs. normal states. Hi-C sequencing technology now provides a way to study the 3D interactions of the chromatin over the whole genome. However, like all sequencing technologies, Hi-C suffers from several forms of bias stemming from both the technology and the DNA sequence itself. Several normalization methods have been developed for normalizing individual Hi-C datasets, but little work has been done on developing joint normalization methods for comparing two or more Hi-C datasets. To make full use of Hi-C data, joint normalization and statistical comparison techniques are needed to carry out experiments to identify regions where chromatin structure differs between conditions. We develop methods for the joint normalization and comparison of two Hi-C datasets, which we then extend to more complex experimental designs. Our normalization method is novel in that it makes use of the distance-dependent nature of chromatin interactions. Our modification of the Minus vs. Average (MA) plot to the Minus vs. Distance (MD) plot allows for a nonparametric data-driven normalization technique using loess smoothing. Additionally, we present a simple statistical method using Z-scores for detecting differentially interacting regions between two datasets. Our initial method was published as the Bioconductor R package HiCcompare [http://bioconductor.org/packages/HiCcompare/](http://bioconductor.org/packages/HiCcompare/). We then further extended our normalization and comparison method for use in complex Hi-C experiments with more than two datasets and optional covariates.
    We extended the normalization method to jointly normalize any number of Hi-C datasets by using a cyclic loess procedure on the MD plot. The cyclic loess normalization technique can remove between-dataset biases efficiently and effectively even when several datasets are analyzed at one time. Our comparison method implements a generalized linear model-based approach for comparing complex Hi-C experiments, which may have more than two groups and additional covariates. The extended methods are also available as a Bioconductor R package [http://bioconductor.org/packages/multiHiCcompare/](http://bioconductor.org/packages/multiHiCcompare/). Finally, we demonstrate the use of HiCcompare and multiHiCcompare in several test cases on real data in addition to comparing them to other similar methods (https://doi.org/10.1002/cpbi.76)
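The MD-plot idea in this abstract can be sketched in a few lines. The example below is a simplified illustration only and is not the HiCcompare implementation: it assumes M is the log2 ratio of interaction frequencies and D is the bin distance, and it swaps the loess fit for a per-distance median to stay dependency-free. All function names and the toy contact maps are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, median, pstdev


def md_values(contacts_a, contacts_b):
    """Return (D, M, pair) for each bin pair present in both datasets.

    contacts_*: dict mapping (bin_i, bin_j) -> positive interaction frequency.
    M is the log2 ratio b/a; D is the distance between the bins, in bin units.
    """
    rows = []
    for i, j in sorted(contacts_a.keys() & contacts_b.keys()):
        m = math.log2(contacts_b[(i, j)] / contacts_a[(i, j)])
        rows.append((abs(j - i), m, (i, j)))
    return rows


def center_by_distance(rows):
    """Subtract the median M at each distance (a crude stand-in for a loess fit)."""
    by_distance = defaultdict(list)
    for d, m, _ in rows:
        by_distance[d].append(m)
    trend = {d: median(ms) for d, ms in by_distance.items()}
    return [(d, m - trend[d], pair) for d, m, pair in rows]


def z_scores(rows):
    """Z-score of each centered M value against all centered M values."""
    ms = [m for _, m, _ in rows]
    mu, sd = mean(ms), pstdev(ms)
    return {pair: ((m - mu) / sd if sd > 0 else 0.0) for _, m, pair in rows}


# Toy data: dataset b is a uniform 2x scaling of a, except for bin pair (2, 3).
a = {(0, 1): 10, (1, 2): 12, (2, 3): 9, (0, 2): 4, (1, 3): 5}
b = {(0, 1): 20, (1, 2): 24, (2, 3): 72, (0, 2): 8, (1, 3): 10}

z = z_scores(center_by_distance(md_values(a, b)))
top_pair = max(z, key=lambda pair: abs(z[pair]))
print(top_pair, round(z[top_pair], 3))
```

In this toy case the global 2x difference between datasets is absorbed by the per-distance centering, so only the bin pair with a genuinely different ratio stands out.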

    The ROS wheel: refining ROS transcriptional footprints

    In the last decade, microarray studies have delivered extensive inventories of transcriptome-wide changes in messenger RNA levels provoked by various types of oxidative stress in Arabidopsis (Arabidopsis thaliana). Previous cross-study comparisons indicated how different types of reactive oxygen species (ROS) and their subcellular accumulation sites are able to reshape the transcriptome in specific manners. However, these analyses often employed simplistic statistical frameworks that are not compatible with large-scale analyses. Here, we reanalyzed a total of 79 Affymetrix ATH1 microarray studies of redox homeostasis perturbation experiments. To create hierarchy in such a high number of transcriptomic data sets, all transcriptional profiles were clustered on the overlap extent of their differentially expressed transcripts. Subsequently, meta-analysis determined a single magnitude of differential expression across studies and identified common transcriptional footprints per cluster. The resulting transcriptional footprints revealed the regulation of various metabolic pathways and gene families. The RESPIRATORY BURST OXIDASE HOMOLOG F-mediated respiratory burst had a major impact and was a converging point among several studies. Conversely, the timing of the oxidative stress response was a determining factor in shaping different transcriptome footprints. Our study emphasizes the need to interpret transcriptomic data sets in a systematic context, where initial, specific stress triggers can converge to common, aspecific transcriptional changes. We believe that these refined transcriptional footprints provide a valuable resource for assessing the involvement of ROS in biological processes in plants

    Tissue-Specific Transcriptomes Reveal Gene Expression Trajectories in Two Maturing Skin Epithelial Layers in Zebrafish Embryos.

    Epithelial cells are the building blocks of many organs, including skin. The vertebrate skin initially consists of two epithelial layers, the outer periderm and inner basal cell layers, which have distinct properties, functions, and fates. The embryonic periderm ultimately disappears during development, whereas basal cells proliferate to form the mature, stratified epidermis. Although much is known about mechanisms of homeostasis in mature skin, relatively little is known about the two cell types in pre-stratification skin. To define the similarities and distinctions between periderm and basal skin epithelial cells, we purified them from zebrafish at early development stages and deeply profiled their gene expression. These analyses identified groups of genes whose tissue enrichment changed at each stage, defining gene flow dynamics of maturing vertebrate epithelia. At each of 52 and 72 hr post-fertilization (hpf), more than 60% of genes enriched in skin cells were similarly expressed in both layers, indicating that they were common epithelial genes, but many others were enriched in one layer or the other. Both expected and novel genes were enriched in periderm and basal cell layers. Genes encoding extracellular matrix, junctional, cytoskeletal, and signaling proteins were prominent among those distinguishing the two epithelial cell types. In situ hybridization and BAC transgenes confirmed our expression data and provided new tools to study zebrafish skin. Collectively, these data provide a resource for studying common and distinguishing features of maturing epithelia

    How many biological replicates are needed in an RNA-seq experiment and which differential expression tool should you use?

    An RNA-seq experiment with 48 biological replicates in each of 2 conditions was performed to determine the number of biological replicates ($n_r$) required, and to identify the most effective statistical analysis tools for identifying differential gene expression (DGE). When $n_r = 3$, seven of the nine tools evaluated give true positive rates (TPR) of only 20 to 40 percent. For high fold-change genes ($|\log_2(\mathrm{FC})| > 2$) the TPR is $> 85$ percent. Two tools performed poorly, over- or under-predicting the number of differentially expressed genes. Increasing replication gives a large increase in TPR when considering all DE genes but only a small increase for high fold-change genes. Achieving a TPR $> 85$% across all fold-changes requires $n_r > 20$. For future RNA-seq experiments these results suggest $n_r > 6$, rising to $n_r > 12$ when identifying DGE irrespective of fold-change is important. For $6 < n_r < 12$, superior TPR makes edgeR the leading tool tested. For $n_r \ge 12$, minimizing false positives is more important and DESeq outperforms the other tools. Comment: 21 Pages and 4 Figures in main text. 9 Figures in Supplement attached to PDF. Revision to correct a minor error in the abstract
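For readers unfamiliar with the metric, the true positive rate quoted in this abstract can be computed as in the sketch below. It assumes a reference ("truth") set of differentially expressed genes is available for comparison; the function name and gene identifiers are hypothetical and not taken from the paper.

```python
def tpr_fpr(called: set, truth: set, all_genes: set) -> tuple:
    """Return (TPR, FPR) for a set of DE calls against a reference set."""
    tp = len(called & truth)               # called DE and truly DE
    fp = len(called - truth)               # called DE but not truly DE
    fn = len(truth - called)               # truly DE but missed
    tn = len(all_genes - called - truth)   # neither called nor truly DE
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical example: 10 genes, 4 truly DE, a tool calls 3 genes DE.
all_genes = {f"g{k}" for k in range(1, 11)}
truth = {"g1", "g2", "g3", "g4"}
called = {"g1", "g2", "g5"}

tpr, fpr = tpr_fpr(called, truth, all_genes)
print(f"TPR={tpr:.2f}  FPR={fpr:.3f}")
```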