16 research outputs found

    Systematic analysis of transcription start sites in avian development

    © 2017 Lizio et al. Cap Analysis of Gene Expression (CAGE) in combination with single-molecule sequencing technology allows precision mapping of transcription start sites (TSSs) and genome-wide capture of promoter activities in differentiated and steady state cell populations. Much less is known about whether TSS profiling can characterize diverse and non-steady state cell populations, such as the approximately 400 transitory and heterogeneous cell types that arise during ontogeny of vertebrate animals. To gain such insight, we used the chick model and performed CAGE-based TSS analysis on embryonic samples covering the full 3-week developmental period. In total, 31,863 robust TSS peaks (> 1 tag per million [TPM]) were mapped to the latest chicken genome assembly, of which 34% to 46% were active in any given developmental stage. ZENBU, a web-based, open-source platform, was used for interactive data exploration. TSSs of genes critical for lineage differentiation could be precisely mapped and their activities tracked throughout development, suggesting that non-steady state and heterogeneous cell populations are amenable to CAGE-based transcriptional analysis. Our study also uncovered a large set of extremely stable housekeeping TSSs and many novel stage-specific ones. We furthermore demonstrated that TSS mapping could expedite motif-based promoter analysis for regulatory modules associated with stage-specific and housekeeping genes. Finally, using Brachyury as an example, we provide evidence that precise TSS mapping in combination with Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-on technology enables us, for the first time, to efficiently target endogenous avian genes for transcriptional activation. Taken together, our results represent the first report of genome-wide TSS mapping in birds and the first systematic developmental TSS analysis in any amniote species (birds and mammals).
By facilitating promoter-based molecular analysis and genetic manipulation, our work also underscores the value of avian models in unravelling the complex regulatory mechanism of cell lineage specification during amniote development.

    ISOpureR: an R implementation of a computational purification algorithm of mixed tumour profiles

    Background: Tumour samples containing distinct sub-populations of cancer and normal cells present challenges in the development of reproducible biomarkers, as these biomarkers are based on bulk signals from mixed tumour profiles. ISOpure is the only mRNA computational purification method to date that does not require a paired tumour-normal sample, provides a personalized cancer profile for each patient, and has been tested on clinical data. Replacing mixed tumour profiles with ISOpure-preprocessed cancer profiles led to better prognostic gene signatures for lung and prostate cancer. Results: To simplify the integration of ISOpure into standard R-based bioinformatics analysis pipelines, the algorithm has been implemented as an R package. The ISOpureR package performs analogously to the original code in estimating the fraction of cancer cells and the patient cancer mRNA abundance profile from tumour samples in four cancer datasets. Conclusions: The ISOpureR package estimates the fraction of cancer cells and personalized patient cancer mRNA abundance profile from a mixed tumour profile. This open-source R implementation enables integration into existing computational pipelines, as well as easy testing, modification and extension of the model. Prostate Cancer Canada; Movember Foundation (Grant RS2014-01

    Using mixtures of biological samples as process controls for RNA-sequencing experiments

    Bland-Altman log-ratio (M) versus log-average (A) plots comparing gene expression in BLM-1 to BLM-2, which were mixed with designed ratios of 1:1 brain RNA, 2:1 muscle RNA and 1:2 liver RNA. Points representing expression values for genes expressed at 5-fold greater levels in a specific tissue are colored by the tissue in which they are selectively expressed. Non-tissue-selective RNAs are omitted for clarity. Library size normalization scales all libraries to a common total number of counts, while upper quartile normalization scales to the 75th percentile of the counts for each library. None of these normalizations accurately reflects the designed ratio of transcripts between samples. (PNG 473 kb
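
The two normalizations and the M-A transform named in this caption can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function names are hypothetical, and three choices are assumptions of this example rather than details from the source. These are rescaling every library to the mean across libraries, taking the upper quartile over nonzero counts only, and adding a pseudocount of 0.5 before taking logs.

```python
import numpy as np


def library_size_factors(counts):
    """Per-library scale factors that equalize total counts.

    counts: 2-D array, genes in rows and libraries (samples) in columns.
    Assumption: the common target is the mean library total.
    """
    totals = counts.sum(axis=0)
    return totals.mean() / totals


def upper_quartile_factors(counts):
    """Per-library scale factors that equalize the 75th percentile.

    Assumption: the percentile is taken over nonzero counts only,
    and the common target is the mean upper quartile.
    """
    uq = np.array([np.percentile(col[col > 0], 75) for col in counts.T])
    return uq.mean() / uq


def ma_values(x, y, pseudocount=0.5):
    """Return (M, A): log2 ratio and mean log2 abundance for two libraries.

    Assumption: a pseudocount is added to avoid log(0).
    """
    log_x = np.log2(x + pseudocount)
    log_y = np.log2(y + pseudocount)
    return log_x - log_y, (log_x + log_y) / 2.0


# Toy data: the second library is the first one sequenced twice as deeply.
counts = np.array([[10, 20], [20, 40], [30, 60], [40, 80]], dtype=float)
normalized = counts * library_size_factors(counts)
m, a = ma_values(normalized[:, 0], normalized[:, 1])
```

In this toy case the two libraries differ only in sequencing depth, so either scaling brings every M value to zero. Both methods apply a single factor per library, so they cannot by themselves recover tissue-specific designed ratios, which is the limitation the caption reports.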

    Genomic comparison of DBA/2J and C57Bl/6J strains of Mus musculus and best practice of genome alignment for bioinformatics analyses

    Alcohol use disorder is known to have significant genetic components that contribute to an individual’s susceptibility to the disease. Mouse models are commonly used to study the mechanisms underlying alcohol use disorder, with C57BL/6J (B6) and DBA/2J (D2) being two of the more prominently used inbred strains. Research in the Miles Laboratory has used these two strains, and genetic panels of mice derived from them, to identify potential genes associated with variance in ethanol-related behaviors using quantitative trait loci (QTL) analysis. For example, Ninein (Nin) was identified as a potential candidate gene for the anxiolytic effects of ethanol, discovered because it resides in the confidence interval for a QTL and shows mRNA expression differences between B6 and D2 mice. This differential expression was identified using counts of RNA-Seq reads that have been aligned to a reference genome, specifically the B6 reference genome. Due to the known genetic differences between the two strains, it is possible that the D2 samples could benefit from being aligned to a D2 genome instead of the B6. This would lead to better results overall due to improved read alignment and identification of novel splicing events that might be seen in D2 mice. To test this hypothesis, a dataset consisting of deep (150 million reads) sequencing of RNA from nucleus accumbens of both B6 and D2 mice was used for multiple bioinformatics analyses (differential expression, gene ontology, semantic similarity, differential exon utilization, splice site location, and alternative splicing) with both B6 aligned D2 counts and D2 aligned D2 counts. End results of each analysis were then compared for significant differences in outcomes. 
The results of this analysis show that when D2 samples are aligned to the D2 genome, a majority of differentially expressed genes and differentially utilized exons are retained from the B6-aligned analysis, while many new genes and exons are identified that are unique to the D2-aligned analysis.

    Deconvolution of Heterogeneous Tissue Samples into Relative Presence of Macrophage Phenotype Based on Gene Expression

    Macrophages, as a primary cell of the innate immune system, have a variety of phenotypes that correspond to various functions. The dysregulation of the appearance of these phenotypes can lead to symptoms seen in many diseases. Specifically, macrophage phenotype has been implicated as a potential source of sustained inflammation that prevents healing in chronic wounds. In order to design effective treatments, an understanding of the relative presence of macrophage phenotypes in tissue is necessary. Inferring the relative phenotype composition is currently challenging due to the heterogeneous nature, not only of the macrophages themselves, but also of tissue samples. They contain many different cell types, which express many of the same genes. We present here a proposed method to deconvolute those heterogeneous tissue samples into the composition of two main macrophage phenotypes. Our final model uses gene expression from gene signatures for each phenotype as input to a predictive model that infers sample composition with an average error of 14.6%, and generates predictions that strongly correlate with known compositions (r=0.905). Finally, we apply this model to understand macrophage behavior in wound tissues, using publicly available datasets to obtain expression input. The model was able to demonstrate changes in macrophage phenotype composition in the wound over time. M.S., Biomedical Engineering -- Drexel University, 201

    A mixture model for expression deconvolution from RNA-seq in heterogeneous tissues

    Background: RNA-seq, a next-generation sequencing based method for transcriptome analysis, is rapidly emerging as the method of choice for comprehensive transcript abundance estimation. The accuracy of RNA-seq can be strongly affected by the purity of samples. A prominent, outstanding problem in RNA-seq is how to estimate transcript abundances in heterogeneous tissues, where a sample is composed of more than one cell type and the inhomogeneity can substantially confound the transcript abundance estimation of each individual cell type. Although experimental methods have been proposed to dissect multiple distinct cell types, computationally "deconvoluting" heterogeneous tissues provides an attractive alternative, since it keeps the tissue sample as well as the subsequent molecular content yield intact. Results: Here we propose a probabilistic model-based approach, Transcript Estimation from Mixed Tissue samples (TEMT), to estimate the transcript abundances of each cell type of interest from RNA-seq data of heterogeneous tissue samples. TEMT incorporates positional and sequence-specific biases, and its online EM algorithm only requires a runtime proportional to the data size and a small constant memory. We test the proposed method on both simulation data and recently released ENCODE data, and show that TEMT significantly outperforms current state-of-the-art methods that do not take tissue heterogeneity into account. Currently, TEMT only resolves the tissue heterogeneity resulting from two cell types, but it can be extended to handle tissue heterogeneity resulting from multiple cell types. TEMT is written in Python, and is freely available at https://github.com/uci-cbcl/TEMT. Conclusions: The probabilistic model-based approach proposed here provides a new method for analyzing RNA-seq data from heterogeneous tissue samples.
By applying the method to both simulation data and ENCODE data, we show that explicitly accounting for tissue heterogeneity can significantly improve the accuracy of transcript abundance estimation.