
    Transcriptome Analyses of Tumor-Adjacent Somatic Tissues Reveal Genes Co-Expressed with Transposable Elements

    Background: Despite the long-held assumption that transposons are normally expressed only in the germ-line, recent evidence shows that transcripts of transposable element (TE) sequences are frequently found in somatic cells. However, the extent of variation in TE transcript levels across different tissues and different individuals is unknown, and the co-expression between TEs and host gene mRNAs has not been examined. Results: Here we report the variation in TE-derived transcript levels across tissues and between individuals observed in the non-tumorous tissues collected for The Cancer Genome Atlas. We found core TE co-expression modules consisting mainly of transposons, showing correlated expression across broad classes of TEs. Despite this co-expression within tissues, individual TE loci exhibit tissue-specific expression patterns when compared across tissues. The core TE modules were negatively correlated with other gene modules that consisted of immune response genes in interferon signaling. KRAB Zinc Finger Proteins (KZFPs) were over-represented among the gene members of the TE modules, showing positive correlation across multiple tissues. However, we did not find overlap between TE-KZFP pairs that are co-expressed and TE-KZFP pairs that are bound in published ChIP-seq studies. Conclusions: We find unexpected variation in TE-derived transcripts, within and across non-tumorous tissues. We describe a broad view of the RNA state for non-tumorous tissues exhibiting higher levels of TE transcripts. Tissues with higher levels of TE transcripts have a broad range of TEs co-expressed, with high expression of a large number of KZFPs and lower RNA levels of immune genes.
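
    The negative correlation reported between TE modules and immune gene modules can be illustrated with a minimal sketch. The abstract does not say which correlation measure or module summary was used, so the Pearson coefficient, the function name `pearson`, and the per-sample module scores below are all assumptions made for illustration, not the authors' method or data.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of per-sample scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical module summary scores across five samples (made-up values).
te_module = [0.2, 0.5, 0.9, 1.4, 1.8]
immune_module = [1.9, 1.5, 1.1, 0.6, 0.3]

r = pearson(te_module, immune_module)
print(f"r = {r:.3f}")  # a negative r means the two modules move in opposite directions
```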

    Global isoform-specific transcript alterations and deregulated networks in clear cell renal cell carcinoma.

    Extensive genome-wide analyses of deregulated gene expression have now been performed for many types of cancer. However, most studies have focused on deregulation at the gene level, which may overlook alterations of specific transcripts for a given gene. Clear cell renal cell carcinoma (ccRCC) is one of the best-characterized and most pervasive renal cancers, and ccRCCs are well documented to have aberrant RNA processing. In the present study, we examine the extent of aberrant isoform-specific RNA expression by reporting a comprehensive transcript-level analysis, using the new kallisto-sleuth-RATs pipeline, investigating coding and non-coding differential transcript expression in ccRCC. We analyzed 50 ccRCC tumors and their matched normal samples from The Cancer Genome Atlas datasets. We identified 7,339 differentially expressed transcripts and 94 genes exhibiting differential transcript isoform usage in ccRCC. Additionally, transcript-level coexpression network analyses identified vasculature development and the tricarboxylic acid cycle as the most significantly deregulated networks correlating with ccRCC progression. These analyses uncovered several uncharacterized transcripts, including lncRNAs FGD5-AS1 and AL035661.1, as potential regulators of the tricarboxylic acid cycle associated with ccRCC progression. As ccRCC still presents treatment challenges, our results provide a new resource of potential therapeutic targets and highlight the importance of exploring alternative methodologies in transcriptome-wide studies.

    FreePSI: an alignment-free approach to estimating exon-inclusion ratios without a reference transcriptome.

    Alternative splicing plays an important role in many cellular processes of eukaryotic organisms. The exon-inclusion ratio, also known as percent spliced in, is often regarded as one of the most effective measures of alternative splicing events. The existing methods for estimating exon-inclusion ratios at the genome scale all require a reference transcriptome. In this paper, we propose an alignment-free method, FreePSI, to perform genome-wide estimation of exon-inclusion ratios from RNA-Seq data without relying on the guidance of a reference transcriptome. It uses a novel probabilistic generative model based on k-mer profiles to quantify the exon-inclusion ratios at the genome scale, and it solves the model with an efficient expectation-maximization algorithm based on a divide-and-conquer strategy and an ultrafast conjugate gradient projection descent method. We compare FreePSI with the existing methods on simulated and real RNA-seq data in terms of both accuracy and efficiency and show that it achieves very good performance even though a reference transcriptome is not provided. Our results suggest that FreePSI may have important applications in performing alternative splicing analysis for organisms that do not have quality reference transcriptomes. FreePSI is implemented in C++ and freely available to the public on GitHub.
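
    For readers unfamiliar with the exon-inclusion ratio, the sketch below shows the common count-based definition, PSI = I / (I + E), where I and E are read counts supporting inclusion and exclusion of an exon. This is not FreePSI's k-mer-based generative model; the function name `percent_spliced_in` and the example counts are hypothetical, and the counts are assumed to be already normalized for the number of positions that can support each outcome.

```python
def percent_spliced_in(inclusion_reads: float, exclusion_reads: float) -> float:
    """Return the exon-inclusion ratio PSI = I / (I + E).

    Assumes both counts are non-negative and already length-normalized.
    """
    if inclusion_reads < 0 or exclusion_reads < 0:
        raise ValueError("read counts must be non-negative")
    total = inclusion_reads + exclusion_reads
    if total == 0:
        raise ValueError("PSI is undefined when no reads cover the event")
    return inclusion_reads / total


# Made-up counts: 30 reads support inclusion, 10 support exclusion.
psi = percent_spliced_in(30, 10)
print(psi)  # 0.75
```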

    MSIQ: Joint Modeling of Multiple RNA-seq Samples for Accurate Isoform Quantification

    Next-generation RNA sequencing (RNA-seq) technology has been widely used to assess full-length RNA isoform abundance in a high-throughput manner. RNA-seq data offer insight into gene expression levels and transcriptome structures, enabling us to better understand the regulation of gene expression and fundamental biological processes. Accurate isoform quantification from RNA-seq data is challenging due to the information loss in sequencing experiments. A recent accumulation of multiple RNA-seq data sets from the same tissue or cell type provides new opportunities to improve the accuracy of isoform quantification. However, existing statistical or computational methods for multiple RNA-seq samples either pool the samples into one sample or assign equal weights to the samples when estimating isoform abundance. These methods ignore the possible heterogeneity in the quality of different samples and could result in biased and non-robust estimates. In this article, we develop a method, which we call "joint modeling of multiple RNA-seq samples for accurate isoform quantification" (MSIQ), for more accurate and robust isoform quantification by integrating multiple RNA-seq samples under a Bayesian framework. Our method aims to (1) identify a consistent group of samples with homogeneous quality and (2) improve isoform quantification accuracy by jointly modeling multiple RNA-seq samples while allowing for higher weights on the consistent group. We show that MSIQ provides a consistent estimator of isoform abundance, and we demonstrate the accuracy and effectiveness of MSIQ compared with alternative methods through simulation studies on D. melanogaster genes. We justify MSIQ's advantages over existing approaches via application studies on real RNA-seq data from human embryonic stem cells, brain tissues, and the HepG2 immortalized cell line.
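
    The contrast between equal weighting and up-weighting a consistent group of samples can be illustrated with a simple weighted average of per-sample isoform proportions. This is only a sketch of the weighting idea, not MSIQ's Bayesian model: the function name `weighted_isoform_proportions`, the fixed weights, the pre-labelled consistent group, and the sample values are all assumptions made for illustration.

```python
def weighted_isoform_proportions(samples, consistent, high_weight=1.0, low_weight=0.2):
    """Average per-sample isoform proportions, weighting consistent samples more.

    samples:    list of lists, each holding one sample's isoform proportions
    consistent: list of bools, True if the sample belongs to the consistent group
    """
    if not samples or len(samples) != len(consistent):
        raise ValueError("samples and consistent must be non-empty and equal in length")
    n_isoforms = len(samples[0])
    totals = [0.0] * n_isoforms
    weight_sum = 0.0
    for proportions, is_consistent in zip(samples, consistent):
        if len(proportions) != n_isoforms:
            raise ValueError("every sample must cover the same isoforms")
        weight = high_weight if is_consistent else low_weight
        weight_sum += weight
        for i, p in enumerate(proportions):
            totals[i] += weight * p
    if weight_sum == 0:
        raise ValueError("total weight must be positive")
    return [t / weight_sum for t in totals]


# Made-up proportions for two isoforms in three samples; the third is an outlier.
samples = [[0.6, 0.4], [0.6, 0.4], [0.1, 0.9]]
equal = weighted_isoform_proportions(samples, [True, True, True])
weighted = weighted_isoform_proportions(samples, [True, True, False])
print(equal, weighted)  # the weighted estimate stays closer to the consistent samples
```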