2,815 research outputs found

    Identifying differential exon splicing using linear models and correlation coefficients

    Get PDF
    Background: With the availability of the Affymetrix exon arrays a number of tools have been developed to enable the analysis. These however can be expensive or have several pre-installation requirements. This led us to develop an analysis workflow for analysing differential splicing using freely available software packages that are already being widely used for gene expression analysis. The workflow uses the packages in the standard installation of R and Bioconductor (BiocLite) to identify differential splicing. We use the splice index method with the LIMMA framework. The main drawback with this approach is that it relies on accurate estimates of gene expression from the probe-level data. Methods such as RMA and PLIER may misestimate when a large proportion of exons are spliced. We therefore present the novel concept of a gene correlation coefficient calculated using only the probeset expression pattern within a gene. We show that genes with lower correlation coefficients are likely to be differentially spliced.Results: The LIMMA approach was used to identify several tissue-specific transcripts and splicing events that are supported by previous experimental studies. Filtering the data is necessary, particularly removing exons and genes that are not expressed in all samples and cross-hybridising probesets, in order to reduce the false positive rate. The LIMMA approach ranked genes containing single or few differentially spliced exons much higher than genes containing several differentially spliced exons. On the other hand we found the gene correlation coefficient approach better for identifying genes with a large number of differentially spliced exons.Conclusion: We show that LIMMA can be used to identify differential exon splicing from Affymetrix exon array data. Though further work would be necessary to develop the use of correlation coefficients into a complete analysis approach, the preliminary results demonstrate their usefulness for identifying differentially spliced genes. The two approaches work complementary as they can potentially identify different subsets of genes (single/few spliced exons vs. large transcript structure differences)

    Modeling and analysis of RNA-seq data: a review from a statistical perspective

    Full text link
    Background: Since the invention of next-generation RNA sequencing (RNA-seq) technologies, they have become a powerful tool to study the presence and quantity of RNA molecules in biological samples and have revolutionized transcriptomic studies. The analysis of RNA-seq data at four different levels (samples, genes, transcripts, and exons) involve multiple statistical and computational questions, some of which remain challenging up to date. Results: We review RNA-seq analysis tools at the sample, gene, transcript, and exon levels from a statistical perspective. We also highlight the biological and statistical questions of most practical considerations. Conclusion: The development of statistical and computational methods for analyzing RNA- seq data has made significant advances in the past decade. However, methods developed to answer the same biological question often rely on diverse statical models and exhibit different performance under different scenarios. This review discusses and compares multiple commonly used statistical models regarding their assumptions, in the hope of helping users select appropriate methods as needed, as well as assisting developers for future method development

    De novo prediction of PTBP1 binding and splicing targets reveals unexpected features of its RNA recognition and function.

    Get PDF
    The splicing regulator Polypyrimidine Tract Binding Protein (PTBP1) has four RNA binding domains that each binds a short pyrimidine element, allowing recognition of diverse pyrimidine-rich sequences. This variation makes it difficult to evaluate PTBP1 binding to particular sites based on sequence alone and thus to identify target RNAs. Conversely, transcriptome-wide binding assays such as CLIP identify many in vivo targets, but do not provide a quantitative assessment of binding and are informative only for the cells where the analysis is performed. A general method of predicting PTBP1 binding and possible targets in any cell type is needed. We developed computational models that predict the binding and splicing targets of PTBP1. A Hidden Markov Model (HMM), trained on CLIP-seq data, was used to score probable PTBP1 binding sites. Scores from this model are highly correlated (ρ = -0.9) with experimentally determined dissociation constants. Notably, we find that the protein is not strictly pyrimidine specific, as interspersed Guanosine residues are well tolerated within PTBP1 binding sites. This model identifies many previously unrecognized PTBP1 binding sites, and can score PTBP1 binding across the transcriptome in the absence of CLIP data. Using this model to examine the placement of PTBP1 binding sites in controlling splicing, we trained a multinomial logistic model on sets of PTBP1 regulated and unregulated exons. Applying this model to rank exons across the mouse transcriptome identifies known PTBP1 targets and many new exons that were confirmed as PTBP1-repressed by RT-PCR and RNA-seq after PTBP1 depletion. We find that PTBP1 dependent exons are diverse in structure and do not all fit previous descriptions of the placement of PTBP1 binding sites. Our study uncovers new features of RNA recognition and splicing regulation by PTBP1. This approach can be applied to other multi-RRM domain proteins to assess binding site degeneracy and multifactorial splicing regulation

    Multi-omics analysis of early molecular mechanisms of type 1 diabetes

    Get PDF
    Type 1 diabetes (T1D) is a complicated autoimmune disease with largely unknown disease mechanisms. The diagnosis is preceded by a long asymptomatic period of autoimmune activity in the insulin-producing pancreatic islets. Currently the only clinical markers used for T1D prediction are islet autoantibodies, which are a sign of already-broken immune tolerance. The focus of this dissertation is on the early asymptomatic period preceding seroconversion to islet autoantibody positivity. The genetic risk of type 1 diabetes has been thoroughly mapped in genome-wide association studies, but environmental factors and molecular mechanisms that mediate the risk are less well understood. According to the hygiene hypothesis, the risk of immune-mediated disorders is increased by the lack of exposure to pathogens in modern environments. Within a study on the hygiene hypothesis, we compared umbilical cord blood gene expression patterns between children born in environments with contrasting standards of living and type 1 diabetes incidences (Finland, Russia, and Estonia). The differentially expressed genes were associated with innate immunity and immune maturation. Our results suggest that the environment influences the immune system development already in-utero. Furthermore, we analyzed genome-wide DNA methylation and gene expression profiles in samples collected prospectively from Finnish children and newborn infants at risk of type 1 diabetes. Bisulfite sequencing analysis did not show any association of neonatal DNA methylation with later progression to T1D. However, antiviral type I interferon response in early childhood was found to be a risk factor of T1D. This transcriptomic signature was detectable in the peripheral blood already before islet autoantibodies, and the main observations were confirmed in an independent German study. These results contributed to the hypothesis that virus infections might play a role in T1D. Additionally, this dissertation contributed to transcriptomic and epigenomic data analysis workflows. Simple probe-level analysis of exon array data was shown to improve the reproducibility, specificity, and sensitivity of detected differential exon inclusion events. Type 1 error rate was markedly reduced by permutation-based significance assessment of differential methylation in bisulfite sequencing studies.Tyypin 1 diabeteksen varhaisten molekulaaristen mekanismien multiomiikka-analyysi Tyypin 1 diabetes (T1D) on autoimmuunitauti, jonka taustalla olevista mekanismeista tiedetään vähän. Diagnoosia edeltää pitkä oireeton jakso, jonka aikana insuliinia tuottaviin beetasoluihin kohdistuva autoimmuunireaktio etenee haiman saarekkeissa. Tämä väitöskirjatutkimus keskittyy T1D:n varhaiseen oireettomaan ajanjaksoon, joka edeltää serokonversiota autovasta-ainepositiiviseksi. Tyypin 1 diabeteksen geneettiset riskitekijät on kartoitettu perusteellisesti genominlaajuisissa assosiaatiotutkimuksissa, mutta ympäristön riskitekijöistä ja riskiä välittävistä molekyylimekanismeista tiedetään vähemmän. Hygieniahypoteesin mukaan vähäinen altistuminen taudinaiheuttajille lisää immuunijärjestelmän häiriöiden riskiä. Hygieniahypoteesiin liittyvässä osatyössä vertasimme hygienian ja T1D:n ilmaantuvuuden suhteen erilaisissa ympäristöissä (Suomi, Venäjä ja Viro) syntyneiden lasten napaveren geeniekpressioprofiileja. Erilaisesti ekspressoituneet geenit liittyivät synnynnäiseen immuniteettiin ja immuunijärjestelmän maturaatioon. Näiden tulosten perusteella ympäristö saattaa vaikuttaa immuunijärjestelmän kehitykseen jo raskauden aikana. Genominlaajuista DNA-metylaatiota ja geeniekspressiota analysoitiin näytteistä, jotka oli kerätty laajassa suomalaisessa seurantatutkimuksessa T1D:n riskiryhmään kuuluvilta lapsilta ja vastasyntyneiltä. Bisulfiittisekvensointianalyysin perusteella vastasyntyneen DNA-metylaation ja lapsuuden aikana kehittyvän T1D:n välillä ei ollut yhteyttä. Sen sijaan RNA:n tasolla havaittava viruksiin kohdistuva tyypin 1 interferonivaste varhaislapsuudessa todettiin T1D:n riskitekijäksi. Tämä havainto tehtiin perifeerisestä verestä jo ennen saarekevasta-aineiden ilmaantumista, ja päähavainnot vahvistettiin saksalaisessa tutkimuksessa. Nämä tulokset vahvistivat hypoteesia, jonka mukaan virukset voivat vaikuttaa T1D:n puhkeamiseen. T1D-tutkimuksen ohella tämä väitöskirjatyö kehitti transkriptomiikkaan ja epigenomiikkaan sopivia analyysimenetelmiä. Eksonimikrosirujen koetintasoisen analyysin todettiin parantavan toistettavuutta, sensitiivisyyttä ja tarkkuutta vaihtoehtoisen silmukoinniin kartoittamisessa. Tilastollisen merkitsevyyden permutaatiopohjainen analyysi vähensi tyypin 1 virhettä bisulfiittisekvensointidatan analyysissa

    MMpred: functional miRNA – mRNA interaction analyses by miRNA expression prediction

    Get PDF
    Background: MicroRNA (miRNA) directed gene repression is an important mechanism of posttranscriptional regulation. Comprehensive analyses of how microRNA influence biological processes requires paired miRNA-mRNA expression datasets. However, a review of both GEO and ArrayExpress repositories revealed few such datasets, which was in stark contrast to the large number of messenger RNA (mRNA) only datasets. It is of interest that numerous primary miRNAs (precursors of microRNA) are known to be co-expressed with coding genes (host genes). Results: We developed a miRNA-mRNA interaction analyses pipeline. The proposed solution is based on two miRNA expression prediction methods – a scaling function and a linear model. Additionally, miRNA-mRNA anticorrelation analyses are used to determine the most probable miRNA gene targets (i.e. the differentially expressed genes under the influence of up- or down-regulated microRNA). Both the consistency and accuracy of the prediction method is ensured by the application of stringent statistical methods. Finally, the predicted targets are subjected to functional enrichment analyses including GO, KEGG and DO, to better understand the predicted interactions. Conclusions: The MMpred pipeline requires only mRNA expression data as input and is independent of third party miRNA target prediction methods. The method passed extensive numerical validation based on the binding energy between the mature miRNA and 3’ UTR region of the target gene. We report that MMpred is capable of generating results similar to that obtained using paired datasets. For the reported test cases we generated consistent output and predicted biological relationships that will help formulate further testable hypotheses

    Different effects of the probe summarization algorithms PLIER and RMA on high-level analysis of Affymetrix exon arrays

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Alternative splicing is an important mechanism that increases protein diversity and functionality in higher eukaryotes. Affymetrix exon arrays are a commercialized platform used to detect alternative splicing on a genome-wide scale. Two probe summarization algorithms, PLIER (Probe Logarithmic Intensity Error) and RMA (Robust Multichip Average), are commonly used to compute gene-level and exon-level expression values. However, a systematic comparison of these two algorithms on their effects on high-level analysis of the arrays has not yet been reported.</p> <p>Results</p> <p>In this study, we showed that PLIER summarization led to over-estimation of gene-level expression changes, relative to exon-level expression changes, in two-group comparisons. Consequently, it led to detection of substantially more skipped exons on up-regulated genes, as well as substantially more included (i.e., non-skipped) exons on down-regulated genes. In contrast, this bias was not observed for RMA-summarized data. By using a published human tissue dataset, we compared the tissue-specific expression and splicing detected by Affymetrix exon arrays with those detected based on expressed sequence databases. We found the tendency of PLIER was not supported by the expressed sequence data.</p> <p>Conclusion</p> <p>We showed that the tendency of PLIER in detection of alternative splicing is likely caused by a technical bias in the approach, rather than a biological bias. Moreover, we observed abnormal summarization results when using the PLIER algorithm, indicating that mathematical problems, such as numerical instability, may affect PLIER performance.</p

    HBA-DEALS: accurate and simultaneous identification of differential expression and splicing using hierarchical Bayesian analysis.

    Get PDF
    We present Hierarchical Bayesian Analysis of Differential Expression and ALternative Splicing (HBA-DEALS), which simultaneously characterizes differential expression and splicing in cohorts. HBA-DEALS attains state of the art or better performance for both expression and splicing and allows genes to be characterized as having differential gene expression, differential alternative splicing, both, or neither. HBA-DEALS analysis of GTEx data demonstrated sets of genes that show predominant DGE or DAST across multiple tissue types. These sets have pervasive differences with respect to gene structure, function, membership in protein complexes, and promoter architecture

    Radiation-induced alternative transcription and splicing events and their applicability to practical biodosimetry

    Get PDF
    Accurate assessment of the individual exposure dose based on easily accessible samples (e.g. blood) immediately following a radiological accident is crucial. We aimed at developing a robust transcription-based signature for biodosimetry from human peripheral blood mononuclear cells irradiated with different doses of X-rays (0.1 and 1.0 Gy) at a dose rate of 0.26 Gy/min. Genome-wide radiation-induced changes in mRNA expression were evaluated at both gene and exon level. Using exon-specific qRT-PCR, we confirmed that several biomarker genes are alternatively spliced or transcribed after irradiation and that different exons of these genes exhibit significantly different levels of induction. Moreover, a significant number of radiation-responsive genes were found to be genomic neighbors. Using three different classification models we found that gene and exon signatures performed equally well on dose prediction, as long as more than 10 features are included. Together, our results highlight the necessity of evaluating gene expression at the level of single exons for radiation biodosimetry in particular and transcriptional biomarker research in general. This approach is especially advisable for practical gene expression-based biodosimetry, for which primer-or probe-based techniques would be the method of choice

    Functional Epigenomics In Insects Using Next-Generation Sequencing Methods

    Get PDF
    DNA methylation is a widespread epigenetic modification implicated in many important processes such as development, disease, and genomic imprinting. In well-studied mammalian systems, DNA methylation at gene promoters acts as a transcriptional repressor including playing a critical role in X chromosome inactivation. Despite the importance and prevalence of DNA methylation, essential model organisms such as D. melanogaster and C. elegans have experienced lineage-specific losses of genomic DNA methylation. This thesis focuses on a comprehensive epigenomics survey and investigation of the Hymenopteran insect order, a group of insects including wasps, bees and ants that have retained functional DNA methylation systems. This diverse group of insects allows us to gain new insights in to the function role of DNA methylation, especially in the context of gene expression regulation. I will first provide a general survey of the epigenetic landscape of insects, which have a completely different pattern compared to mammals, and offer a new approach to quantifying and analyzing DNA methylation in these organisms. Next, I investigate changes to DNA methylation and gene expression that accompany a bacterial infection and a drastic shift from sexual to asexual reproduction in a parasitoid wasp. I will then examine how the intricate honey bee society gives rise to allele-specific methylation and its potential relationship to allele-specific expression. Finally, I explore the importance of DNA methylation along with other promoter elements in regulating gene expression variation.Ph.D
    corecore