12 research outputs found

    Computational tools for the study of RNA processing and function

    RNA processing is a crucial molecular mechanism by which precursor RNAs are converted into mature RNAs. The most notable processing step is splicing, in which introns are removed from precursor messenger RNAs and which often gives rise to alternative forms of mature messenger RNAs. Non-coding RNAs, that is, RNAs that lack the capacity to be translated into protein, also undergo extensive processing steps during their biogenesis, and several studies establish a relation between the processing of non-coding RNAs and the function they exert. Moreover, recent studies point to non-coding RNAs as regulators of alternative splicing, although the regulation mechanisms are not completely understood. The present work includes the development of three novel computational approaches focused on (i) the analysis of the processing of small non-coding RNAs, (ii) the study of alternative splicing and (iii) the study of the cellular processes that guide the interplay between the two.

    Predictive models of gene regulation from high-throughput epigenomics data

    The epigenetic regulation of gene expression involves multiple factors. The synergistic or antagonistic action of these factors has suggested the existence of an epigenetic code for gene regulation. High-throughput sequencing (HTS) provides an opportunity to explore this code and to build quantitative models of gene regulation based on epigenetic differences between specific cellular conditions. We describe a new computational framework that facilitates the systematic integration of HTS epigenetic data. Our method relates epigenetic signals to expression by comparing two conditions. We show its effectiveness by building a model that predicts with high accuracy significant expression differences between two cell lines, using epigenetic data from the ENCODE project. Our analyses provide evidence for a degenerate epigenetic code, which involves multiple genic regions. In particular, signal changes at the 1st exon, 1st intron, and downstream of the polyadenylation site are found to associate strongly with expression regulation. Our analyses also show a different epigenetic code for intron-less and intron-containing genes. Our work provides a general methodology to do integrative analysis of epigenetic differences between cellular conditions that can be applied to other studies, like cell differentiation or carcinogenesis. This work was supported by Grants BIO2011-23920 and CSD2009-00080 from the Spanish Ministry of Science and by the Sandra Ibarra Foundation. S. Althammer was supported by an FI grant from the Generalitat de Cataluny

    The prognostic potential of alternative transcript isoforms across human tumors

    Background: Phenotypic changes during cancer progression are associated with alterations in gene expression, which can be exploited to build molecular signatures for tumor stage identification and prognosis. However, it is not yet known whether the relative abundance of transcript isoforms may be informative for clinical stage and survival. Methods: Using information theory and machine learning methods, we integrated RNA sequencing and clinical data from The Cancer Genome Atlas project to perform the first systematic analysis of the prognostic potential of transcript isoforms in 12 solid tumors to build new signatures for stage and prognosis. This study was also performed in breast tumors according to estrogen receptor (ER) status and melanoma tumors with proliferative and invasive phenotypes. Results: Transcript isoform signatures accurately separate early from late-stage groups and metastatic from non-metastatic tumors, and are predictive of the survival of patients with undetermined lymph node invasion or metastatic status. These signatures show similar, and sometimes better, accuracies compared with known gene expression signatures in retrospective data and are largely independent of gene expression changes. Furthermore, we show frequent transcript isoform changes in breast tumors according to ER status, and in melanoma tumors according to the invasive or proliferative phenotype, and derive accurate predictive models of stage and survival within each patient subgroup. Conclusions: Our analyses reveal new signatures based on transcript isoform abundances that characterize tumor phenotypes and their progression independently of gene expression. 
Transcript isoform signatures appear especially relevant to determine lymph node invasion and metastasis and may potentially contribute towards current strategies of precision cancer medicine. This work was supported by grants BIO2014-52566-R and Consolider RNAREG (CSD2009-00080) from the MINECO (Spanish Government) and FEDER, by AGAUR (2014-SGR1121) and by the Sandra Ibarra Foundation for Cancer (FSI2013)

    A semi-supervised approach uncovers thousands of intragenic enhancers differentially activated in human cells

    Background. Transcriptional enhancers are generally known to regulate gene transcription from afar. Their activation involves a series of changes in chromatin marks and recruitment of protein factors. These enhancers may also occur inside genes, but how many may be active in human cells and their effects on the regulation of the host gene remain unclear.
    Results. We describe a novel semi-supervised method based on the relative enrichment of chromatin signals between two conditions to predict active enhancers. We applied this method to the tumoral K562 and the normal GM12878 cell lines to predict enhancers that are differentially active in one cell type. These predictions show enhancer-like properties according to positional distribution, correlation with gene expression and production of enhancer RNAs. Using this model, we predict 10,365 and 9,777 intragenic active enhancers in K562 and GM12878, respectively, and relate the differential activation of these enhancers to expression and splicing differences of the host genes.
    Conclusions. We propose that the activation or silencing of intragenic transcriptional enhancers modulates the regulation of the host gene by means of a local change of the chromatin and the recruitment of enhancer-related factors that may interact with the RNA directly or through the interaction with RNA binding proteins. Predicted enhancers are available at http://regulatorygenomics.upf.edu/Projects/enhancers.html.
    The authors would like to thank E. Furlong, Y. Barash, B. Blencowe and U. Braunschweig for useful discussions. This work was supported by grants from Plan Nacional I + D (BIO2011-23920) and Consolider (CSD2009-00080) from MINECO (Spanish Government), and by the Sandra Ibarra Foundation for Cancer (FSI 2013). JGV and BS were supported by FPI grants from the MINECO (Spanish Government) BES-2009-018064 and BES-2012-052683, respectively

    Leveraging transcript quantification for fast computation of alternative splicing profiles

    Alternative splicing plays an essential role in many cellular processes and bears major relevance in the understanding of multiple diseases, including cancer. High-throughput RNA sequencing allows genome-wide analyses of splicing across multiple conditions. However, the increasing number of available data sets represents a major challenge in terms of computation time and storage requirements. We describe SUPPA, a computational tool to calculate relative inclusion values of alternative splicing events, exploiting fast transcript quantification. SUPPA accuracy is comparable and sometimes superior to standard methods using simulated as well as real RNA-sequencing data compared with experimentally validated events. We assess the variability in terms of the choice of annotation and provide evidence that using complete transcripts rather than more transcripts per gene provides better estimates. Moreover, SUPPA coupled with de novo transcript reconstruction methods does not achieve accuracies as high as using quantification of known transcripts, but remains comparable to existing methods. Finally, we show that SUPPA is more than 1000 times faster than standard methods. Coupled with fast transcript quantification, SUPPA provides inclusion values at a much higher speed than existing methods without compromising accuracy, thereby facilitating the systematic splicing analysis of large data sets with limited computational resources. The software is implemented in Python 2.7 and is available under the MIT license at https://bitbucket.org/regulatorygenomicsupf/suppa. This work was supported by the Spanish Government (BIO2011-23920, CSD2009-00080), the Sandra Ibarra Foundation for Cancer (FSI-2013), and partially by the Spanish National Institute of Bioinformatics (INB
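
The idea of deriving a relative inclusion value from transcript quantification can be illustrated with a minimal sketch. It assumes that the inclusion value of an event is the summed abundance of the transcripts that contain the inclusion form, divided by the summed abundance of all transcripts that define the event. This formula, the function name `relative_inclusion` and the transcript identifiers are illustrative assumptions; they are not taken from the abstract and are not SUPPA's actual interface.

```python
from typing import Dict, Iterable, Optional


def relative_inclusion(
    tpm: Dict[str, float],
    inclusion_ids: Iterable[str],
    event_ids: Iterable[str],
) -> Optional[float]:
    """Hypothetical helper: relative inclusion of one splicing event.

    tpm           -- transcript abundances (e.g. TPM) from any quantifier
    inclusion_ids -- transcripts carrying the inclusion form of the event
    event_ids     -- all transcripts defining the event (both forms)
    """
    # Total abundance of every transcript that defines the event.
    total = sum(tpm.get(t, 0.0) for t in event_ids)
    if total == 0:
        # No expression for this event: the ratio is undefined.
        return None
    # Abundance of the transcripts that include the alternative region.
    included = sum(tpm.get(t, 0.0) for t in inclusion_ids)
    return included / total


# Toy abundances with made-up transcript names (assumption, not real data).
tpm = {"tx1": 30.0, "tx2": 10.0, "tx3": 5.0}
psi = relative_inclusion(tpm, ["tx1"], ["tx1", "tx2"])
print(psi)
```

Because the calculation only needs a table of transcript abundances, its cost is a few sums per event, which is consistent with the speed argument made in the abstract.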

    The discovery potential of RNA processing profiles

    Small non-coding RNAs (sncRNAs) are highly abundant molecules that regulate essential cellular processes and are classified according to sequence and structure. Here we argue that read profiles from size-selected RNA sequencing capture the post-transcriptional processing specific to each RNA family, thereby providing functional information independently of sequence and structure. We developed SeRPeNT, a new computational method that exploits reproducibility across replicates and uses dynamic time-warping and density-based clustering algorithms to identify, characterize and compare sncRNAs by harnessing the power of read profiles. We applied SeRPeNT to: (i) generate an extended human annotation with 671 new sncRNAs from known classes and 131 from new potential classes, (ii) show pervasive differential processing of sncRNAs between cell compartments and (iii) predict new molecules with miRNA-like behaviour from snoRNA, tRNA and long non-coding RNA precursors, potentially dependent on the miRNA biogenesis pathway. Furthermore, we validated experimentally four predicted novel non-coding RNAs: a miRNA, a snoRNA-derived miRNA, a processed tRNA and a new uncharacterized sncRNA. SeRPeNT facilitates fast and accurate discovery and characterization of sncRNAs at an unprecedented scale. 
SeRPeNT code is available under the MIT license at https://github.com/comprna/SeRPeNT. Funding: MINECO (to E.E., A.P.); FEDER [BIO2014-52566-R to E.E., A.P.]; Consolider RNAREG [CSD2009-00080 to E.E., A.P.]; AGAUR [SGR2014-1121 to E.E., A.P.]; European ITN Network RNP-Net [ID:289007 to E.E., A.P.]; Sandra Ibarra Foundation for Cancer [FSI2013 to E.E., A.P.]; MINECO [BIO2011-26205 to R.G., A.P., SAF2014-60551-R to E.M., J.P.A.]; National Human Genome Research Institute of the National Institutes of Health [U54HG007004 to R.G., A.P.]; MINECO Centro de Excelencia Severo Ochoa 2013-2017 [SEV-2012-0208 to R.G., A.P.]; Research Programme on Biomedical Informatics (GRIB), which is a member of ELIXIR-Excelerate of the European Union Horizon 2020 Programme 2014-2020 [676559 to E.E., A.P., I.D.]; ELIXIR-Excelerate, European Union Horizon 2020 Spanish National Bioinformatics Institute (INB) [PT13/0001/0023 to E.E., A.P., I.D.]. Funding for open access charge: European ITN Network RNP-Net [ID:289007] and MINECO-FEDER [BIO2014-52566-R]
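
The abstract names dynamic time warping as the way read profiles are compared. A minimal sketch of the classic dynamic time warping distance between two per-position read-count profiles follows. It is a textbook formulation with an absolute-difference local cost and is not SeRPeNT's implementation; the function name `dtw_distance` and the toy profiles are assumptions made for illustration.

```python
from typing import List, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping distance between two profiles."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j].
    cost: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed alignment moves.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b position is reused
                cost[i][j - 1],      # b advances, a position is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Two toy profiles with the same shape, one stretched by a position.
profile_a = [0, 1, 2, 1, 0]
profile_b = [0, 0, 1, 2, 1, 0]
print(dtw_distance(profile_a, profile_b))
```

Warping lets profiles of the same shape but slightly different length or offset score as similar, which is why such a distance suits the comparison of processing patterns independently of sequence.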

    The 5' untranslated region of the serotonin receptor 2C pre-mRNA generates miRNAs and is expressed in non-neuronal cells

    The serotonin receptor 2C (HTR2C) gene encodes a G protein-coupled receptor that is exclusively expressed in neurons. Here, we report that the 5' untranslated region of the receptor pre-mRNA as well as its hosted miRNAs is widely expressed in non-neuronal cell lines. Alternative splicing of HTR2C is regulated by MBII-52. MBII-52 and the neighboring MBII-85 cluster are absent in people with Prader-Willi syndrome, which likely causes the disease. We show that MBII-52 and MBII-85 increase expression of the HTR2C 5' UTR and influence expression of the hosted miRNAs. The data indicate that the transcriptional unit expressing HTR2C is more complex than previously recognized and likely deregulated in Prader-Willi syndrome. This work was supported by NIH RO1 GM083187, P20RR020171 to SS; GM079549 to RS and JS; Binational Science Foundation (BSF), USA-Israel, transformative Grant, #2010508, to SS and RS. EE and AP were supported by the Spanish Ministry of Science with grant BIO2011-23920 and by Sandra Ibarra Foundation for Cancer with grant FSI 2011-03

    Large-scale analysis of genome and transcriptome alterations in multiple tumors unveils novel cancer-relevant splicing networks

    Alternative splicing is regulated by multiple RNA-binding proteins and influences the expression of most eukaryotic genes. However, the role of this process in human disease, and particularly in cancer, is only starting to be unveiled. We systematically analyzed mutation, copy number, and gene expression patterns of 1348 RNA-binding protein (RBP) genes in 11 solid tumor types, together with alternative splicing changes in these tumors and the enrichment of binding motifs in the alternatively spliced sequences. Our comprehensive study reveals widespread alterations in the expression of RBP genes, as well as novel mutations and copy number variations in association with multiple alternative splicing changes in cancer drivers and oncogenic pathways. Remarkably, the altered splicing patterns in several tumor types recapitulate those of undifferentiated cells. These patterns are predicted to be mainly controlled by MBNL1 and involve multiple cancer drivers, including the mitotic gene NUMA1. We show that NUMA1 alternative splicing induces enhanced cell proliferation and centrosome amplification in nontumorigenic mammary epithelial cells. Our study uncovers novel splicing networks that potentially contribute to cancer development and progression. We thank P. Papasaikas, B. Blencowe, M. Irimia, and Q. Morris for comments and discussions. E.S., B.S., A.P., and E.E. were supported by the Ministerio de Economía y Competitividad (MINECO) and European Commission (FEDER) (BIO2014-52566-R), Consolider RNAREG (CSD2009-00080), by Agència de Gestió d’Ajuts Universitaris i de Recerca (AGAUR) (SGR2014-1121), and by the Sandra Ibarra Foundation for Cancer (FSI2013). J.V. and B.M. were supported by Fundación Botín, by Banco de Santander through its Santander Universities Global Division, and by Consolider RNAREG (CSD2009-00080), MINECO, and AGAUR. F.M. and M.A.P. were supported by AECC (Hereditary Cancer), AGAUR (SGR2014-364), the Instituto de Salud Carlos III (ISCIII), the MINECO, and FEDER (PIE13/00022-ONCOPROFILE, PI15/00854, and RTICC RD12/0036/0008)