    Role of Splicing Regulatory Elements and In Silico Tools Usage in the Identification of Deep Intronic Splicing Variants in Hereditary Breast/Ovarian Cancer Genes

    Keywords: Hereditary breast ovarian cancer; Pseudoexons; Spliceogenic deep intronic variants

    The contribution of deep intronic splice-altering variants to hereditary breast and ovarian cancer (HBOC) is unknown. Current computational in silico tools to predict spliceogenic variants leading to pseudoexons have limited efficiency. We assessed the performance of the SpliceAI tool combined with ESRseq scores to identify spliceogenic deep intronic variants that affect cryptic sites or splicing regulatory elements (SREs), using literature and experimental datasets. Our results with 233 published deep intronic variants showed that SpliceAI, with a 0.05 threshold, predicts spliceogenic deep intronic variants affecting cryptic splice sites, but is less effective in detecting those affecting SREs. Next, we characterized the SRE profiles using ESRseq, showing that pseudoexons are significantly enriched in SRE-enhancers compared to adjacent intronic regions. Although the combination of SpliceAI with ESRseq scores (considering ∆ESRseq and SRE landscape) showed higher sensitivity, the global performance did not improve because of the higher number of false positives. The combination of both tools was tested in a tumor RNA dataset with 207 intronic variants disrupting splicing, showing a sensitivity of 86%. Following the pipeline, five spliceogenic deep intronic variants were experimentally identified from 33 variants in HBOC genes. Overall, our results provide a framework to detect deep intronic variants disrupting splicing.

    This research was funded by the Spanish Instituto de Salud Carlos III (ISCIII) funding, an initiative of the Spanish Ministry of Economy and Innovation, partially supported by European Regional Development FEDER Funds, grant numbers PI16/01218 and PI19/01303. AM-F's contract is supported by the award ERAPERMED2019-215 granted by AECC FC and by ISCIII through AES 2019, both within the ERAPerMed framework. J.D.-V.'s contract is supported by the Secretariat for Universities and Research of the Ministry of Business and Knowledge of the Government of Catalonia and the European Social Fund
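The trade-off described in this abstract (higher sensitivity from the combined rule, offset by more false positives) can be illustrated with a minimal sketch. Only the 0.05 SpliceAI threshold comes from the abstract. The `ESRSEQ_CUTOFF` value, the use of the absolute ∆ESRseq, the `Variant` record, and the four toy variants are all assumptions made for illustration; none of this is the authors' pipeline or data.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Variant:
    name: str
    spliceai_delta: float  # maximum SpliceAI delta score, between 0 and 1
    esrseq_delta: float    # change in ESRseq score (variant minus reference)
    spliceogenic: bool     # experimental label (toy values here)


SPLICEAI_THRESHOLD = 0.05  # threshold reported in the abstract
ESRSEQ_CUTOFF = 1.0        # hypothetical cutoff, NOT taken from the paper


def predict_spliceai(v: Variant) -> bool:
    """Call a variant spliceogenic from the SpliceAI score alone."""
    return v.spliceai_delta >= SPLICEAI_THRESHOLD


def predict_combined(v: Variant) -> bool:
    """Call a variant spliceogenic if either tool flags it (assumed OR rule)."""
    return predict_spliceai(v) or abs(v.esrseq_delta) >= ESRSEQ_CUTOFF


def sensitivity_specificity(
    variants: List[Variant], predict: Callable[[Variant], bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of a predictor on labelled variants."""
    tp = sum(1 for v in variants if v.spliceogenic and predict(v))
    fn = sum(1 for v in variants if v.spliceogenic and not predict(v))
    tn = sum(1 for v in variants if not v.spliceogenic and not predict(v))
    fp = sum(1 for v in variants if not v.spliceogenic and predict(v))
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Toy, made-up variants chosen only to show the trade-off.
TOY = [
    Variant("v1", 0.40, 0.2, True),    # cryptic-site effect, caught by SpliceAI
    Variant("v2", 0.01, 1.5, True),    # SRE effect, caught only by the ESRseq rule
    Variant("v3", 0.02, 0.1, False),   # neutral, flagged by neither rule
    Variant("v4", 0.03, -1.2, False),  # neutral, false positive of the ESRseq rule
]

if __name__ == "__main__":
    print("SpliceAI only:", sensitivity_specificity(TOY, predict_spliceai))
    print("Combined:     ", sensitivity_specificity(TOY, predict_combined))
```

On this toy set the combined rule recovers the SRE-affecting variant that SpliceAI misses, but it also flags one neutral variant, which mirrors the qualitative pattern the abstract reports.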

    SpliceDisease database: linking RNA splicing and disease

    RNA splicing is an important aspect of gene regulation in many organisms. Splicing of RNA is regulated by complicated mechanisms involving numerous RNA-binding proteins and the intricate network of interactions among them. Mutations in cis-acting splicing elements or their regulatory proteins have been shown to be involved in human diseases. Defects in the pre-mRNA splicing process have emerged as a common disease-causing mechanism. Therefore, a database integrating RNA splicing and disease associations would be helpful for understanding not only RNA splicing but also its contribution to disease. In the SpliceDisease database, we manually curated 2337 splicing mutation disease entries involving 303 genes and 370 diseases, which have been supported experimentally in 898 publications. The SpliceDisease database provides information including the change of the nucleotide in the sequence, the location of the mutation on the gene, the reference PubMed ID and a detailed description of the relationship among gene mutations, splicing defects and diseases. We standardized the names of the diseases and genes and provided links for these genes to NCBI and the UCSC genome browser for further annotation and genomic sequences. For the location of the mutation, we give direct links from the entry to the respective position/region in the genome browser. Users can freely browse, search and download the data in SpliceDisease at http://cmbi.bjmu.edu.cn/sdisease

    A computational approach for genome-wide mapping of splicing factor binding sites

    A computational method is presented for genome-wide mapping of splicing factor binding sites that considers both the genomic environment and evolutionary conservation

    Defective splicing, disease and therapy: searching for master checkpoints in exon definition

    The number of aberrant splicing processes causing human disease is growing exponentially and many recent studies have uncovered some aspects of the unexpectedly complex network of interactions involved in these dysfunctions. As a consequence, our knowledge of the various cis- and trans-acting factors playing a role in both normal and aberrant splicing pathways has been enhanced greatly. However, the resulting information explosion has also uncovered the fact that many splicing systems are not easy to model. In fact, we are still unable to predict with certainty the outcome of a given genomic variation. Nonetheless, in the midst of all this complexity some hard-won lessons have been learned, and in this survey we will focus on the importance of the wide sequence context when trying to understand why apparently similar mutations can give rise to different effects. The examples discussed in this summary will highlight the fine ‘balance of power’ that is often present between all the various regulatory elements that define exon boundaries. In the final part, we shall discuss possible therapeutic targets and strategies to rescue genetic defects of complex splicing systems

    Computational identification of tissue-specific alternative splicing elements in mouse genes from RNA-Seq

    Tissue-specific alternative splicing is a key mechanism for generating tissue-specific proteomic diversity in eukaryotes. Splicing regulatory elements (SREs) in precursor messenger RNA (pre-mRNA) play a very important role in regulating alternative splicing. In this article, we use mouse RNA-Seq data to determine a positive data set where SREs are over-represented and a reliable negative data set where the same SREs are most likely under-represented for a specific tissue, and then employ a powerful discriminative approach to identify SREs. We identified 456 putative splicing enhancers or silencers, of which 221 were predicted to be tissue-specific. Most of our tissue-specific SREs are likely different from constitutive SREs, since only 18% of our exonic splicing enhancers (ESEs) are contained in constitutive RESCUE-ESEs. A relatively small portion (20%) of our SREs is included in the tissue-specific SREs identified in human in two recent studies. In the analysis of the position distribution of SREs, we found that a dozen SREs were biased to a specific region. We also identified two very interesting SREs that can function as an enhancer in one tissue but a silencer in another tissue from the same intronic region. These findings provide insight into the mechanism of tissue-specific alternative splicing and give a set of valuable putative SREs for further experimental investigations
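The abstract does not name its discriminative method, so the following is only a simplified stand-in for the general idea of contrasting a positive and a negative sequence set: a smoothed k-mer log-odds score. The function names, the hexamer length, the pseudocount, and the toy sequences are assumptions for illustration and are not taken from the article.

```python
from collections import Counter
from math import log2
from typing import Dict, Iterable


def kmer_counts(seqs: Iterable[str], k: int) -> Counter:
    """Count every overlapping k-mer across a collection of sequences."""
    counts: Counter = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def kmer_log_odds(
    positive: Iterable[str],
    negative: Iterable[str],
    k: int = 6,
    pseudocount: float = 1.0,
) -> Dict[str, float]:
    """Score each k-mer by log2(frequency in positive / frequency in negative).

    Positive scores mark k-mers over-represented in the positive set;
    negative scores mark k-mers over-represented in the negative set.
    The pseudocount avoids division by zero for k-mers seen in one set only.
    """
    pos = kmer_counts(positive, k)
    neg = kmer_counts(negative, k)
    kmers = set(pos) | set(neg)
    pos_total = sum(pos.values()) + pseudocount * len(kmers)
    neg_total = sum(neg.values()) + pseudocount * len(kmers)
    return {
        kmer: log2(
            ((pos[kmer] + pseudocount) / pos_total)
            / ((neg[kmer] + pseudocount) / neg_total)
        )
        for kmer in kmers
    }


if __name__ == "__main__":
    # Toy sequences, invented for this example only.
    positive_set = ["GAAGAAGAAGAA", "TTGAAGAAC"]
    negative_set = ["CCCCTTTTCCCC", "TTTTCCCCTT"]
    scores = kmer_log_odds(positive_set, negative_set, k=6)
    for kmer, score in sorted(scores.items(), key=lambda kv: -kv[1])[:3]:
        print(kmer, round(score, 2))
```

A real analysis would need a significance test and correction for multiple comparisons before calling any k-mer a putative SRE; this sketch only ranks candidates.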

    Learning the Regulatory Code of Gene Expression

    Data-driven machine learning is the method of choice for predicting molecular phenotypes from nucleotide sequence, modeling gene expression events including protein-DNA binding, chromatin states as well as mRNA and protein levels. Deep neural networks automatically learn informative sequence representations and interpreting them enables us to improve our understanding of the regulatory code governing gene expression. Here, we review the latest developments that apply shallow or deep learning to quantify molecular phenotypes and decode the cis-regulatory grammar from prokaryotic and eukaryotic sequencing data. Our approach is to build from the ground up, first focusing on the initiating protein-DNA interactions, then specific coding and non-coding regions, and finally on advances that combine multiple parts of the gene and mRNA regulatory structures, achieving unprecedented performance. We thus provide a quantitative view of gene expression regulation from nucleotide sequence, concluding with an information-centric overview of the central dogma of molecular biology

    Machine learning models towards elucidating the plant intron retention code

    2017 Fall. Includes bibliographical references.

    Alternative Splicing is a process that allows a single gene to encode multiple proteins. Intron Retention (IR) is a type of alternative splicing which is mainly prevalent in plants, but has been shown to regulate gene expression in various organisms and is often involved in rare human diseases. Despite its important role, not much research has been done to understand IR. The motivation behind this research work is to better understand IR and how it is regulated by various biological factors. We designed a combination of 137 features, forming an "intron retention code", to reveal the factors that contribute to IR. Using random forest and support vector machine classifiers, we show the usefulness of these features for the task of predicting whether an intron is subject to IR or not. An analysis of the top-ranking features for this task reveals a high level of similarity of the most predictive features across the three plant species, demonstrating the conservation of the factors that determine IR. We also found a high level of similarity to the top features contributing to IR in mammals. The task of predicting the response to drought stress proved more difficult, with lower levels of accuracy and lower levels of similarity across species, suggesting that additional features need to be considered for predicting condition-specific IR