
    Identify Alternative Splicing Events Based on Position-Specific Evolutionary Conservation

    The evolution of eukaryotes is accompanied by the increasing complexity of alternative splicing, which greatly expands the information encoded in the genome. One of the greatest challenges in the post-genome era is to fully characterize the human transcriptome, including alternative splicing. Here, we introduce a comparative genomics approach to systematically identify alternative splicing events based on the differential evolutionary conservation between exons and introns and the high-quality annotation of the ENCODE regions. Specifically, we focus on exons that are included in some transcripts but completely spliced out of others, which we call conditional exons. First, we characterize distinguishing features among conditional exons, constitutive exons and introns. One of the most important features is the position-specific conservation score. There are dramatic differences in conservation scores between conditional exons and constitutive exons. More importantly, the differences are position-specific. For flanking intronic regions, the differences between conditional exons and constitutive exons are also position-specific. Using the Random Forests algorithm, we can classify conditional exons with high specificities (97% for the identification of conditional exons from intron regions and 95% for the classification of known exons) and fair sensitivities (64% and 32%, respectively). We applied the method to the human genome and identified 39,640 introns that contain conditional exons and classified 8,813 conditional exons from the current RefSeq exon list. Among those, 31,673 introns containing conditional exons and 5,294 conditional exons classified from known exons cannot be inferred from RefSeq, UCSC or Ensembl annotations. Some of these de novo predictions were experimentally verified.
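
    The specificity and sensitivity figures quoted above are standard confusion-matrix ratios. The following is a minimal sketch of how such figures are computed, not the authors' evaluation code: the function name `sensitivity_specificity` and the label vectors are hypothetical, and the synthetic counts were chosen only so that the result matches the 64% sensitivity and 97% specificity reported for the intron task.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = conditional exon)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # true positives
    fn = sum(1 for t, p in pairs if t and not p)      # missed positives
    tn = sum(1 for t, p in pairs if not t and not p)  # true negatives
    fp = sum(1 for t, p in pairs if not t and p)      # false alarms
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Synthetic labels (an assumption, not data from the paper):
# 25 true conditional exons, of which 16 are recovered;
# 100 plain intron regions, of which 3 are wrongly flagged.
y_true = [1] * 25 + [0] * 100
y_pred = [1] * 16 + [0] * 9 + [0] * 97 + [1] * 3

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

    A classifier tuned this way misses a fair share of true conditional exons but rarely flags an ordinary intron region, which is the trade-off one would typically want before committing to experimental validation.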

    Studying alternative splicing regulatory networks through partial correlation analysis

    The identification of links between exons and their regulators or targets, and between co-spliced exons, in human, mouse and rat provides novel insights into the alternative splicing regulatory network.
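
    The summary above does not describe the method in detail. As a hedged illustration of the general idea behind partial correlation, the sketch below uses the textbook first-order formula, which measures the association between two variables after removing the linear effect of a third. The variable names (`exon_a`, `exon_b`, `regulator`) and the numbers are hypothetical, not data from the study.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Hypothetical values: both exons track the same regulator, each with
# its own independent deviation from it.
regulator = [-3.0, -1.0, 1.0, 3.0]
exon_a = [-2.5, -1.5, 0.5, 3.5]
exon_b = [-3.2, -0.4, 0.4, 3.2]

r_raw = pearson(exon_a, exon_b)
r_partial = partial_corr(exon_a, exon_b, regulator)
print(f"raw={r_raw:.3f} partial={r_partial:.3f}")
```

    In this toy case the raw correlation between the two exons is high, but it vanishes once the shared regulator is controlled for, so the apparent exon-to-exon link is indirect. Distinguishing direct from indirect links in this way is the usual motivation for using partial rather than plain correlation in network inference.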

    A protein assembly mediates Xist localization and gene silencing

    Nuclear compartments have diverse roles in regulating gene expression, yet the molecular forces and components that drive compartment formation remain largely unclear. The long non-coding RNA Xist establishes an intra-chromosomal compartment by localizing at a high concentration in a territory spatially close to its transcription locus and binding diverse proteins to achieve X-chromosome inactivation (XCI). The XCI process therefore serves as a paradigm for understanding how RNA-mediated recruitment of various proteins induces a functional compartment. The properties of the inactive X (Xi)-compartment are known to change over time, because after initial Xist spreading and transcriptional shutoff a state is reached in which gene silencing remains stable even if Xist is turned off. Here we show that the Xist RNA-binding proteins PTBP1, MATR3, TDP-43 and CELF1 assemble on the multivalent E-repeat element of Xist and, via self-aggregation and heterotypic protein–protein interactions, form a condensate in the Xi. This condensate is required for gene silencing and for the anchoring of Xist to the Xi territory, and can be sustained in the absence of Xist. Notably, these E-repeat-binding proteins become essential coincident with transition to the Xist-independent XCI phase, indicating that the condensate seeded by the E-repeat underlies the developmental switch from Xist-dependence to Xist-independence. Taken together, our data show that Xist forms the Xi compartment by seeding a heteromeric condensate that consists of ubiquitous RNA-binding proteins, revealing an unanticipated mechanism for heritable gene silencing.

    A hierarchical Bayesian model for comparing transcriptomes at the individual transcript isoform level

    The complexity of mammalian transcriptomes is compounded by alternative splicing, which allows one gene to produce multiple transcript isoforms. However, transcriptome comparison has been limited to differential analysis at the gene level instead of the individual transcript isoform level. High-throughput sequencing technologies and high-resolution tiling arrays provide an unprecedented opportunity to compare transcriptomes at the level of individual splice variants. However, sequence read coverage or probe intensity at each position may represent a family of splice variants instead of a single isoform. Here we propose a hierarchical Bayesian model, BASIS (Bayesian Analysis of Splicing IsoformS), to infer the differential expression level of each transcript isoform in response to two conditions. A latent variable is introduced to perform direct statistical selection of differentially expressed isoforms. Model parameters are inferred based on an ergodic Markov chain generated by our Gibbs sampler. BASIS can borrow information across different probes (or positions) from the same gene and from different genes, and can handle the heteroskedasticity of probe intensity or sequence read coverage. We applied BASIS to a human tiling-array data set and a mouse RNA-seq data set. Some of the predictions were validated by quantitative real-time RT–PCR experiments.