25 research outputs found

    RScan: fast searching structural similarities for structured RNAs in large databases

    Background: Many RNAs have evolutionarily conserved secondary structures rather than conserved primary sequences. Recently, an increasing number of methods have been developed that focus on structural alignment for finding conserved secondary structures as well as common structural motifs in pairwise or multiple sequences. A challenging task is to search quickly for similar structures for structured RNA sequences in large genomic databases, since existing methods are too slow to be used on large databases. Results: An implementation of a fast structural alignment algorithm, RScan, is proposed to fulfill the task. RScan is developed by leveraging the advantages of both hashing algorithms and local alignment algorithms. In our experiment, the average times for searching a tRNA and an rRNA in the randomized A. pernix genome are only 256 seconds and 832 seconds respectively using RScan, compared with 3,178 seconds and 8,951 seconds respectively using an existing method, RSEARCH. Remarkably, RScan can handle large database queries, taking less than 4 minutes to search for structures similar to a microRNA precursor in human chromosome 21. Conclusion: These results indicate that RScan is a preferable choice for real-life applications of searching structural similarities for structured RNAs in large databases. RScan software is freely available at http://bioinfo.au.tsinghua.edu.cn/member/cxue/rscan/RScan.htm.

    Functional importance of different patterns of correlation between adjacent cassette exons in human and mouse

    Background: Alternative splicing expands transcriptome diversity and plays an important role in the regulation of gene expression. Previous studies have focused on the regulation of a single cassette exon, but recent experiments indicate that multiple cassette exons within a gene may interact with each other. This interaction can increase the potential to generate various transcripts and adds an extra layer of complexity to gene regulation. Several cases of exon interaction have been discovered. However, the extent to which cassette exons coordinate with each other remains unknown. Results: Based on EST data, we employed a metric of correlation coefficients to describe the interaction between two adjacent cassette exons and then categorized these exon pairs into three different groups by their interaction (correlation) patterns. Sequence analysis demonstrates that strongly-correlated groups are more conserved and contain a higher proportion of pairs with reading frame preservation in a combinatorial manner. Multiple genome comparison further indicates that different groups of correlated pairs have different evolutionary courses: (1) the vast majority of positively-correlated pairs are old, (2) most of the weakly-correlated pairs are relatively young, and (3) negatively-correlated pairs are a mixture of old and young events. Conclusion: We performed a large-scale analysis of interactions between adjacent cassette exons. Compared with weakly-correlated pairs, the strongly-correlated pairs, including both the positively and negatively correlated ones, show more evidence that they are under delicate splicing control and tend to be functionally important. Additionally, the positively-correlated pairs bear a strong resemblance to constitutive exons, which suggests that they may have evolved from ancient constitutive exons, while negatively and weakly correlated pairs are more likely to contain newly emerging exons.

    Landscape of transcription in human cells

    Eukaryotic cells make many types of primary and processed RNAs that are found either in specific sub-cellular compartments or throughout the cells. A complete catalogue of these RNAs is not yet available and their characteristic sub-cellular localizations are also poorly understood. Since RNA represents the direct output of the genetic information encoded by genomes and a significant proportion of a cell’s regulatory capabilities are focused on its synthesis, processing, transport, modifications and translation, the generation of such a catalogue is crucial for understanding genome function. Here we report evidence that three quarters of the human genome is capable of being transcribed, as well as observations about the range and levels of expression, localization, processing fates, regulatory regions and modifications of almost all currently annotated and thousands of previously unannotated RNAs. These observations, taken together, prompt a redefinition of the concept of a gene.

    Comparative analysis of the transcriptome across distant species

    The transcriptome is the readout of the genome. Identifying common features in it across distant species can reveal fundamental principles. To this end, the ENCODE and modENCODE consortia have generated large amounts of matched RNA-sequencing data for human, worm and fly. Uniform processing and comprehensive annotation of these data allow comparison across metazoan phyla, extending beyond earlier within-phylum transcriptome comparisons and revealing ancient, conserved features. Specifically, we discover co-expression modules shared across animals, many of which are enriched in developmental genes. Moreover, we use expression patterns to align the stages in worm and fly development and find a novel pairing between worm embryo and fly pupae, in addition to the embryo-to-embryo and larvae-to-larvae pairings. Furthermore, we find that the extent of non-canonical, non-coding transcription is similar in each organism, per base pair. Finally, we find in all three organisms that the gene-expression levels, both coding and non-coding, can be quantitatively predicted from chromatin features at the promoter using a 'universal model' based on a single set of organism-independent parameters

    RScan: fast searching structural similarities for structured RNAs in large databases-0

    Copyright information: Taken from "RScan: fast searching structural similarities for structured RNAs in large databases". http://www.biomedcentral.com/1471-2164/8/257 BMC Genomics 2007;8:257. Published online 31 Jul 2007. PMCID: PMC1949409. ... shadowed symbol sequences is defined as the "structural database".

    Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine

    Background: MicroRNAs (miRNAs) are a group of short (~22 nt) non-coding RNAs that play important regulatory roles. MiRNA precursors (pre-miRNAs) are characterized by their hairpin structures. However, a large number of similar hairpins can be folded in many genomes. Almost all current methods for computational prediction of miRNAs use comparative genomic approaches to identify putative pre-miRNAs from candidate hairpins. An ab initio method for distinguishing pre-miRNAs from sequence segments with pre-miRNA-like hairpin structures is lacking. Being able to classify real vs. pseudo pre-miRNAs is important both for understanding the nature of miRNAs and for developing ab initio prediction methods that can discover new miRNAs without known homology. Results: A set of novel features of local contiguous structure-sequence information is proposed for distinguishing the hairpins of real pre-miRNAs and pseudo pre-miRNAs. A support vector machine (SVM) is applied to these features to classify real vs. pseudo pre-miRNAs, achieving about 90% accuracy on human data. Remarkably, the SVM classifier built on human data can correctly identify up to 90% of the pre-miRNAs from other species, including plants and viruses, without utilizing any comparative genomics information. Conclusion: The local structure-sequence features reflect discriminative and conserved characteristics of miRNAs, and the successful ab initio classification of real and pseudo pre-miRNAs opens a new approach for discovering new miRNAs.

    uc.454 Inhibited Growth by Targeting Heat Shock Protein Family A Member 12B in Non-Small-Cell Lung Cancer

    Transcribed ultraconserved regions (T-UCRs), classified as long non-coding RNAs (Lnc-RNAs), are transcripts longer than 200 nt with no protein-coding capacity. Previous studies showed that T-UCRs serve as novel oncogenes or tumor suppressors and are involved in tumorigenesis and cancer progression. Nevertheless, the clinicopathologic significance and regulatory mechanism of T-UCRs in lung cancer (LC) remain largely unknown. We found that uc.454 was downregulated in both non-small-cell LC (NSCLC) tissues and LC cell lines, and that downregulated uc.454 is associated with tumor size and with more advanced tumor stages. Transfection with uc.454 markedly induced apoptosis and inhibited cell proliferation in SPC-A-1 and NCI-H2170 LC cell lines. These results suggest that uc.454 plays a suppressive role in LC. Heat shock protein family A member 12B (HSPA12B) protein was negatively regulated by uc.454 at the posttranscriptional level, as shown by dual-luciferase reporter assay, and affected the expression of Bcl-2 family members, which finally induced LC apoptosis. The uc.454/HSPA12B axis furthers our understanding of the molecular mechanisms involved in tumor apoptosis and may potentially serve as a therapeutic target for lung carcinoma. Keywords: uc.454, HSPA12B, NSCLC, lung cancer, apoptosis, proliferation

    The fusion landscape of hepatocellular carcinoma

    Most cases of hepatocellular carcinoma (HCC) are already advanced at the time of diagnosis, which limits treatment options. Challenges in early‐stage diagnosis may be due to the genetic complexity of HCC. Gene fusion plays a critical function in tumorigenesis and cancer progression in multiple cancers, yet the identities of fusion genes as potential diagnostic markers in HCC have not been investigated. Here, we employed STAR‐Fusion and identified 43 recurrent fusion events in our own and four public RNA‐seq datasets. We identified 2354 different gene fusions in two hepatitis B virus (HBV)‐HCC patients. Validation analysis against the four RNA‐seq datasets revealed that only 1.8% (43/2354) were recurrent fusions. Comparison with the four fusion databases demonstrated that 19 recurrent fusions were not previously annotated to diseases and three were annotated as disease‐related fusion events. Finally, we validated six of the novel fusion events, including RP11‐476K15.1‐CTD‐2015H3.2, by RT‐PCR and Sanger sequencing of 14 pairs of HBV‐related HCC samples. In summary, our study provides new insights into gene fusions in HCC and may contribute to the development of anti‐HCC therapy