
    Detection of non-coding RNAs on the basis of predicted secondary structure formation free energy change

    BACKGROUND: Non-coding RNAs (ncRNAs) have a multitude of roles in the cell, many of which remain to be discovered. However, novel ncRNAs are difficult to detect in biochemical screens. Computational methods that can accurately detect ncRNAs in sequenced genomes are therefore desirable to advance biological knowledge. The increasing number of genomic sequences provides a rich dataset for computational comparative sequence analysis and detection of novel ncRNAs. RESULTS: Here, Dynalign, a program that predicts the secondary structure common to two RNA sequences by minimizing the folding free energy change, is used as a computational ncRNA detection tool. The Dynalign-computed optimal total free energy change, which scores both the structural alignment and the free energy change of folding the two RNA sequences into a common structure, is shown to be an effective measure for distinguishing ncRNAs from randomized sequences. To classify an input sequence pair as ncRNA, its total free energy change can either be compared with the total free energy changes of a set of control sequence pairs, or be combined with sequence length and nucleotide frequencies as input to a classification support vector machine. The latter method is much faster but slightly less sensitive at a given specificity. Additionally, the support vector machine method is shown to be sensitive and specific in genomic ncRNA screens of two different Escherichia coli and Salmonella typhi genome alignments, in which many ncRNAs are known. Dynalign's performance is also compared with that of two other ncRNA detection programs, RNAz and QRNA. CONCLUSION: The Dynalign-based support vector machine method is more sensitive than RNAz and QRNA for known ncRNAs in the test genomic screens. Additionally, both Dynalign-based methods are more sensitive than RNAz and QRNA at low sequence pair identities. Dynalign can be used as a tool comparable to, or more accurate than, RNAz or QRNA in genomic screens, especially for low-identity regions. Dynalign provides a method for discovering ncRNAs in sequenced genomes that other methods may not identify. Significant improvements in Dynalign runtime have also been achieved.

    XRate: a fast prototyping, training and annotation tool for phylo-grammars

    BACKGROUND: Recent years have seen the emergence of genome annotation methods based on the phylo-grammar, a probabilistic model combining continuous-time Markov chains and stochastic grammars. Previously, phylo-grammars required considerable effort to implement, which limited their adoption by computational biologists. RESULTS: We have developed an open source software tool, xrate, for working with reversible, irreversible or parametric substitution models combined with stochastic context-free grammars. xrate efficiently estimates maximum-likelihood parameters and phylogenetic trees using a novel "phylo-EM" algorithm that we describe. The grammar is specified in an external configuration file, allowing users to design new grammars, estimate rate parameters from training data and annotate multiple sequence alignments without recompiling code from source. We have used xrate to measure codon substitution rates and to predict protein and RNA secondary structures. CONCLUSION: Our results demonstrate that xrate estimates biologically meaningful rates and makes predictions whose accuracy is comparable to that of more specialized tools.

    A personalized platform identifies trametinib plus zoledronate for a patient with KRAS-mutant metastatic colorectal cancer

    Colorectal cancer remains a leading source of cancer mortality worldwide. Initial response is often followed by emergent resistance that is poorly responsive to targeted therapies, reflecting currently undruggable cancer drivers such as KRAS and overall genomic complexity. Here, we report a novel approach to developing a personalized therapy for a patient with treatment-resistant metastatic KRAS-mutant colorectal cancer. An extensive analysis of the tumor's genomic landscape identified nine key drivers. A transgenic model that altered orthologs of these nine genes in the Drosophila hindgut was developed, and a robotics-based screen using this platform identified trametinib plus zoledronate as a candidate treatment combination. Treating the patient with this combination led to a significant response: target and nontarget lesions displayed a strong partial response and remained stable for 11 months. By addressing a disease's genomic complexity, this personalized approach may provide an alternative treatment option for recalcitrant disease such as KRAS-mutant colorectal cancer.

    Molecular characterization of hepatocellular carcinoma in patients with nonalcoholic steatohepatitis

    Background and aims: Non-alcoholic steatohepatitis (NASH)-related hepatocellular carcinoma (HCC) is increasing globally, but its molecular features are not well defined. We aimed to identify unique molecular traits characterising NASH-HCC compared to other HCC aetiologies. Methods: We collected 80 NASH-HCC and 125 NASH samples from 5 institutions. Expression array (n = 53 NASH-HCC; n = 74 NASH) and whole exome sequencing (n = 52 NASH-HCC) data were compared to HCCs of other aetiologies (n = 184). Three NASH-HCC mouse models were analysed by RNA-seq/expression-array (n = 20). Activin A receptor type 2A (ACVR2A) was silenced in HCC cells and proliferation assessed by colorimetric and colony formation assays. Results: Mutational profiling of NASH-HCC tumours revealed TERT promoter (56%), CTNNB1 (28%), TP53 (18%) and ACVR2A (10%) as the most frequently mutated genes. ACVR2A mutation rates were higher in NASH-HCC than in other HCC aetiologies (10% vs. 3%, p <0.05). In vitro, ACVR2A silencing prompted a significant increase in cell proliferation in HCC cells. We identified a novel mutational signature (MutSig-NASH-HCC) significantly associated with NASH-HCC (16% vs. 2% in viral/alcohol-HCC, p = 0.03). Tumour mutational burden was higher in non-cirrhotic than in cirrhotic NASH-HCCs (1.45 vs. 0.94 mutations/megabase; p <0.0017). Compared to other aetiologies of HCC, NASH-HCCs were enriched in bile and fatty acid signalling, oxidative stress and inflammation, and presented a higher fraction of Wnt/TGF-β proliferation subclass tumours (42% vs. 26%, p = 0.01) and a lower prevalence of the CTNNB1 subclass. Compared to other aetiologies, NASH-HCC showed a significantly higher prevalence of an immunosuppressive cancer field. In 3 murine models of NASH-HCC, key features of human NASH-HCC were preserved. Conclusions: NASH-HCCs display unique molecular features including higher rates of ACVR2A mutations and the presence of a newly identified mutational signature. 
    Lay summary: The prevalence of hepatocellular carcinoma (HCC) associated with non-alcoholic steatohepatitis (NASH) is increasing globally, but its molecular traits are not well characterised. In this study, we uncovered higher rates of mutations in ACVR2A (10%), a potential tumour suppressor, and the presence of a novel mutational signature that characterises NASH-related HCC.

    Evolutionary Modeling and Prediction of Non-Coding RNAs in Drosophila

    We performed benchmarks of phylogenetic grammar-based ncRNA gene prediction, experimenting with eight different models of structural evolution and two different programs for genome alignment. We evaluated our models using alignments of twelve Drosophila genomes. We find that ncRNA prediction performance can vary greatly between different gene predictors and between subfamilies of ncRNA genes. Our estimates of false positive rates are based on simulations that preserve local islands of conservation; using these simulations, we predict a higher rate of false positives than previous computational ncRNA screens have reported. Using one of the tested prediction grammars, we provide an updated set of ncRNA predictions for D. melanogaster and compare them to previously published predictions and experimental data. Many of our predictions show correlations with protein-coding genes. We found significant depletion of intergenic predictions near the 3′ end of coding regions, as well as depletion of predictions in the first intron of protein-coding genes. Some of our predictions are colocated with larger putative unannotated genes: for example, 17 of our predictions showing homology to the Rfam family snoR28 appear in a tandem array on the X chromosome, and the 4.5 kbp spanned by this predicted tandem array is contained within a FlyBase-annotated cDNA.

    Novel applications of high-throughput RNA sequencing: mapping RNA structure and discovering circular RNAs

    High-throughput RNA sequencing (RNA-Seq), although still novel, has primarily been applied to assessing differential RNA abundance or to mapping the primary structure of linear transcripts, e.g. inferring splice junctions. I report on two novel applications of RNA-Seq for which I developed computational pipelines. The first (FragSeq) couples classic enzymatic RNA structure probing with RNA-Seq to obtain high-throughput, single-base-resolution endonuclease accessibility maps of entire transcriptomes, thus yielding RNA structure information. A proof-of-principle application of this method to two mouse nuclear RNA samples showed that single-stranded regions of known nuclear ncRNA structures are accurately mapped, and mapping of novel structures was validated by follow-up probing. The second application is a pipeline for discovering RNA circularization from RNA-Seq reads, which I applied to a broad unpublished dataset spanning 21 archaeal species and a bacterium, uncovering evidence that C/D RNA guide transcripts are circularized in hyperthermophiles. My findings agree with published reports of circular C/D RNA in three species (P. furiosus, S. acidocaldarius, and S. solfataricus) and provide high-confidence evidence for broad C/D RNA circularization in at least two new species (I. hospitalis and T. kodakaraensis), arguing that this circularization is phylogenetically widespread. Interestingly, the crenarchaeal hyperthermophile P. aerophilum shows circularization of transcripts antisense to C/D RNAs. This is currently the broadest study of circularization in any domain of life.