104 research outputs found

    Detection of non-coding RNAs on the basis of predicted secondary structure formation free energy change

    BACKGROUND: Non-coding RNAs (ncRNAs) have a multitude of roles in the cell, many of which remain to be discovered. However, it is difficult to detect novel ncRNAs in biochemical screens. To advance biological knowledge, computational methods that can accurately detect ncRNAs in sequenced genomes are therefore desirable. The increasing number of genomic sequences provides a rich dataset for computational comparative sequence analysis and detection of novel ncRNAs. RESULTS: Here, Dynalign, a program for predicting secondary structures common to two RNA sequences on the basis of minimizing folding free energy change, is utilized as a computational ncRNA detection tool. The Dynalign-computed optimal total free energy change, which scores the structural alignment and the free energy change of folding into a common structure for two RNA sequences, is shown to be an effective measure for distinguishing ncRNA from randomized sequences. To make the classification as a ncRNA, the total free energy change of an input sequence pair can either be compared with the total free energy changes of a set of control sequence pairs, or be used in combination with sequence length and nucleotide frequencies as input to a classification support vector machine. The latter method is much faster, but slightly less sensitive at a given specificity. Additionally, the classification support vector machine method is shown to be sensitive and specific on genomic ncRNA screens of two different Escherichia coli and Salmonella typhi genome alignments, in which many ncRNAs are known. The Dynalign computational experiments are also compared with two other ncRNA detection programs, RNAz and QRNA. CONCLUSION: The Dynalign-based support vector machine method is more sensitive for known ncRNAs in the test genomic screens than RNAz and QRNA. Additionally, both Dynalign-based methods are more sensitive than RNAz and QRNA at low sequence pair identities. 
Dynalign can be used as a comparable or more accurate tool than RNAz or QRNA in genomic screens, especially for low-identity regions. Dynalign provides a method for discovering ncRNAs in sequenced genomes that other methods may not identify. Significant improvements in Dynalign runtime have also been achieved
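The first classification route described above, comparing a pair's total free energy change with those of control sequence pairs, can be sketched as a z-score test. This is only an illustration: the z-score formulation, the function names, the cutoff of -3.0, and the sample values are assumptions, not details taken from the abstract or from Dynalign itself.

```python
from statistics import mean, stdev


def free_energy_z_score(observed, controls):
    """Z-score of an observed total free energy change against control values.

    `observed` and `controls` are free energy changes in kcal/mol; more
    negative values mean a more stable predicted common structure.
    """
    if len(controls) < 2:
        raise ValueError("need at least two control values")
    spread = stdev(controls)  # sample standard deviation of the controls
    if spread == 0:
        raise ValueError("control values have zero variance")
    return (observed - mean(controls)) / spread


def looks_like_ncrna(observed, controls, z_cutoff=-3.0):
    """Flag a sequence pair whose free energy change is far below the controls.

    The cutoff is a hypothetical choice for illustration only.
    """
    return free_energy_z_score(observed, controls) <= z_cutoff


# Hypothetical free energy changes for five control (e.g. shuffled) pairs.
controls = [-20.0, -22.0, -18.0, -21.0, -19.0]
print(looks_like_ncrna(-30.0, controls))  # well below the controls
print(looks_like_ncrna(-21.0, controls))  # within the control spread
```

Each control pair needs its own structural alignment, which is consistent with the abstract's remark that the support vector machine route is much faster.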

    XRate: a fast prototyping, training and annotation tool for phylo-grammars

    BACKGROUND: Recent years have seen the emergence of genome annotation methods based on the phylo-grammar, a probabilistic model combining continuous-time Markov chains and stochastic grammars. Previously, phylo-grammars have required considerable effort to implement, limiting their adoption by computational biologists. RESULTS: We have developed an open source software tool, xrate, for working with reversible, irreversible or parametric substitution models combined with stochastic context-free grammars. xrate efficiently estimates maximum-likelihood parameters and phylogenetic trees using a novel "phylo-EM" algorithm that we describe. The grammar is specified in an external configuration file, allowing users to design new grammars, estimate rate parameters from training data and annotate multiple sequence alignments without the need to recompile code from source. We have used xrate to measure codon substitution rates and predict protein and RNA secondary structures. CONCLUSION: Our results demonstrate that xrate estimates biologically meaningful rates and makes predictions whose accuracy is comparable to that of more specialized tools

    PARTS: Probabilistic Alignment for RNA joinT Secondary structure prediction

    A novel method is presented for joint prediction of alignment and common secondary structures of two RNA sequences. The joint consideration of common secondary structures and alignment is accomplished by structural alignment over a search space defined by the newly introduced motif called matched helical regions. The matched helical region formulation generalizes previously employed constraints for structural alignment and thereby better accommodates the structural variability within RNA families. A probabilistic model based on pseudo free energies obtained from precomputed base pairing and alignment probabilities is utilized for scoring structural alignments. Maximum a posteriori (MAP) common secondary structures, sequence alignment and joint posterior probabilities of base pairing are obtained from the model via a dynamic programming algorithm called PARTS. The advantage of the more general structural alignment of PARTS is seen in secondary structure predictions for the RNase P family. For this family, the PARTS MAP predictions of secondary structures and alignment perform significantly better than prior methods that utilize a more restrictive structural alignment model. For the tRNA and 5S rRNA families, the richer structural alignment model of PARTS does not offer a benefit and the method therefore performs comparably with existing alternatives. For all RNA families studied, the posterior probability estimates obtained from PARTS offer an improvement over posterior probability estimates from a single sequence prediction. When considering the base pairings predicted over a threshold value of confidence, the combination of sensitivity and positive predictive value is superior for PARTS than for the single sequence prediction. PARTS source code is available for download under the GNU public license at http://rna.urmc.rochester.edu
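The evaluation mentioned at the end of this abstract, sensitivity and positive predictive value for base pairs predicted above a confidence threshold, can be illustrated with a minimal sketch. The function name, the data layout (a dictionary mapping index pairs to probabilities), and the sample values are assumptions for illustration; this is not the PARTS code, and exact-match scoring of pairs is a simplification.

```python
def sensitivity_and_ppv(pair_probs, reference_pairs, threshold):
    """Score thresholded base-pair predictions against a reference structure.

    pair_probs: dict mapping (i, j) index pairs to pairing probabilities.
    reference_pairs: set of (i, j) pairs in the reference structure.
    threshold: keep pairs whose probability is at least this value.
    """
    predicted = {pair for pair, prob in pair_probs.items() if prob >= threshold}
    true_positives = len(predicted & reference_pairs)
    # Sensitivity: fraction of reference pairs that were predicted.
    sensitivity = true_positives / len(reference_pairs) if reference_pairs else 0.0
    # PPV: fraction of predicted pairs that are in the reference.
    ppv = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, ppv


# Hypothetical pairing probabilities and reference structure.
pair_probs = {(1, 20): 0.95, (2, 19): 0.90, (3, 18): 0.60, (5, 15): 0.40}
reference = {(1, 20), (2, 19), (4, 17)}

for threshold in (0.5, 0.8):
    print(threshold, sensitivity_and_ppv(pair_probs, reference, threshold))
```

Raising the threshold discards low-confidence pairs, which typically trades sensitivity for positive predictive value.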

    An efficient genetic algorithm for structural RNA pairwise alignment and its application to non-coding RNA discovery in yeast

    BACKGROUND: Aligning RNA sequences with low sequence identity has been a challenging problem, since such a computation essentially requires a high-complexity algorithm that takes structural conservation into account. Although many sophisticated algorithms for this purpose have been proposed to date, further improvement in efficiency is necessary to accelerate large-scale applications, including non-coding RNA (ncRNA) discovery. RESULTS: We developed a new genetic algorithm, Cofolga2, for simultaneously computing pairwise RNA sequence alignment and consensus folding, and benchmarked it using BRAliBase 2.1. The benchmark results showed that our new algorithm is accurate and efficient in both time and memory usage. Then, combining it with the originally trained SVM, we applied the new algorithm to novel ncRNA discovery, comparing the S. cerevisiae genome with six related genomes in a pairwise manner. By focusing our search on the relatively short regions (50 bp to 2,000 bp) sandwiched by conserved sequences, we successfully predicted 714 intergenic and 1,311 sense or antisense ncRNA candidates, which were found in the pairwise alignments with stable consensus secondary structure and low sequence identity (≤ 50%). By comparing with the previous predictions, we found that > 92% of the candidates are novel. The estimated rate of false positives in the predicted candidates is 51%. Twenty-five percent of the intergenic candidates have support for expression in the cell, i.e. their genomic positions overlap those of experimentally determined transcripts in the literature. Moreover, by manual inspection of the results, we obtained four multiple alignments with low sequence identity which reveal consensus structures shared by three species/sequences. CONCLUSION: The present method gives an efficient tool complementary to sequence-alignment-based ncRNA finders.

    Multiple small RNAs identified in Mycobacterium bovis BCG are also expressed in Mycobacterium tuberculosis and Mycobacterium smegmatis

    Tuberculosis (TB) is a major global health problem, infecting millions of people each year. The causative agent of TB, Mycobacterium tuberculosis, is one of the world’s most ancient and successful pathogens. However, until recently, no work on small regulatory RNAs had been performed in this organism. Regulatory RNAs are found in all three domains of life, and have already been shown to regulate virulence in well-known pathogens, such as Staphylococcus aureus and Vibrio cholerae. Here we report the discovery of 34 novel small RNAs (sRNAs) in the TB-complex M. bovis BCG, using a combination of experimental and computational approaches. Putative homologues of many of these sRNAs were also identified in M. tuberculosis and/or M. smegmatis. Those sRNAs that are also expressed in the non-pathogenic M. smegmatis could be functioning to regulate conserved cellular functions. In contrast, those sRNAs identified specifically in M. tuberculosis could be functioning in mediation of virulence, thus rendering them potential targets for novel antimycobacterials. Various features and regulatory aspects of some of these sRNAs are discussed.

    A personalized platform identifies trametinib plus zoledronate for a patient with KRAS-mutant metastatic colorectal cancer

    Colorectal cancer remains a leading source of cancer mortality worldwide. Initial response is often followed by emergent resistance that is poorly responsive to targeted therapies, reflecting currently undruggable cancer drivers such as KRAS and overall genomic complexity. Here, we report a novel approach to developing a personalized therapy for a patient with treatment-resistant metastatic KRAS-mutant colorectal cancer. An extensive genomic analysis of the tumor's genomic landscape identified nine key drivers. A transgenic model that altered orthologs of these nine genes in the Drosophila hindgut was developed; a robotics-based screen using this platform identified trametinib plus zoledronate as a candidate treatment combination. Treating the patient led to a significant response: Target and nontarget lesions displayed a strong partial response and remained stable for 11 months. By addressing a disease's genomic complexity, this personalized approach may provide an alternative treatment option for recalcitrant disease such as KRAS-mutant colorectal cancer

    Statistical evaluation of improvement in RNA secondary structure prediction

    With discovery of diverse roles for RNA, its centrality in cellular functions has become increasingly apparent. A number of algorithms have been developed to predict RNA secondary structure. Their performance has been benchmarked by comparing structure predictions to reference secondary structures. Generally, algorithms are compared against each other and one is selected as best without statistical testing to determine whether the improvement is significant. In this work, it is demonstrated that the prediction accuracies of methods correlate with each other over sets of sequences. One possible reason for this correlation is that many algorithms use the same underlying principles. A set of benchmarks published previously for programs that predict a structure common to three or more sequences is statistically analyzed as an example to show that it can be rigorously evaluated using paired two-sample t-tests. Finally, a pipeline of statistical analyses is proposed to guide the choice of data set size and performance assessment for benchmarks of structure prediction. The pipeline is applied using 5S rRNA sequences as an example
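A paired t-test of the kind named in this abstract can be sketched with the standard library alone. The function name and the per-sequence accuracy values are hypothetical, and the sketch stops at the test statistic and degrees of freedom rather than reproducing the full pipeline the authors propose.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic for two methods scored on the same sequences.

    Returns (t, degrees_of_freedom). A positive t means method B scored
    higher on average than method A.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both methods must be scored on the same sequences")
    n = len(scores_a)
    if n < 2:
        raise ValueError("need at least two paired observations")
    # Work on per-sequence differences, which removes the correlation
    # between methods across sequences from the comparison.
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    spread = stdev(diffs)
    if spread == 0:
        raise ValueError("differences have zero variance")
    return mean(diffs) / (spread / sqrt(n)), n - 1


# Hypothetical per-sequence accuracies for two prediction methods.
method_a = [0.70, 0.65, 0.80, 0.75, 0.60]
method_b = [0.72, 0.70, 0.82, 0.78, 0.66]

t_value, dof = paired_t_statistic(method_a, method_b)
print(f"t = {t_value:.3f}, degrees of freedom = {dof}")
```

The statistic would then be compared with a critical value of the t distribution for the given degrees of freedom; where SciPy is available, `scipy.stats.ttest_rel` computes the statistic together with a p-value.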

    Drug sensitivity testing on patient-derived sarcoma cells predicts patient response to treatment and identifies c-Sarc inhibitors as active drugs for translocation sarcomas

    BACKGROUND: Heterogeneity and low incidence comprise the biggest challenge in sarcoma diagnosis and treatment. Chemotherapy, although efficient for some sarcoma subtypes, generally results in poor clinical responses and is mostly recommended for advanced disease. Specific genomic aberrations have been identified in some sarcoma subtypes but few of them can be targeted with approved drugs. METHODS: We cultured and characterised patient-derived sarcoma cells and evaluated their sensitivity to 525 anti-cancer agents including both approved and non-approved drugs. In total, 14 sarcomas and 5 healthy mesenchymal primary cell cultures were studied. The sarcoma biopsies and derived cells were characterised by gene panel sequencing, cancer driver gene expression and by detecting specific fusion oncoproteins in situ in sarcomas with translocations. RESULTS: Soft tissue sarcoma cultures were established from patient biopsies with a success rate of 58%. The genomic profile and drug sensitivity testing on these samples helped to identify targeted inhibitors active on sarcomas. The cSrc inhibitor Dasatinib was identified as an active drug in sarcomas carrying chromosomal translocations. The drug sensitivity of the patient sarcoma cells ex vivo correlated with the response to the former treatment of the patient. CONCLUSIONS: Our results show that patient-derived sarcoma cells cultured in vitro are relevant and practical models for genotypic and phenotypic screens aiming to identify efficient drugs to treat sarcoma patients with poor treatment options.

    Molecular characterization of hepatocellular carcinoma in patients with nonalcoholic steatohepatitis

    Background and aims: Non-alcoholic steatohepatitis (NASH)-related hepatocellular carcinoma (HCC) is increasing globally, but its molecular features are not well defined. We aimed to identify unique molecular traits characterising NASH-HCC compared to other HCC aetiologies. Methods: We collected 80 NASH-HCC and 125 NASH samples from 5 institutions. Expression array (n = 53 NASH-HCC; n = 74 NASH) and whole exome sequencing (n = 52 NASH-HCC) data were compared to HCCs of other aetiologies (n = 184). Three NASH-HCC mouse models were analysed by RNA-seq/expression-array (n = 20). Activin A receptor type 2A (ACVR2A) was silenced in HCC cells and proliferation assessed by colorimetric and colony formation assays. Results: Mutational profiling of NASH-HCC tumours revealed TERT promoter (56%), CTNNB1 (28%), TP53 (18%) and ACVR2A (10%) as the most frequently mutated genes. ACVR2A mutation rates were higher in NASH-HCC than in other HCC aetiologies (10% vs. 3%, p <0.05). In vitro, ACVR2A silencing prompted a significant increase in cell proliferation in HCC cells. We identified a novel mutational signature (MutSig-NASH-HCC) significantly associated with NASH-HCC (16% vs. 2% in viral/alcohol-HCC, p = 0.03). Tumour mutational burden was higher in non-cirrhotic than in cirrhotic NASH-HCCs (1.45 vs. 0.94 mutations/megabase; p <0.0017). Compared to other aetiologies of HCC, NASH-HCCs were enriched in bile and fatty acid signalling, oxidative stress and inflammation, and presented a higher fraction of Wnt/TGF-β proliferation subclass tumours (42% vs. 26%, p = 0.01) and a lower prevalence of the CTNNB1 subclass. Compared to other aetiologies, NASH-HCC showed a significantly higher prevalence of an immunosuppressive cancer field. In 3 murine models of NASH-HCC, key features of human NASH-HCC were preserved. Conclusions: NASH-HCCs display unique molecular features including higher rates of ACVR2A mutations and the presence of a newly identified mutational signature. 
Lay summary: The prevalence of hepatocellular carcinoma (HCC) associated with non-alcoholic steatohepatitis (NASH) is increasing globally, but its molecular traits are not well characterised. In this study, we uncovered higher rates of ACVR2A mutations (10%) - a potential tumour suppressor - and the presence of a novel mutational signature that characterises NASH-related HCC