44 research outputs found

    An efficient genetic algorithm for structural RNA pairwise alignment and its application to non-coding RNA discovery in yeast

    Background: Aligning RNA sequences with low sequence identity has been a challenging problem, since such a computation essentially requires a high-complexity algorithm that takes structural conservation into account. Although many sophisticated algorithms for this purpose have been proposed to date, further improvement in efficiency is necessary to accelerate large-scale applications, including non-coding RNA (ncRNA) discovery. Results: We developed a new genetic algorithm, Cofolga2, for simultaneously computing pairwise RNA sequence alignment and consensus folding, and benchmarked it using BRAliBase 2.1. The benchmark results showed that our new algorithm is accurate and efficient in both time and memory usage. Then, combining it with an originally trained SVM, we applied the new algorithm to novel ncRNA discovery, comparing the S. cerevisiae genome with six related genomes in a pairwise manner. By focusing our search on relatively short regions (50 bp to 2,000 bp) sandwiched between conserved sequences, we successfully predicted 714 intergenic and 1,311 sense or antisense ncRNA candidates, which were found in pairwise alignments with stable consensus secondary structure and low sequence identity (≤ 50%). By comparison with previous predictions, we found that > 92% of the candidates are novel. The estimated rate of false positives among the predicted candidates is 51%. Twenty-five percent of the intergenic candidates have support for expression in the cell, i.e. their genomic positions overlap those of experimentally determined transcripts in the literature. Moreover, by manual inspection of the results, we obtained four multiple alignments with low sequence identity that reveal consensus structures shared by three species/sequences. Conclusion: The present method provides an efficient tool complementary to sequence-alignment-based ncRNA finders.

    Multiple small RNAs identified in Mycobacterium bovis BCG are also expressed in Mycobacterium tuberculosis and Mycobacterium smegmatis

    Tuberculosis (TB) is a major global health problem, infecting millions of people each year. The causative agent of TB, Mycobacterium tuberculosis, is one of the world’s most ancient and successful pathogens. However, until recently, no work on small regulatory RNAs had been performed in this organism. Regulatory RNAs are found in all three domains of life, and have already been shown to regulate virulence in well-known pathogens, such as Staphylococcus aureus and Vibrio cholerae. Here we report the discovery of 34 novel small RNAs (sRNAs) in the TB-complex M. bovis BCG, using a combination of experimental and computational approaches. Putative homologues of many of these sRNAs were also identified in M. tuberculosis and/or M. smegmatis. Those sRNAs that are also expressed in the non-pathogenic M. smegmatis could be functioning to regulate conserved cellular functions. In contrast, those sRNAs identified specifically in M. tuberculosis could be functioning in mediation of virulence, thus rendering them potential targets for novel antimycobacterials. Various features and regulatory aspects of some of these sRNAs are discussed.

    Drug sensitivity testing on patient-derived sarcoma cells predicts patient response to treatment and identifies c-Sarc inhibitors as active drugs for translocation sarcomas

    BACKGROUND: Heterogeneity and low incidence comprise the biggest challenge in sarcoma diagnosis and treatment. Chemotherapy, although efficient for some sarcoma subtypes, generally results in poor clinical responses and is mostly recommended for advanced disease. Specific genomic aberrations have been identified in some sarcoma subtypes but few of them can be targeted with approved drugs. METHODS: We cultured and characterised patient-derived sarcoma cells and evaluated their sensitivity to 525 anti-cancer agents including both approved and non-approved drugs. In total, 14 sarcomas and 5 healthy mesenchymal primary cell cultures were studied. The sarcoma biopsies and derived cells were characterised by gene panel sequencing, cancer driver gene expression and by detecting specific fusion oncoproteins in situ in sarcomas with translocations. RESULTS: Soft tissue sarcoma cultures were established from patient biopsies with a success rate of 58%. The genomic profile and drug sensitivity testing on these samples helped to identify targeted inhibitors active on sarcomas. The cSrc inhibitor Dasatinib was identified as an active drug in sarcomas carrying chromosomal translocations. The drug sensitivity of the patient sarcoma cells ex vivo correlated with the response to the former treatment of the patient. CONCLUSIONS: Our results show that patient-derived sarcoma cells cultured in vitro are relevant and practical models for genotypic and phenotypic screens aiming to identify efficient drugs to treat sarcoma patients with poor treatment options.

    Stable stem enabled Shannon entropies distinguish non-coding RNAs from random backgrounds

    Background: The computational identification of RNAs in genomic sequences requires the identification of signals of RNA sequences. Shannon base pairing entropy is an indicator for RNA secondary structure fold certainty in detection of structural, non-coding RNAs (ncRNAs). Under the Boltzmann ensemble of secondary structures, the probability of a base pair is estimated from its frequency across all the alternative equilibrium structures. However, such an entropy has yet to deliver the desired performance for distinguishing ncRNAs from random sequences. Developing novel methods to improve the entropy measure performance may result in more effective ncRNA gene finding based on structure detection. Results: This paper shows that the measuring performance of base pairing entropy can be significantly improved with a constrained secondary structure ensemble in which only canonical base pairs are assumed to occur in energetically stable stems in a fold. This constraint actually reduces the space of the secondary structure and may lower the probabilities of base pairs unfavorable to the native fold. Indeed, base pairing entropies computed with this constrained model demonstrate substantially narrowed gaps of Z-scores between ncRNAs, as well as drastic increases in the Z-score for all 13 tested ncRNA sets, compared to shuffled sequences. Conclusions: These results suggest the viability of developing effective structure-based ncRNA gene finding methods by investigating secondary structure ensembles of ncRNAs.
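
    As a rough illustration of the base pairing entropy described in this abstract, the sketch below estimates base-pair probabilities as frequencies over a small set of dot-bracket structures and sums -p log2 p over the observed pairs. It is a minimal sketch, not the paper's method: the function names are hypothetical, the structures are weighted equally rather than by Boltzmann factors, and the exact entropy definition (normalisation, treatment of unpaired bases, the stable-stem constraint) is an assumption.

```python
import math
from collections import Counter


def pairs_from_dot_bracket(structure):
    """Return the set of (i, j) base pairs in a dot-bracket string (0-based)."""
    stack, pairs = [], set()
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pairs.add((stack.pop(), j))
    return pairs


def pair_probabilities(structures):
    """Estimate each pair's probability as its frequency across the structures.

    Assumption: every structure counts equally; a true Boltzmann ensemble
    would weight structures by their free energies.
    """
    counts = Counter()
    for s in structures:
        counts.update(pairs_from_dot_bracket(s))
    n = len(structures)
    return {pair: c / n for pair, c in counts.items()}


def base_pair_entropy(pair_probs):
    """Shannon entropy (in bits) summed over base-pair probabilities."""
    return -sum(p * math.log2(p) for p in pair_probs.values() if p > 0)


# Two alternative folds of a hypothetical 6-nt sequence.
probs = pair_probabilities(["((..))", "(....)"])
entropy = base_pair_entropy(probs)
```

    A pair present in every structure contributes nothing to the sum, so a lower value indicates a more certain fold; in this toy input only the pair (1, 4), seen in half of the structures, contributes.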

    Molecular characterization of hepatocellular carcinoma in patients with nonalcoholic steatohepatitis

    Background and aims: Non-alcoholic steatohepatitis (NASH)-related hepatocellular carcinoma (HCC) is increasing globally, but its molecular features are not well defined. We aimed to identify unique molecular traits characterising NASH-HCC compared to other HCC aetiologies. Methods: We collected 80 NASH-HCC and 125 NASH samples from 5 institutions. Expression array (n = 53 NASH-HCC; n = 74 NASH) and whole exome sequencing (n = 52 NASH-HCC) data were compared to HCCs of other aetiologies (n = 184). Three NASH-HCC mouse models were analysed by RNA-seq/expression-array (n = 20). Activin A receptor type 2A (ACVR2A) was silenced in HCC cells and proliferation assessed by colorimetric and colony formation assays. Results: Mutational profiling of NASH-HCC tumours revealed TERT promoter (56%), CTNNB1 (28%), TP53 (18%) and ACVR2A (10%) as the most frequently mutated genes. ACVR2A mutation rates were higher in NASH-HCC than in other HCC aetiologies (10% vs. 3%, p <0.05). In vitro, ACVR2A silencing prompted a significant increase in cell proliferation in HCC cells. We identified a novel mutational signature (MutSig-NASH-HCC) significantly associated with NASH-HCC (16% vs. 2% in viral/alcohol-HCC, p = 0.03). Tumour mutational burden was higher in non-cirrhotic than in cirrhotic NASH-HCCs (1.45 vs. 0.94 mutations/megabase; p <0.0017). Compared to other aetiologies of HCC, NASH-HCCs were enriched in bile and fatty acid signalling, oxidative stress and inflammation, and presented a higher fraction of Wnt/TGF-β proliferation subclass tumours (42% vs. 26%, p = 0.01) and a lower prevalence of the CTNNB1 subclass. Compared to other aetiologies, NASH-HCC showed a significantly higher prevalence of an immunosuppressive cancer field. In 3 murine models of NASH-HCC, key features of human NASH-HCC were preserved. Conclusions: NASH-HCCs display unique molecular features including higher rates of ACVR2A mutations and the presence of a newly identified mutational signature. 
    Lay summary: The prevalence of hepatocellular carcinoma (HCC) associated with non-alcoholic steatohepatitis (NASH) is increasing globally, but its molecular traits are not well characterised. In this study, we uncovered higher rates of ACVR2A mutations (10%) - a potential tumour suppressor - and the presence of a novel mutational signature that characterises NASH-related HCC.

    nocoRNAc: Characterization of non-coding RNAs in prokaryotes

    Background: Interest in non-coding RNAs (ncRNAs) has risen steadily over the past few years because of the wide spectrum of biological processes in which they are involved. This has led to the discovery of numerous ncRNA genes across many species. However, for most organisms the non-coding transcriptome still remains unexplored to a great extent. Various experimental techniques for the identification of ncRNA transcripts are available, but as these methods are costly and time-consuming, there is a need for computational methods that allow the detection of functional RNAs in complete genomes in order to suggest elements for further experiments. Several programs for the genome-wide prediction of functional RNAs have been developed, but most of them predict a genomic locus with no indication of whether the element is transcribed or not. Results: We present nocoRNAc, a program for the genome-wide prediction of ncRNA transcripts in bacteria. nocoRNAc incorporates various procedures for the detection of transcriptional features, which are then integrated with functional ncRNA loci to determine the transcript coordinates. We applied RNAz and nocoRNAc to the genome of Streptomyces coelicolor and detected more than 800 putative ncRNA transcripts, most of them located antisense to protein-coding regions. Using a custom-designed microarray, we profiled the expression of about 400 of these elements and found more than 300 to be transcribed, 38 of which are predicted novel ncRNA genes in intergenic regions. The expression patterns of many ncRNAs are as complex as those of the protein-coding genes; in particular, many antisense ncRNAs show a high expression correlation with their protein-coding partner. Conclusions: We have developed nocoRNAc, a framework that facilitates the automated characterization of functional ncRNAs. nocoRNAc increases the confidence of predicted ncRNA loci, especially if they contain transcribed ncRNAs. nocoRNAc is not restricted to intergenic regions, but is applicable to the prediction of ncRNA transcripts in whole microbial genomes. The software, as well as a user guide and example data, is available at http://www.zbit.uni-tuebingen.de/pas/nocornac.htm.

    Evolutionary Modeling and Prediction of Non-Coding RNAs in Drosophila

    We performed benchmarks of phylogenetic grammar-based ncRNA gene prediction, experimenting with eight different models of structural evolution and two different programs for genome alignment. We evaluated our models using alignments of twelve Drosophila genomes. We find that ncRNA prediction performance can vary greatly between different gene predictors and subfamilies of ncRNA gene. Our estimates for false positive rates are based on simulations which preserve local islands of conservation; using these simulations, we predict a higher rate of false positives than previous computational ncRNA screens have reported. Using one of the tested prediction grammars, we provide an updated set of ncRNA predictions for D. melanogaster and compare them to previously-published predictions and experimental data. Many of our predictions show correlations with protein-coding genes. We found significant depletion of intergenic predictions near the 3′ end of coding regions and furthermore depletion of predictions in the first intron of protein-coding genes. Some of our predictions are colocated with larger putative unannotated genes: for example, 17 of our predictions showing homology to the RFAM family snoR28 appear in a tandem array on the X chromosome; the 4.5 Kbp spanned by the predicted tandem array is contained within a FlyBase-annotated cDNA.

    Dinucleotide controlled null models for comparative RNA gene prediction

    Background: Comparative prediction of RNA structures can be used to identify functional noncoding RNAs in genomic screens. It was shown recently by Babak et al. [BMC Bioinformatics. 8:33] that RNA gene prediction programs can be biased by the genomic dinucleotide content, in particular those programs using a thermodynamic folding model that includes stacking energies. As a consequence, there is a need for dinucleotide-preserving control strategies to assess the significance of such predictions. While randomization algorithms for single sequences have existed for many years, the problem has remained challenging for multiple alignments and there is currently no algorithm available. Results: We present a program called SISSIz that simulates multiple alignments of a given average dinucleotide content. Meeting additional requirements of an accurate null model, the randomized alignments are on average of the same sequence diversity and preserve local conservation and gap patterns. We make use of a phylogenetic substitution model that includes overlapping dependencies and site-specific rates. Using fast heuristics and a distance-based approach, a tree is estimated under this model and used to guide the simulations. The new algorithm is tested on vertebrate genomic alignments and the effect on RNA structure predictions is studied. In addition, we directly combined the new null model with the RNAalifold consensus folding algorithm, giving a new variant of a thermodynamic structure-based RNA gene finding program that is not biased by the dinucleotide content. Conclusion: SISSIz implements an efficient algorithm to randomize multiple alignments while preserving dinucleotide content. It can be used to get more accurate estimates of false positive rates of existing programs, to produce negative controls for the training of machine-learning-based programs, or as a standalone RNA gene finding program. Other applications in comparative genomics that require randomization of multiple alignments can be considered. Availability: SISSIz is available as open source C code that can be compiled for every major platform and downloaded here: http://sourceforge.net/projects/sissiz.

    Discovering cis-Regulatory RNAs in Shewanella Genomes by Support Vector Machines

    An increasing number of cis-regulatory RNA elements have been found to regulate gene expression post-transcriptionally in various biological processes in bacterial systems. Effective computational tools for large-scale identification of novel regulatory RNAs are strongly desired to facilitate our exploration of gene regulation mechanisms and regulatory networks. We present a new computational program named RSSVM (RNA Sampler+Support Vector Machine), which employs Support Vector Machines (SVMs) for efficient identification of functional RNA motifs from random RNA secondary structures. RSSVM uses a set of distinctive features to represent the common RNA secondary structure and structural alignment predicted by RNA Sampler, a tool for accurate common RNA secondary structure prediction, and is trained with functional RNAs from a variety of bacterial RNA motif/gene families covering a wide range of sequence identities. When tested on a large number of known and random RNA motifs, RSSVM shows a significantly higher sensitivity than other leading RNA identification programs while maintaining the same false positive rate. RSSVM performs particularly well on sets with low sequence identities. The combination of RNA Sampler and RSSVM provides a new, fast, and efficient pipeline for large-scale discovery of regulatory RNA motifs. We applied RSSVM to multiple Shewanella genomes and identified putative regulatory RNA motifs in the 5′ untranslated regions (UTRs) in S. oneidensis, an important bacterial organism with extraordinary respiratory and metal reducing abilities and great potential for bioremediation and alternative energy generation. From 1002 sets of 5′-UTRs of orthologous operons, we identified 166 putative regulatory RNA motifs, including 17 of the 19 known RNA motifs from Rfam, an additional 21 RNA motifs that are supported by literature evidence, 72 RNA motifs overlapping predicted transcription terminators or attenuators, and other candidate regulatory RNA motifs. Our study provides a list of promising novel regulatory RNA motifs potentially involved in post-transcriptional gene regulation. Combined with the previous cis-regulatory DNA motif study in S. oneidensis, this genome-wide discovery of cis-regulatory RNA motifs may offer more comprehensive views of gene regulation at a different level in this organism. The RSSVM software, predictions, and analysis results on Shewanella genomes are available at http://ural.wustl.edu/resources.html#RSSVM