112 research outputs found

    Alignment of RNA base pairing probability matrices

    Motivation: Many classes of functional RNA molecules are characterized by highly conserved secondary structures but little detectable sequence similarity. Reliable multiple alignments can therefore be constructed only when the shared structural features are taken into account. Since multiple alignments are used as input for many subsequent methods of data analysis, structure-based alignments are indispensable in RNA bioinformatics. Results: We present a method to compute pairwise and progressive multiple alignments from the direct comparison of base pairing probability matrices. Instead of attempting to solve the folding and the alignment problem simultaneously, as in Sankoff's classical algorithm, we use McCaskill's approach to compute base pairing probability matrices, which effectively incorporate the information on the energetics of each sequence. A novel, simplified variant of Sankoff's algorithm can then be employed to extract the maximum-weight common secondary structure and an associated alignment.

    Strategies for measuring evolutionary conservation of RNA secondary structures

    Background: Evolutionary conservation of RNA secondary structure is a typical feature of many functional non-coding RNAs. Since almost all of the available methods used for prediction and annotation of non-coding RNA genes rely on this evolutionary signature, accurate measures for structural conservation are essential. Results: We systematically assessed the ability of various measures to detect conserved RNA structures in multiple sequence alignments. We tested three existing and eight novel strategies that are based on metrics of folding energies, metrics of single optimal structure predictions, and metrics of structure ensembles. We find that the folding-energy-based SCI score used in the RNAz program and a simple base-pair distance metric are by far the most accurate. The use of more complex metrics, such as tree editing, does not improve performance. A variant of the SCI performed particularly well on highly conserved alignments and is thus a viable alternative when only little evolutionary information is available. Surprisingly, ensemble-based methods, which in principle could benefit from the additional information contained in sub-optimal structures, perform particularly poorly. As a general trend, we observed that methods that include a consensus structure prediction outperformed equivalent methods that only consider pairwise comparisons. Conclusion: Structural conservation can be measured accurately with relatively simple and intuitive metrics. They have the potential to form the basis of future RNA gene finders, which face new challenges such as finding lineage-specific structures or detecting misaligned sequences.
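
    The two best-performing measures are simple to state. The structure conservation index (SCI) divides the free energy of the consensus structure of an alignment by the mean minimum free energy of the individual sequences. The base-pair distance counts the base pairs present in one structure but not in the other. The following is a minimal sketch, not the paper's code. It assumes the energies have already been computed elsewhere (RNAz obtains them with RNAalifold and RNAfold) and that structures are dot-bracket strings of equal length. The function names and energy values are hypothetical.

```python
def base_pairs(structure: str) -> set:
    """Parse a dot-bracket string into a set of (i, j) base pairs, 0-based."""
    stack, pairs = [], set()
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.add((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unmatched '('")
    return pairs


def bp_distance(s1: str, s2: str) -> int:
    """Number of base pairs found in exactly one of the two structures."""
    if len(s1) != len(s2):
        raise ValueError("structures must have equal length")
    return len(base_pairs(s1) ^ base_pairs(s2))


def sci(consensus_energy: float, single_energies: list) -> float:
    """Structure conservation index: E_consensus / mean(E_single)."""
    if not single_energies:
        raise ValueError("need at least one single-sequence energy")
    mean = sum(single_energies) / len(single_energies)
    if mean == 0:
        raise ValueError("mean single-sequence MFE is zero; SCI undefined")
    return consensus_energy / mean


print(bp_distance("((...))", "(.....)"))  # 1
print(sci(-18.0, [-20.0, -16.0, -18.0]))  # 1.0
```

    An SCI near 1 means the sequences fold into the consensus structure about as well as they fold on their own. Values near 0 indicate no shared structure.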

    The Evolving Faces of the SARS-CoV-2 Genome

    Surveillance of the evolving SARS-CoV-2 genome, combined with epidemiological monitoring and emerging vaccination, has become a paramount task in controlling the pandemic, which is changing rapidly in time and space. Genomic surveillance must combine the generation and sharing of sequence data with appropriate bioinformatics monitoring and analysis methods. We applied molecular portrayal using self-organizing maps machine learning (SOM portrayal) to characterize the diversity of the virus genomes, their mutual relatedness and their development since the beginning of the pandemic. The resulting genetic landscape visualizes the relevant mutations in a lineage-specific fashion and provides developmental paths in genetic state space from early lineages towards the variants of concern alpha, beta, gamma and delta. The different genes of the virus have specific footprints in the landscape reflecting their biological impact. SOM portrayal provides a novel option for ‘bioinformatics surveillance’ of the pandemic, with particular strengths in the visualization, intuitive perception and ‘personalization’ of the mutational patterns of the virus genomes.

    Translational Control by RNA-RNA Interaction: Improved Computation of RNA-RNA Binding Thermodynamics

    The thermodynamics of RNA-RNA interaction consists of two components: the energy necessary to make a potential binding region accessible, i.e., unpaired, and the energy gained from the base pairing of the two interaction partners. We show here that both components can be efficiently computed using an improved variant of RNAup. The method is then applied to a set of bacterial small RNAs involved in translational control. In all cases of biologically active sRNA-target interactions, the target sites predicted by RNAup are in perfect agreement with the literature. In addition to predicting the target site location, RNAup can also be used to determine the mode of sRNA action. Using information about the target site location and the accessibility change resulting from sRNA binding, we can discriminate between positive and negative regulators of translation.
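
    The two-component decomposition can be written as ΔG_interaction = ΔG_open + ΔG_hybrid. The opening energy follows from the probability P_u that the binding region is unpaired, via ΔG_open = -RT ln P_u. The following is a minimal sketch that only combines these terms. It assumes P_u and the hybridization energy are already known; RNAup computes both from the full RNA energy model. The function names and example values are hypothetical.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def opening_energy(p_unpaired: float, temperature_c: float = 37.0) -> float:
    """Free energy (kcal/mol) needed to make a region accessible: -RT ln P_u."""
    if not 0.0 < p_unpaired <= 1.0:
        raise ValueError("p_unpaired must lie in (0, 1]")
    rt = R_KCAL * (temperature_c + 273.15)
    return -rt * math.log(p_unpaired)


def interaction_energy(p_unpaired: float, hybrid_energy: float,
                       temperature_c: float = 37.0) -> float:
    """Total binding free energy: cost of opening the site plus energy gained by hybridization."""
    return opening_energy(p_unpaired, temperature_c) + hybrid_energy


# Hypothetical inputs: site unpaired 1% of the time, hybridization gives -12 kcal/mol.
print(round(interaction_energy(0.01, -12.0), 2))  # -9.16
```

    A site that is rarely unpaired pays a large opening penalty. This is why a strong hybridization energy alone does not guarantee a strong interaction.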

    Changes of bivalent chromatin coincide with increased expression of developmental genes in cancer

    Bivalent (poised or paused) chromatin comprises activating and repressing histone modifications at the same location. This combination of epigenetic marks at promoter or enhancer regions keeps genes expressed at low levels but poised for rapid activation. Typically, DNA at bivalent promoters is only lowly methylated in normal cells, but frequently shows elevated methylation levels in cancer samples. Here, we developed a universal classifier built from chromatin data that can identify cancer samples solely from hypermethylation of bivalent chromatin. Tested on over 7,000 DNA methylation data sets from several cancer types, it reaches an AUC of 0.92. Although higher levels of DNA methylation are often associated with transcriptional silencing, counter-intuitive positive statistical dependencies between DNA methylation and expression levels have recently been reported for two cancer types. We re-analyze combined expression and DNA methylation data sets, comprising over 5,000 samples, and demonstrate that the conjunction of hypermethylation of bivalent chromatin and up-regulation of the corresponding genes is a general phenomenon in cancer. This up-regulation affects many developmental genes and transcription factors, including dozens of homeobox genes and other genes implicated in cancer. Thus, we reason that the disturbance of bivalent chromatin may be intimately linked to tumorigenesis.

    Thermodynamics of RNA-RNA binding

    Background: Reliable prediction of RNA–RNA binding energies is crucial, e.g. for the understanding of RNAi, microRNA–mRNA binding and antisense interactions. The thermodynamics of such RNA–RNA interactions can be understood as the sum of two energy contributions: (1) the energy necessary to ‘open’ the binding site and (2) the energy gained from hybridization. Methods: We present an extension of the standard partition function approach to RNA secondary structures that computes the probabilities Pu[i, j] that a sequence interval [i, j] is unpaired. Results: Comparison with experimental data shows that Pu[i, j] can be applied as a significant determinant of local target site accessibility for RNA interference (RNAi). Furthermore, these quantities can be used to rigorously determine binding free energies of short oligomers to large mRNA targets. The resource consumption is comparable with a single partition function computation for the large target molecule. We show that RNAi efficiency correlates well with the binding energies of siRNAs to their respective mRNA targets.
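
    The meaning of Pu[i, j] can be illustrated with a deliberately simplified model. The following sketch is not the paper's algorithm or energy model. It assumes a hypothetical Nussinov-style model in which every allowed pair contributes one fixed energy; `PAIR_ENERGY`, `MIN_HAIRPIN` and the function names are illustrative. Pu[i, j] is obtained directly from its definition: the partition function with positions i..j forbidden to pair, divided by the unconstrained partition function. Each call costs O(n³) time, so unlike the method described above, the sketch recomputes everything for every interval.

```python
import math
from functools import lru_cache

# Toy energy model (hypothetical): every allowed pair contributes the same energy.
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}
PAIR_ENERGY = -1.0          # kcal/mol per base pair, illustrative value only
MIN_HAIRPIN = 3             # minimum number of unpaired bases enclosed by a pair
RT = 0.0019872 * 310.15     # kcal/mol at 37 degrees C


def partition_function(seq: str, forbidden: frozenset = frozenset()) -> float:
    """Nussinov-style partition function; positions in `forbidden` may not pair."""
    weight = math.exp(-PAIR_ENERGY / RT)

    @lru_cache(maxsize=None)
    def z(i: int, j: int) -> float:
        if i >= j:                      # empty interval or a single nucleotide
            return 1.0
        total = z(i + 1, j)             # case 1: position i stays unpaired
        if i in forbidden:
            return total
        for k in range(i + MIN_HAIRPIN + 1, j + 1):   # case 2: i pairs with k
            if k not in forbidden and (seq[i], seq[k]) in PAIRS:
                total += weight * z(i + 1, k - 1) * z(k + 1, j)
        return total

    return z(0, len(seq) - 1)


def unpaired_probability(seq: str, a: int, b: int) -> float:
    """Pu[a, b]: probability that positions a..b (0-based, inclusive) are all unpaired."""
    constrained = partition_function(seq, frozenset(range(a, b + 1)))
    return constrained / partition_function(seq)


print(unpaired_probability("GGGGAAAACCCC", 4, 7))  # the loop region: high accessibility
print(unpaired_probability("GGGGAAAACCCC", 0, 3))  # the paired stem: low accessibility
```

    Because forbidding more positions can only remove structures, Pu never increases when the interval is extended.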

    Partition function and base pairing probabilities of RNA heterodimers

    Background: RNA has been recognized as a key player in cellular regulation in recent years. In many cases, non-coding RNAs exert their function by binding to other nucleic acids, as in the case of microRNAs and snoRNAs. The specificity of these interactions derives from the stability of inter-molecular base pairing. The accurate computational treatment of RNA-RNA binding therefore lies at the heart of target prediction algorithms. Methods: The standard dynamic programming algorithms for computing secondary structures of linear single-stranded RNA molecules are extended to the co-folding of two interacting RNAs. Results: We present a program, RNAcofold, that computes the hybridization energy and base pairing pattern of a pair of interacting RNA molecules. In contrast to earlier approaches, complex internal structures in both RNAs are fully taken into account. RNAcofold supports the calculation of the minimum energy structure and of a complete set of suboptimal structures in an energy band above the ground state. Furthermore, it provides an extension of McCaskill's partition function algorithm to compute base pairing probabilities, realistic interaction energies, and equilibrium concentrations of duplex structures.
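
    Once an association constant K for A + B ⇌ AB is known, the equilibrium duplex concentration follows from mass action. With total concentrations a0 and b0, x = [AB] solves K = x / ((a0 - x)(b0 - x)). The following sketch solves that quadratic in a numerically stable form. It relies on several simplifying assumptions:

    - K is taken as a given input, in inverse units of the concentrations. In RNAcofold this information comes from the dimer and monomer partition functions.
    - Homodimers AA and BB are ignored, although a full treatment also considers them.
    - The function name is hypothetical.

```python
import math


def heterodimer_concentration(k_assoc: float, a_total: float, b_total: float) -> float:
    """Equilibrium [AB] for A + B <=> AB, given K = [AB] / ([A][B]) and total concentrations.

    Homodimers (AA, BB) are ignored in this simplified sketch.
    """
    if k_assoc < 0 or a_total < 0 or b_total < 0:
        raise ValueError("constants and concentrations must be non-negative")
    # x = [AB] is the smaller root of K x^2 - (K (a + b) + 1) x + K a b = 0.
    s = k_assoc * (a_total + b_total) + 1.0
    disc = s * s - 4.0 * k_assoc * k_assoc * a_total * b_total  # equals K^2(a-b)^2 + 2K(a+b) + 1 >= 1
    # Stable form of (s - sqrt(disc)) / (2K); avoids cancellation when K is small.
    return 2.0 * k_assoc * a_total * b_total / (s + math.sqrt(disc))


ab = heterodimer_concentration(1.0, 1.0, 1.0)
print(ab, ab / ((1.0 - ab) * (1.0 - ab)))  # second value recovers K = 1.0 up to rounding
```

    The rewritten root formula is algebraically identical to the textbook one. However, it stays accurate for weakly binding pairs, where K is small and the two terms of s - sqrt(disc) nearly cancel.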

    Variations on RNA folding and alignment: lessons from Benasque

    Dynamic programming algorithms solve many standard problems of RNA bioinformatics in polynomial time. In this contribution we discuss a series of variations on these standard methods that implement refined biophysical models, such as a restriction of RNA folding to canonical structures and an extension of structural alignments to an explicit scoring of stacking propensities. Furthermore, we demonstrate that a local structural alignment can be employed for ncRNA gene finding. In this context we discuss scanning variants of folding and alignment algorithms.
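
    The restriction to canonical structures can be made concrete as a check on a dot-bracket structure. This sketch assumes that "canonical" means a structure without isolated (lonely) base pairs. Under that assumption, every pair (i, j) must stack on (i-1, j+1) or (i+1, j-1). The function names are illustrative. Folding algorithms would enforce the restriction inside their recursions rather than by filtering finished structures as done here.

```python
def base_pairs(structure: str) -> set:
    """Parse a dot-bracket string into a set of (i, j) base pairs, 0-based."""
    stack, pairs = [], set()
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.add((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unmatched '('")
    return pairs


def lonely_pairs(structure: str) -> list:
    """Pairs (i, j) with no stacking neighbour (i-1, j+1) or (i+1, j-1)."""
    pairs = base_pairs(structure)
    return sorted((i, j) for i, j in pairs
                  if (i - 1, j + 1) not in pairs and (i + 1, j - 1) not in pairs)


def is_canonical(structure: str) -> bool:
    """True if the structure contains no isolated base pairs."""
    return not lonely_pairs(structure)


print(is_canonical("((...))"))      # True: the two pairs stack on each other
print(lonely_pairs("(.(...).)"))    # [(0, 8), (2, 6)]
```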

    RNA Accessibility in cubic time

    Background: The accessibility of RNA binding motifs controls the efficacy of many biological processes. Examples are the binding of miRNA, siRNA or bacterial sRNA to their respective targets. Similarly, the accessibility of the Shine-Dalgarno sequence is essential for translation to start in prokaryotes. Furthermore, many classes of RNA binding proteins require the binding site to be single-stranded. Results: We introduce a way to compute the accessibility of all intervals within an RNA sequence in O(n³) time. This improves on previous implementations, where only intervals of one defined length were computed in the same time. While the algorithm is in the same efficiency class as sampling approaches, its results are much more accurate, especially when the probabilities are small. Conclusions: Our algorithm significantly speeds up methods for the prediction of RNA-RNA interactions and other applications that require the accessibility of RNA molecules. The algorithm is already available in the program RNAplfold of the ViennaRNA package.

    Differential transcriptional responses to Ebola and Marburg virus infection in bat and human cells

    The unprecedented outbreak of Ebola in West Africa resulted in over 28,000 cases and 11,000 deaths, underlining the need for a better understanding of the biology of this highly pathogenic virus to develop specific counter strategies. Two filoviruses, the Ebola and Marburg viruses, result in a severe and often fatal infection in humans. However, bats are natural hosts and survive filovirus infections without obvious symptoms. The molecular basis of this striking difference in the response to filovirus infections is not well understood. We report a systematic overview of differentially expressed genes, activity motifs and pathways in human and bat cells infected with the Ebola and Marburg viruses, and we demonstrate that the replication of filoviruses is more rapid in human cells than in bat cells. We also found that the most strongly regulated genes upon filovirus infection are chemokine ligands and transcription factors. We observed a strong induction of the JAK/STAT pathway, of several genes encoding inhibitors of MAP kinases (DUSP genes) and of PPP1R15A, which is involved in ER stress-induced cell death. We used comparative transcriptomics to provide a data resource that can be used to identify cellular responses that might allow bats to survive filovirus infections.

    Additional co-authors: Andreas J. Gruber, Franziska Hufsky, Henrike Indrischek, Sabina Kanton, Jörg Linde, Nelly Mostajo, Roman Ochsenreiter, Konstantin Riege, Lorena Rivarola-Duarte, Abdullah H. Sahyoun, Sita J. Saunders, Stefan E. Seemann, Andrea Tanzer, Bertram Vogel, Michael T. Wolfinger, Rolf Backofen, Jan Gorodkin, Ivo Grosse, Ivo Hofacker, Steve Hoffmann, Christoph Kaleta, Peter F. Stadler, Stephan Becker, and Manja Marz