3,095 research outputs found

    Lost in folding space? Comparing four variants of the thermodynamic model for RNA secondary structure prediction

    Get PDF
    Janssen S, Schudoma C, Steger G, Giegerich R. Lost in folding space? Comparing four variants of the thermodynamic model for RNA secondary structure prediction. BMC Bioinformatics. 2011;12(1): 429.BACKGROUND:Many bioinformatics tools for RNA secondary structure analysis are based on a thermodynamic model of RNA folding. They predict a single, "optimal" structure by free energy minimization, they enumerate near-optimal structures, they compute base pair probabilities and dot plots, representative structures of different abstract shapes, or Boltzmann probabilities of structures and shapes. Although all programs refer to the same physical model, they implement it with considerable variation for different tasks, and little is known about the effects of heuristic assumptions and model simplifications used by the programs on the outcome of the analysis.RESULTS:We extract four different models of the thermodynamic folding space which underlie the programs RNAfold, RNAshapes, and RNAsubopt. Their differences lie within the details of the energy model and the granularity of the folding space. We implement probabilistic shape analysis for all models, and introduce the shape probability shift as a robust measure of model similarity. Using four data sets derived from experimentally solved structures, we provide a quantitative evaluation of the model differences.CONCLUSIONS:We find that search space granularity affects the computed shape probabilities less than the over- or underapproximation of free energy by a simplified energy model. Still, the approximations perform similar enough to implementations of the full model to justify their continued use in settings where computational constraints call for simpler algorithms. On the side, we observe that the rarely used level 2 shapes, which predict the complete arrangement of helices, multiloops, internal loops and bulges, include the "true" shape in a rather small number of predicted high probability shapes. This calls for an investigation of new strategies to extract high probability members from the (very large) level 2 shape space of an RNA sequence. We provide implementations of all four models, written in a declarative style that makes them easy to be modified. Based on our study, future work on thermodynamic RNA folding may make a choice of model based on our empirical data. It can take our implementations as a starting point for further program development

    Biological evolution through mutation, selection, and drift: An introductory review

    Full text link
    Motivated by present activities in (statistical) physics directed towards biological evolution, we review the interplay of three evolutionary forces: mutation, selection, and genetic drift. The review addresses itself to physicists and intends to bridge the gap between the biological and the physical literature. We first clarify the terminology and recapitulate the basic models of population genetics, which describe the evolution of the composition of a population under the joint action of the various evolutionary forces. Building on these foundations, we specify the ingredients explicitly, namely, the various mutation models and fitness landscapes. We then review recent developments concerning models of mutational degradation. These predict upper limits for the mutation rate above which mutation can no longer be controlled by selection, the most important phenomena being error thresholds, Muller's ratchet, and mutational meltdowns. Error thresholds are deterministic phenomena, whereas Muller's ratchet requires the stochastic component brought about by finite population size. Mutational meltdowns additionally rely on an explicit model of population dynamics, and describe the extinction of populations. Special emphasis is put on the mutual relationship between these phenomena. Finally, a few connections with the process of molecular evolution are established.Comment: 62 pages, 6 figures, many reference

    Design, implementation and evaluation of a practical pseudoknot folding algorithm based on thermodynamics

    Get PDF
    BACKGROUND: The general problem of RNA secondary structure prediction under the widely used thermodynamic model is known to be NP-complete when the structures considered include arbitrary pseudoknots. For restricted classes of pseudoknots, several polynomial time algorithms have been designed, where the O(n(6))time and O(n(4)) space algorithm by Rivas and Eddy is currently the best available program. RESULTS: We introduce the class of canonical simple recursive pseudoknots and present an algorithm that requires O(n(4)) time and O(n(2)) space to predict the energetically optimal structure of an RNA sequence, possible containing such pseudoknots. Evaluation against a large collection of known pseudoknotted structures shows the adequacy of the canonization approach and our algorithm. CONCLUSIONS: RNA pseudoknots of medium size can now be predicted reliably as well as efficiently by the new algorithm

    Single-molecule experiments in biological physics: methods and applications

    Full text link
    I review single-molecule experiments (SME) in biological physics. Recent technological developments have provided the tools to design and build scientific instruments of high enough sensitivity and precision to manipulate and visualize individual molecules and measure microscopic forces. Using SME it is possible to: manipulate molecules one at a time and measure distributions describing molecular properties; characterize the kinetics of biomolecular reactions and; detect molecular intermediates. SME provide the additional information about thermodynamics and kinetics of biomolecular processes. This complements information obtained in traditional bulk assays. In SME it is also possible to measure small energies and detect large Brownian deviations in biomolecular reactions, thereby offering new methods and systems to scrutinize the basic foundations of statistical mechanics. This review is written at a very introductory level emphasizing the importance of SME to scientists interested in knowing the common playground of ideas and the interdisciplinary topics accessible by these techniques. The review discusses SME from an experimental perspective, first exposing the most common experimental methodologies and later presenting various molecular systems where such techniques have been applied. I briefly discuss experimental techniques such as atomic-force microscopy (AFM), laser optical tweezers (LOT), magnetic tweezers (MT), biomembrane force probe (BFP) and single-molecule fluorescence (SMF). I then present several applications of SME to the study of nucleic acids (DNA, RNA and DNA condensation), proteins (protein-protein interactions, protein folding and molecular motors). Finally, I discuss applications of SME to the study of the nonequilibrium thermodynamics of small systems and the experimental verification of fluctuation theorems. I conclude with a discussion of open questions and future perspectives.Comment: Latex, 60 pages, 12 figures, Topical Review for J. Phys. C (Cond. Matt

    Compiling a domain specific language for dynamic programming

    Get PDF
    Steffen P. Compiling a domain specific language for dynamic programming. Bielefeld (Germany): Bielefeld University; 2006

    Learning the Regulatory Code of Gene Expression

    Get PDF
    Data-driven machine learning is the method of choice for predicting molecular phenotypes from nucleotide sequence, modeling gene expression events including protein-DNA binding, chromatin states as well as mRNA and protein levels. Deep neural networks automatically learn informative sequence representations and interpreting them enables us to improve our understanding of the regulatory code governing gene expression. Here, we review the latest developments that apply shallow or deep learning to quantify molecular phenotypes and decode the cis-regulatory grammar from prokaryotic and eukaryotic sequencing data. Our approach is to build from the ground up, first focusing on the initiating protein-DNA interactions, then specific coding and non-coding regions, and finally on advances that combine multiple parts of the gene and mRNA regulatory structures, achieving unprecedented performance. We thus provide a quantitative view of gene expression regulation from nucleotide sequence, concluding with an information-centric overview of the central dogma of molecular biology

    Mapping the proteome with data-driven methods: A cycle of measurement, modeling, hypothesis generation, and engineering

    Get PDF
    The living cell exhibits emergence of complex behavior and its modeling requires a systemic, integrative approach if we are to thoroughly understand and harness it. The work in this thesis has had the more narrow aim of quantitatively characterizing and mapping the proteome using data-driven methods, as proteins perform most functional and structural roles within the cell. Covered are the different parts of the cycle from improving quantification methods, to deriving protein features relying on their primary structure, predicting the protein content solely from sequence data, and, finally, to developing theoretical protein engineering tools, leading back to experiment.\ua0\ua0\ua0\ua0 High-throughput mass spectrometry platforms provide detailed snapshots of a cell\u27s protein content, which can be mined towards understanding how the phenotype arises from genotype and the interplay between the various properties of the constituent proteins. However, these large and dense data present an increased analysis challenge and current methods capture only a small fraction of signal. The first part of my work has involved tackling these issues with the implementation of a GPU-accelerated and distributed signal decomposition pipeline, making factorization of large proteomics scans feasible and efficient. The pipeline yields individual analyte signals spanning the majority of acquired signal, enabling high precision quantification and further analytical tasks.\ua0\ua0\ua0 Having such detailed snapshots of the proteome enables a multitude of undertakings. One application has been to use a deep neural network model to learn the amino acid sequence determinants of temperature adaptation, in the form of reusable deep model features. More generally, systemic quantities may be predicted from the information encoded in sequence by evolutionary pressure. Two studies taking inspiration from natural language processing have sought to learn the grammars behind the languages of expression, in one case predicting mRNA levels from DNA sequence, and in the other protein abundance from amino acid sequence. These two models helped build a quantitative understanding of the central dogma and, furthermore, in combination yielded an improved predictor of protein amount. Finally, a mathematical framework relying on the embedded space of a deep model has been constructed to assist guided mutation of proteins towards optimizing their abundance
    • …
    corecore