79 research outputs found

    RNA secondary structure design

    Get PDF
    We consider the inverse-folding problem for RNA secondary structures: for a given (pseudo-knot-free) secondary structure find a sequence that has that structure as its ground state. If such a sequence exists, the structure is called designable. We implemented a branch-and-bound algorithm that is able to do an exhaustive search within the sequence space, i.e., gives an exact answer whether such a sequence exists. The bound required by the branch-and-bound algorithm are calculated by a dynamic programming algorithm. We consider different alphabet sizes and an ensemble of random structures, which we want to design. We find that for two letters almost none of these structures are designable. The designability improves for the three-letter case, but still a significant fraction of structures is undesignable. This changes when we look at the natural four-letter case with two pairs of complementary bases: undesignable structures are the exception, although they still exist. Finally, we also study the relation between designability and the algorithmic complexity of the branch-and-bound algorithm. Within the ensemble of structures, a high average degree of undesignability is correlated to a long time to prove that a given structure is (un-)designable. In the four-letter case, where the designability is high everywhere, the algorithmic complexity is highest in the region of naturally occurring RNA.Comment: 11 pages, 10 figure

    Divergence, Recombination and Retention of Functionality During Protein Evolution

    Get PDF
    We have only a vague idea of precisely how protein sequences evolve in the context of protein structure and function. This is primarily because structural and functional contexts are not easily predictable from the primary sequence, and evaluating patterns of evolution at individual residue positions is also difficult. As a result of increasing biodiversity in genomics studies, progress is being made in detecting context-dependent variation in substitution processes, but it remains unclear exactly what context-dependent patterns we should be looking for. To address this, we have been simulating protein evolution in the context of structure and function using lattice models of proteins and ligands (or substrates). These simulations include thermodynamic features of protein stability and population dynamics. We refer to this approach as \u27ab initio evolution\u27 to emphasise the fact that the equilibrium details of fitness distributions arise from the physical principles of the system and not from any preconceived notions or arbitrary mathematical distributions. Here, we present results on the retention of functionality in homologous recombinants following population divergence. A central result is that protein structure characteristics can strongly influence recombinant functionality. Exceptional structures with many sequence options evolve quickly and tend to retain functionality--even in highly diverged recombinants. By contrast, the more common structures with fewer sequence options evolve more slowly, but the fitness of recombinants drops off rapidly as homologous proteins diverge. These results have implications for understanding viral evolution, speciation and directed evolutionary experiments. Our analysis of the divergence process can also guide improved methods for accurately approximating folding probabilities in more complex but realistic systems

    Frustration in Biomolecules

    Get PDF
    Biomolecules are the prime information processing elements of living matter. Most of these inanimate systems are polymers that compute their structures and dynamics using as input seemingly random character strings of their sequence, following which they coalesce and perform integrated cellular functions. In large computational systems with a finite interaction-codes, the appearance of conflicting goals is inevitable. Simple conflicting forces can lead to quite complex structures and behaviors, leading to the concept of "frustration" in condensed matter. We present here some basic ideas about frustration in biomolecules and how the frustration concept leads to a better appreciation of many aspects of the architecture of biomolecules, and how structure connects to function. These ideas are simultaneously both seductively simple and perilously subtle to grasp completely. The energy landscape theory of protein folding provides a framework for quantifying frustration in large systems and has been implemented at many levels of description. We first review the notion of frustration from the areas of abstract logic and its uses in simple condensed matter systems. We discuss then how the frustration concept applies specifically to heteropolymers, testing folding landscape theory in computer simulations of protein models and in experimentally accessible systems. Studying the aspects of frustration averaged over many proteins provides ways to infer energy functions useful for reliable structure prediction. We discuss how frustration affects folding, how a large part of the biological functions of proteins are related to subtle local frustration effects and how frustration influences the appearance of metastable states, the nature of binding processes, catalysis and allosteric transitions. We hope to illustrate how Frustration is a fundamental concept in relating function to structural biology.Comment: 97 pages, 30 figure

    Error threshold in optimal coding, numerical criteria and classes of universalities for complexity

    Full text link
    The free energy of the Random Energy Model at the transition point between ferromagnetic and spin glass phases is calculated. At this point, equivalent to the decoding error threshold in optimal codes, free energy has finite size corrections proportional to the square root of the number of degrees. The response of the magnetization to the ferromagnetic couplings is maximal at the values of magnetization equal to half. We give several criteria of complexity and define different universality classes. According to our classification, at the lowest class of complexity are random graph, Markov Models and Hidden Markov Models. At the next level is Sherrington-Kirkpatrick spin glass, connected with neuron-network models. On a higher level are critical theories, spin glass phase of Random Energy Model, percolation, self organized criticality (SOC). The top level class involves HOT design, error threshold in optimal coding, language, and, maybe, financial market. Alive systems are also related with the last class. A concept of anti-resonance is suggested for the complex systems.Comment: 17 page

    Hydrophobicity patterns in protein design and differential motif finding in DNA

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Physics, 2004.Includes bibliographical references (p. 115-124).(cont.) is dictated by the solvent accessibility of structures. The distinct intrinsic tendencies of sequence and structure profiles are most pronounced at long periods, where sequence hydrophobicity fluctuates less, while solvent accessibility fluctuates more than average. Correlations between the two profiles can be interpreted as the Boltzmann weight of the solvation energy at room temperature. Chapter 4 shows that correlations in solvent accessibility along protein structures play a key role in the designability phenomenon, for both lattice and natural proteins. Without such correlations, as predicted by the Random Energy Model (REM), all structures will have almost equal values of designability. By using a toy, Ising-based model, we show that changing the correlations moves between a regime with no designability and a regime exhibiting the designability phenomenon, where a few highly designable structures emerge. Understanding how gene expression is regulated is one of the main goals of molecular cell biology. To reach this goal, the recognition and identification of DNA motifs--short patterns in biological sequences--is essential. Common examples of motifs include transcription factor binding sites in promoter regions of co-regulated genes and exonic and intronic splicing enhancers ...In the past decade, a large amount of biological data has been generated, enabling new quantitative approaches in biology. In this thesis, we focus on two biological questions by using techniques from statistical physics: hydrophobicity patterns in proteins and their impact on the designability of protein structures and regulatory motif finding in DNA sequences. Proteins fold into specific structures to perform their functions. Hydrophobicity is the main force of folding; protein sequences try to lower the ground state energy of the folded structure by burying hydrophobic monomers in the core. This results in patterns, or correlations, in the hydrophobic profiles of proteins. In this thesis, we study the designability phenomena: the vast majority of proteins adopt only a small number of distinct folded structures. In Chapter 2, we use principal component analysis to characterize the distribution of solvent accessibility profiles in an appropriate high-dimensional vector space and show that the distribution can be approximated with a Gaussian form. We also show that structures with solvent accessibility profiles dissimilar to the rest are more likely to be highly designable, offering an alternative to existing, computationally-intensive methods for identifying highly-designable structures. In Chapter 3, we extend our method to natural proteins. We use Fourier analysis to study the solvent accessibility and hydrophobicity profiles of natural proteins and show that their distribution can be approximated by a multi-variate Gaussian. The method allows us to separate the intrinsic tendencies of sequence and structure profiles from the interactions that correlate them; we conclude that the alpha-helix periodicity in sequence hydrophobicityby Mehdi Yahyanejad.Ph.D

    Simulating protein evolution via thermodynamic models

    Get PDF
    Natural proteins are results of evolution and they need to maintain certain thermodynamic stabilities in order to carry out their biological functions. By simulating protein evolution based on thermodynamic rules, we could reconstruct the evolution trajectory and analyze the evolutionary dynamics of a protein population, and further understand the protein sequence-structure-function relationship. In this study, we have used both a simplified lattice model and a high-resolution atomic model to simulate protein evolution processes. With the lattice model, we have investigated general theoretical questions about how protein structural designability would affect protein evolution, particularly how it would affect protein recombination and protein-ligand interactions in the evolution process. With the atomic model, we could simulate evolution processes for particular protein with different selection pressure. First, we simulated directed evolution processes and utilized such model to investigate the thermostabilization of T4 lysozyme. Second, we simulated neutral evolution processes for HIV protease, investigated its evolutionary dynamics and the possible drug-resistance mechanism in such neutral evolution. Overall, thermodynamic models can help us understand either general protein evolution dynamics or specific protein sequence-structure-function relationship in evolution

    Divergence, recombination and retention of functionality during protein evolution

    Full text link
    corecore