RNA secondary structure design
We consider the inverse-folding problem for RNA secondary structures: for a
given (pseudo-knot-free) secondary structure find a sequence that has that
structure as its ground state. If such a sequence exists, the structure is
called designable. We implemented a branch-and-bound algorithm that is able to
do an exhaustive search within the sequence space, i.e., gives an exact answer
as to whether such a sequence exists. The bounds required by the branch-and-bound
algorithm are calculated by a dynamic programming algorithm. We consider
different alphabet sizes and an ensemble of random structures, which we want to
design. We find that for two letters almost none of these structures are
designable. The designability improves for the three-letter case, but still a
significant fraction of structures is undesignable. This changes when we look
at the natural four-letter case with two pairs of complementary bases:
undesignable structures are the exception, although they still exist. Finally,
we also study the relation between designability and the algorithmic complexity
of the branch-and-bound algorithm. Within the ensemble of structures, a high
average degree of undesignability is correlated with a long time to prove that a
given structure is (un-)designable. In the four-letter case, where the
designability is high everywhere, the algorithmic complexity is highest in the
region of naturally occurring RNA. Comment: 11 pages, 10 figures
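The designability question in this abstract can be illustrated with a small sketch. It is not the authors' branch-and-bound algorithm: it enumerates every sequence by brute force, and it assumes a simplified energy model in which the ground state is the structure with the most base pairs (a Nussinov-style dynamic program). The pairing rules `PAIRS` (A-U and C-G only), the minimum hairpin loop size `MIN_LOOP`, and the function names are all assumptions made for illustration.

```python
from functools import lru_cache
from itertools import product

# Assumed pairing rules: two pairs of complementary bases, no wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}
MIN_LOOP = 1  # assumed minimum number of unpaired bases enclosed by a pair


def parse_pairs(structure):
    """Return the (i, j) base pairs of a pseudoknot-free dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return pairs


def ground_state(seq):
    """Return (max_pairs, number of distinct structures reaching max_pairs)."""

    @lru_cache(maxsize=None)
    def best(i, j):
        # Intervals too short to hold a pair have exactly one (empty) structure.
        if j - i <= MIN_LOOP:
            return (0, 1)
        # Case 1: position j is unpaired.
        top, ways = best(i, j - 1)
        # Case 2: position j pairs with some k, splitting the interval in two.
        for k in range(i, j - MIN_LOOP):
            if (seq[k], seq[j]) in PAIRS:
                left, inner = best(i, k - 1), best(k + 1, j - 1)
                score = left[0] + inner[0] + 1
                count = left[1] * inner[1]
                if score > top:
                    top, ways = score, count
                elif score == top:
                    ways += count
        return (top, ways)

    return best(0, len(seq) - 1)


def design(structure, alphabet="ACGU"):
    """Exhaustively search for a sequence whose unique ground state is `structure`."""
    pairs = parse_pairs(structure)
    for letters in product(alphabet, repeat=len(structure)):
        # Skip sequences that cannot even form the target pairs.
        if any((letters[i], letters[j]) not in PAIRS for i, j in pairs):
            continue
        seq = "".join(letters)
        top, ways = ground_state(seq)
        # The target is the ground state if it reaches the maximum and is unique.
        if top == len(pairs) and ways == 1:
            return seq
    return None  # undesignable under this toy model


print(design("((.))"))
```

The search space grows as the alphabet size raised to the sequence length, so this brute-force version is only practical for very short structures; that growth is the reason the abstract relies on bounds to prune the search.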
Divergence, Recombination and Retention of Functionality During Protein Evolution
We have only a vague idea of precisely how protein sequences evolve in the context of protein structure and function. This is primarily because structural and functional contexts are not easily predictable from the primary sequence, and evaluating patterns of evolution at individual residue positions is also difficult. As a result of increasing biodiversity in genomics studies, progress is being made in detecting context-dependent variation in substitution processes, but it remains unclear exactly what context-dependent patterns we should be looking for. To address this, we have been simulating protein evolution in the context of structure and function using lattice models of proteins and ligands (or substrates). These simulations include thermodynamic features of protein stability and population dynamics. We refer to this approach as 'ab initio evolution' to emphasise the fact that the equilibrium details of fitness distributions arise from the physical principles of the system and not from any preconceived notions or arbitrary mathematical distributions. Here, we present results on the retention of functionality in homologous recombinants following population divergence. A central result is that protein structure characteristics can strongly influence recombinant functionality. Exceptional structures with many sequence options evolve quickly and tend to retain functionality--even in highly diverged recombinants. By contrast, the more common structures with fewer sequence options evolve more slowly, but the fitness of recombinants drops off rapidly as homologous proteins diverge. These results have implications for understanding viral evolution, speciation and directed evolutionary experiments. Our analysis of the divergence process can also guide improved methods for accurately approximating folding probabilities in more complex but realistic systems.
Frustration in Biomolecules
Biomolecules are the prime information processing elements of living matter.
Most of these inanimate systems are polymers that compute their structures and
dynamics using as input seemingly random character strings of their sequence,
following which they coalesce and perform integrated cellular functions. In
large computational systems with finite interaction codes, the appearance of
conflicting goals is inevitable. Simple conflicting forces can lead to quite
complex structures and behaviors, leading to the concept of "frustration" in
condensed matter. We present here some basic ideas about frustration in
biomolecules and how the frustration concept leads to a better appreciation of
many aspects of the architecture of biomolecules, and how structure connects to
function. These ideas are simultaneously both seductively simple and perilously
subtle to grasp completely. The energy landscape theory of protein folding
provides a framework for quantifying frustration in large systems and has been
implemented at many levels of description. We first review the notion of
frustration from the areas of abstract logic and its uses in simple condensed
matter systems. We then discuss how the frustration concept applies
specifically to heteropolymers, testing folding landscape theory in computer
simulations of protein models and in experimentally accessible systems.
Studying the aspects of frustration averaged over many proteins provides ways
to infer energy functions useful for reliable structure prediction. We discuss
how frustration affects folding, how a large part of the biological functions
of proteins are related to subtle local frustration effects and how frustration
influences the appearance of metastable states, the nature of binding
processes, catalysis and allosteric transitions. We hope to illustrate how
frustration is a fundamental concept in relating function to structural
biology. Comment: 97 pages, 30 figures
Error threshold in optimal coding, numerical criteria and classes of universalities for complexity
The free energy of the Random Energy Model at the transition point between
ferromagnetic and spin glass phases is calculated. At this point, equivalent to
the decoding error threshold in optimal codes, free energy has finite size
corrections proportional to the square root of the number of degrees. The
response of the magnetization to the ferromagnetic couplings is maximal at the
values of magnetization equal to half. We give several criteria of complexity
and define different universality classes. According to our classification, at
the lowest class of complexity are random graphs, Markov models and hidden
Markov models. At the next level is the Sherrington-Kirkpatrick spin glass,
connected with neuron-network models. On a higher level are critical theories,
the spin glass phase of the Random Energy Model, percolation, and self-organized
criticality (SOC). The top-level class involves HOT design, the error threshold in
optimal coding, language, and, maybe, financial markets. Living systems are also
related to the last class. A concept of anti-resonance is suggested for
complex systems. Comment: 17 pages
Hydrophobicity patterns in protein design and differential motif finding in DNA
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Physics, 2004. Includes bibliographical references (p. 115-124).
In the past decade, a large amount of biological data has been generated, enabling new quantitative approaches in biology. In this thesis, we focus on two biological questions by using techniques from statistical physics: hydrophobicity patterns in proteins and their impact on the designability of protein structures and regulatory motif finding in DNA sequences. Proteins fold into specific structures to perform their functions. Hydrophobicity is the main force of folding; protein sequences try to lower the ground state energy of the folded structure by burying hydrophobic monomers in the core. This results in patterns, or correlations, in the hydrophobic profiles of proteins. In this thesis, we study the designability phenomenon: the vast majority of proteins adopt only a small number of distinct folded structures. In Chapter 2, we use principal component analysis to characterize the distribution of solvent accessibility profiles in an appropriate high-dimensional vector space and show that the distribution can be approximated with a Gaussian form. We also show that structures with solvent accessibility profiles dissimilar to the rest are more likely to be highly designable, offering an alternative to existing, computationally-intensive methods for identifying highly-designable structures. In Chapter 3, we extend our method to natural proteins. We use Fourier analysis to study the solvent accessibility and hydrophobicity profiles of natural proteins and show that their distribution can be approximated by a multi-variate Gaussian. The method allows us to separate the intrinsic tendencies of sequence and structure profiles from the interactions that correlate them; we conclude that the alpha-helix periodicity in sequence hydrophobicity is dictated by the solvent accessibility of structures. The distinct intrinsic tendencies of sequence and structure profiles are most pronounced at long periods, where sequence hydrophobicity fluctuates less, while solvent accessibility fluctuates more than average. Correlations between the two profiles can be interpreted as the Boltzmann weight of the solvation energy at room temperature. Chapter 4 shows that correlations in solvent accessibility along protein structures play a key role in the designability phenomenon, for both lattice and natural proteins. Without such correlations, as predicted by the Random Energy Model (REM), all structures will have almost equal values of designability. By using a toy, Ising-based model, we show that changing the correlations moves between a regime with no designability and a regime exhibiting the designability phenomenon, where a few highly designable structures emerge. Understanding how gene expression is regulated is one of the main goals of molecular cell biology. To reach this goal, the recognition and identification of DNA motifs--short patterns in biological sequences--is essential. Common examples of motifs include transcription factor binding sites in promoter regions of co-regulated genes and exonic and intronic splicing enhancers ...
by Mehdi Yahyanejad. Ph.D.
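The Fourier analysis of hydrophobicity profiles mentioned in this abstract can be sketched as follows. This is a generic discrete Fourier transform on a synthetic profile, not the thesis's actual procedure or data: the cosine profile with a period of 3.6 residues (the textbook alpha-helix repeat) and the function names `power_spectrum` and `dominant_period` are assumptions for illustration.

```python
import cmath
import math


def power_spectrum(profile):
    """Return a list of (period_in_residues, power) for a 1-D profile."""
    n = len(profile)
    mean = sum(profile) / n
    centered = [x - mean for x in profile]  # remove the constant offset
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        spectrum.append((n / k, abs(coeff) ** 2 / n))
    return spectrum


def dominant_period(profile):
    """Return the period with the largest power."""
    return max(power_spectrum(profile), key=lambda item: item[1])[0]


# Synthetic, made-up profile: hydrophobicity oscillating every 3.6 residues.
profile = [math.cos(2 * math.pi * t / 3.6) for t in range(36)]
print(dominant_period(profile))
```

A real analysis would use a measured hydrophobicity scale and many proteins; the sketch only shows how a periodic signal in a sequence profile appears as a peak in the power spectrum.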
Simulating protein evolution via thermodynamic models
Natural proteins are the result of evolution, and they must maintain a certain thermodynamic stability in order to carry out their biological functions. By simulating protein evolution based on thermodynamic rules, we can reconstruct the evolutionary trajectory, analyze the evolutionary dynamics of a protein population, and further understand the protein sequence-structure-function relationship. In this study, we have used both a simplified lattice model and a high-resolution atomic model to simulate protein evolution processes. With the lattice model, we have investigated general theoretical questions about how protein structural designability affects protein evolution, particularly how it affects protein recombination and protein-ligand interactions in the evolution process. With the atomic model, we can simulate evolution processes for particular proteins under different selection pressures. First, we simulated directed evolution processes and used this model to investigate the thermostabilization of T4 lysozyme. Second, we simulated neutral evolution processes for HIV protease and investigated its evolutionary dynamics and a possible drug-resistance mechanism under neutral evolution. Overall, thermodynamic models can help us understand both general protein evolution dynamics and specific protein sequence-structure-function relationships in evolution.