156 research outputs found

    Fold Designability, Distribution, and Disease

    Fold designability has been estimated by the number of families contained in that fold. Here, we show that among orthologous proteins, sequence divergence is higher for folds with greater numbers of families. Folds with greater numbers of families also tend to have families that appear more often in the proteome and greater promiscuity (the number of unique “partner” folds that the fold is found with within the same protein). We also find that many disease-related proteins have folds with relatively few families. In particular, a number of these proteins are associated with diseases occurring at high frequency. These results suggest that family counts reflect how certain structures are distributed in nature and are an important characteristic associated with many human diseases.

    Divergence, Recombination and Retention of Functionality During Protein Evolution

    We have only a vague idea of precisely how protein sequences evolve in the context of protein structure and function. This is primarily because structural and functional contexts are not easily predictable from the primary sequence, and evaluating patterns of evolution at individual residue positions is also difficult. As a result of increasing biodiversity in genomics studies, progress is being made in detecting context-dependent variation in substitution processes, but it remains unclear exactly what context-dependent patterns we should be looking for. To address this, we have been simulating protein evolution in the context of structure and function using lattice models of proteins and ligands (or substrates). These simulations include thermodynamic features of protein stability and population dynamics. We refer to this approach as 'ab initio evolution' to emphasise the fact that the equilibrium details of fitness distributions arise from the physical principles of the system and not from any preconceived notions or arbitrary mathematical distributions. Here, we present results on the retention of functionality in homologous recombinants following population divergence. A central result is that protein structure characteristics can strongly influence recombinant functionality. Exceptional structures with many sequence options evolve quickly and tend to retain functionality, even in highly diverged recombinants. By contrast, the more common structures with fewer sequence options evolve more slowly, but the fitness of recombinants drops off rapidly as homologous proteins diverge. These results have implications for understanding viral evolution, speciation and directed evolutionary experiments. Our analysis of the divergence process can also guide improved methods for accurately approximating folding probabilities in more complex but realistic systems.

    Protein structure generation via folding diffusion

    The ability to computationally generate novel yet physically foldable protein structures could lead to new biological discoveries and new treatments targeting as yet incurable diseases. Despite recent advances in protein structure prediction, directly generating diverse, novel protein structures from neural networks remains difficult. In this work, we present a new diffusion-based generative model that designs protein backbone structures via a procedure that mirrors the native folding process. We describe protein backbone structure as a series of consecutive angles capturing the relative orientation of the constituent amino acid residues, and generate new structures by denoising from a random, unfolded state towards a stable folded structure. Not only does this mirror how proteins biologically twist into energetically favorable conformations, but the inherent shift and rotational invariance of this representation also crucially alleviates the need for complex equivariant networks. We train a denoising diffusion probabilistic model with a simple transformer backbone and demonstrate that our resulting model unconditionally generates highly realistic protein structures with complexity and structural patterns akin to those of naturally occurring proteins. As a useful resource, we release the first open-source codebase and trained models for protein structure diffusion.

    Simulating protein evolution via thermodynamic models

    Natural proteins are the result of evolution, and they need to maintain certain thermodynamic stabilities in order to carry out their biological functions. By simulating protein evolution based on thermodynamic rules, we can reconstruct the evolutionary trajectory, analyze the evolutionary dynamics of a protein population, and further understand the protein sequence-structure-function relationship. In this study, we have used both a simplified lattice model and a high-resolution atomic model to simulate protein evolution processes. With the lattice model, we have investigated general theoretical questions about how protein structural designability affects protein evolution, particularly how it affects protein recombination and protein-ligand interactions in the evolution process. With the atomic model, we can simulate evolution processes for particular proteins under different selection pressures. First, we simulated directed evolution processes and used this model to investigate the thermostabilization of T4 lysozyme. Second, we simulated neutral evolution processes for HIV protease and investigated its evolutionary dynamics and the possible drug-resistance mechanism in such neutral evolution. Overall, thermodynamic models can help us understand both general protein evolution dynamics and specific protein sequence-structure-function relationships in evolution.

    Structure and Age Jointly Influence Rates of Protein Evolution

    Which factors determine a protein's rate of evolution is actively debated. Especially unclear is the relative role of intrinsic factors of present-day proteins versus historical factors such as protein age. Here we study the interplay of structural properties and evolutionary age as determinants of protein evolutionary rate. We use a large set of one-to-one orthologs between human and mouse proteins, with mapped PDB structures. We report that previously observed structural correlations also hold within each age group, including relationships between solvent accessibility, designability, and evolutionary rates. However, age also plays a crucial role: age modulates the relationship between solvent accessibility and rate. Additionally, younger proteins, despite being less designable, tend to evolve faster than older proteins. We show that previously reported relationships between age and rate cannot be explained by structural biases among age groups. Finally, we introduce a knowledge-based potential function to study the stability of proteins through large-scale computation. We find that older proteins are more stable for their native structure, and more robust to mutations, than younger ones. Our results underscore that several determinants, both intrinsic and historical, can interact to determine rates of protein evolution.

    Evolution of structure and function in Phenylalanine Hydroxylase. With the regulatory properties in sight

    In the post-genomic era, an idea of how similar the genomes of different species actually are is on the horizon. Less than 10 years ago, the human genome was estimated to encode 100000 genes. That was an overestimation, as the real number of human genes is 20000-25000. Most genes are expressed as proteins. The 3D structure of a protein is more conserved than its sequence, and therefore the structural context of protein and gene evolution must not be forgotten. By its structure, the protein can propagate its function. In the early 1990s the number of different protein structure classes, so-called folds, was predicted to be about 10000. Today there are slightly above 1000 folds and the discovery of new folds has leveled off, despite an increase in the number of protein structures that have been solved over the last few years. Indeed, some folds are used for more than one function and are found in various functional contexts. If the many components are so similar, how then is the divergence of biological species achieved from genomes built of the same components? One way to study biological diversity is by dividing it into its smaller components, e.g. by studying protein or gene family evolution. Here the evolution and regulation of the aromatic amino acid hydroxylases (AAAHs) have been under examination. This gene family encodes the proteins phenylalanine hydroxylase (PAH), tyrosine hydroxylase (TH), and tryptophan hydroxylase (TPH). These enzymes are highly physiologically important. PAH, expressed in liver, regulates the homeostasis of L-Phe by hydroxylating it into L-Tyr. TH, expressed in the central nervous system, hydroxylates L-Tyr into L-Dopa. L-Dopa is part of two important pathways: i) melanogenesis and ii) dopamine production. In humans, dysfunctions in PAH that cause elevated L-Phe concentration can result in phenylketonuria (PKU). Untreated PKU results in neurological damage. TPH produces a precursor of serotonin from L-Trp. The end products of these enzymes are neurotransmitters and hormones with increasingly important functions, from e.g. amoeba to nematode to man. As PAH has evolved in mammals its regulation has become increasingly sophisticated, e.g. homotropic positive cooperativity that shifts the conformational equilibrium from dimeric to tetrameric is seen in the mammalian lineage. Nematode PAH is devoid of positive cooperativity, but resembles the tetrameric high-affinity and high-activity mammalian PAH. TH and TPH are always tetrameric and not allosterically regulated. Each AAAH subunit has a regulatory domain, a catalytic domain, and an oligomerization domain. The promotion of positive cooperativity in PAH has been investigated by comparing mammalian PAH to nematode PAH. The low-affinity and low-activity dimer as well as the high-affinity and high-activity tetramer of PAH were modeled. Sequence analysis on a nematode sequence cluster and a mammalian sequence cluster identified sites with high probability of being involved in functional divergence, e.g. change in regulation. Residue-specific electrostatic interaction energies were calculated for all ionizable residues in the models. In general, we note important differences in the substrate binding pocket that help to explain why the active site in nematode PAH is less dynamic than in mammalian PAH. Our results suggest a pathway for the positive cooperativity from one active site to another, involving various predicted hinge regions from human PAH, where we find the nematode PAH more rigid. The regulatory domain in PAH is part of the ACT domain family. The ACT domains are frequently found regulating metabolic enzymes in an allosteric manner. The allosteric effector is often an amino acid that binds to an interface formed by two ACT domains. No contacts are formed between two ACT domains and the stoichiometry of binding is 1:1 for L-Phe in PAH. Therefore the allosteric effect must originate in the active site when the substrate binds. An alternative pathway for aromatic amino acid biosynthesis is present in e.g. plants and bacteria. This pathway has an L-Phe binding ACT domain, which is homologous to the ACT domain in AAAH. The L-Phe binding motif in this domain is also conserved in PAH. A comparative structural analysis of this area shows why L-Phe may not bind in the AAAH regulatory domain and also indicates why it has remained. The ACT domain has an abundant fold, a superfold. A structural approach was used to identify more potential ACT domains to gain further insights into the functional properties that this domain could perform in general, and in PAH in particular. Here we note e.g. two interesting potential domain families that could be homologous to the ACT domain, namely the GlnB-like domains and heavy metal binding domains. The phylogeny of the AAAH family has not been resolved earlier given the lack of a suitable outgroup. As more genome sequences became available, we identified an outgroup candidate and had it experimentally characterized. The phylogeny was resolved, the ancestral function determined, and by comparing the chromosomal gene locations the order of events in AAAH evolution was envisioned.

    Computational design and designability of gene regulatory networks

    Our knowledge of molecular interactions has led us toward an engineering perspective, in which designs and implementations of artificial regulatory systems aim to provide fundamental instructions for cellular reprogramming. Here we address the design of gene networks as a way to deepen our understanding of natural regulation. We also address the problem of designability given a library of compatible elements. To this end, we apply heuristic optimization methods that implement routines for solving inverse problems, together with mathematical analysis tools for studying the dynamics of gene expression. Because the engineering of transcription networks has relied mainly on assembling a few regulatory elements using rational design principles, we developed a computational design framework to exploit this approach. Models associated with libraries were examined to uncover the genotypic space associated with a given phenotype. In addition, we developed a fully automated procedure for designing non-coding RNA molecules with regulatory capacity, based on a physicochemical model and exploiting allosteric regulation. The resulting RNA circuits implemented a post-transcriptional control mechanism for protein expression that could be combined with transcriptional elements. We also applied the heuristic methods to analyze the designability of metabolic pathways. Indeed, computational design methods can at the same time learn from natural mechanisms in order to exploit their fundamental principles. Studies of these systems thus allow us to go deeper into genetic engineering. Of particular relevance, integral control and incoherent regulation are general strategies that organisms employ and that we analyze here.
    Rodrigo Tarrega, G. (2011). Computational design and designability of gene regulatory networks [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/1417

    A genetic association analysis of cognitive ability and cognitive ageing using 325 markers for 109 genes associated with oxidative stress or cognition

    Background: Non-pathological cognitive ageing is a distressing condition affecting an increasing number of people in our 'ageing society'. Oxidative stress is hypothesised to have a major role in cellular ageing, including brain ageing. Results: Associations between cognitive ageing and 325 single nucleotide polymorphisms (SNPs), located in 109 genes implicated in oxidative stress and/or cognition, were examined in a unique cohort of relatively healthy older people, on whom we have cognitive ability scores at ages 11 and 79 years (LBC1921). SNPs showing a significant positive association were then genotyped in a second cohort for whom we have cognitive ability scores at the ages of 11 and 64 years (ABC1936). An intronic SNP in the APP gene (rs2830102) was significantly associated with cognitive ageing in both LBC1921 and a combined LBC1921/ABC1936 analysis (p < 0.01), but not in ABC1936 alone. Conclusion: This study suggests a possible role for APP in normal cognitive ageing, in addition to its role in Alzheimer's disease.