85 research outputs found

    Exploring Fold Space Preferences of New-born and Ancient Protein Superfamilies

    The evolution of proteins is one of the fundamental processes that has delivered the diversity and complexity of life we see around us today. While we tend to define protein evolution in terms of sequence-level mutations, insertions and deletions, it is hard to translate these processes into a more complete picture incorporating a polypeptide's structure and function. By considering how protein structures change over time we can gain an entirely new appreciation of their long-term evolutionary dynamics. In this work we seek to identify how populations of proteins at different stages of evolution explore their possible structure space. We apply an annotation of superfamily age to this space and explore the relationship between these ages and a diverse set of properties pertaining to a superfamily's sequence, structure and function. We note several marked differences between the populations of newly evolved and ancient structures, such as in their length distributions, secondary structure content and tertiary packing arrangements. In particular, many of these differences suggest a less elaborate structure for newly evolved superfamilies when compared with their ancient counterparts. We show that the structural preferences we report are not a residual effect of a more fundamental relationship with function. Furthermore, we demonstrate the robustness of our results, using significant variation in the algorithm used to estimate the ages. We present these age estimates as a useful tool to analyse protein populations. In particular, we apply this in a comparison of domains containing Greek key or jelly roll motifs.

    Introduction to Protein Structure Prediction

    This chapter gives a graceful introduction to the problem of protein three-dimensional structure prediction, and focuses on how to make structural sense out of a single input sequence with unknown structure, the 'query' or 'target' sequence. We give an overview of the different classes of modelling techniques, notably template-based and template-free. We also discuss the way in which structural predictions are validated within the global community, and elaborate on the extent to which predicted structures may be trusted and used in practice. Finally, we discuss whether the concept of a single fold pertaining to a protein structure is sustainable given recent insights. In short, we conclude that the general protein three-dimensional structure prediction problem remains unsolved, especially if we desire quantitative predictions. However, if a homologous structural template is available in the PDB, a model of reasonable to high accuracy may be generated.

    The early history and emergence of molecular functions and modular scale-free network behavior

    The formation of protein structural domains requires that biochemical functions, defined by conserved amino acid sequence motifs, be embedded into a structural scaffold. Here we trace domain history onto a bipartite network of elementary functional loop (EFL) sequences and domain structures defined at the fold superfamily (FSF) level of Structural Classification of Proteins (SCOP). The resulting ‘elementary functionome’ network and its EFL and FSF graph projections unfold evolutionary ‘waterfalls’ describing emergence of primordial functions. Waterfalls reveal how ancient EFLs are shared by FSF structures in two initial waves of functional innovation that involve founder ‘p-loop’ and ‘winged helix’ domain structures. They also uncover a dynamics of modular motif embedding in domain structures that is ongoing, which transfers ‘preferential’ cooption properties of ancient EFLs to emerging FSFs. Remarkably, we find that the emergence of molecular functions induces hierarchical modularity and power law behavior in network evolution as the networks of motifs and structures expand metabolic pathways and translation

    A Conserved Structural Role for the Walker-A Lysine in P-Loop Containing Kinases

    Bacterial tyrosine kinases (BY-kinases) and shikimate kinases (SKs) comprise two structurally divergent P-loop containing enzyme families that share similar catalytic site geometries, most notably with respect to their Walker-A, Walker-B, and DxD motifs. We had previously demonstrated that in BY-kinases, a specific interaction between the Walker-A and Walker-B motifs, driven by the conserved “catalytic” lysine housed on the former, leads to a conformation that is unable to efficiently coordinate Mg2+•ATP and is therefore incapable of chemistry. Here, using enhanced sampling molecular dynamics simulations, we demonstrate that structurally similar interactions between the Walker-A and Walker-B motifs, also mediated by the catalytic lysine, stabilize a state in SKs that deviates significantly from one that is necessary for the optimal coordination of Mg2+•ATP. This structural role of the Walker-A lysine is a general feature in SKs and is found to be present in members that encode a Walker-B sequence characteristic of the family (Coxiella burnetii SK), and in those that do not (Mycobacterium tuberculosis SK). Thus, the structural role of the Walker-A lysine in stabilizing an inactive state, distinct from its catalytic function, is conserved between two distantly related P-loop containing kinase families, the SKs and the BY-kinases. The universal conservation of this element, and of the key characteristics of its associated interaction partners within the Walker motifs of P-loop containing enzymes, suggests that this structural role of the Walker-A lysine is perhaps a widely deployed regulatory mechanism within this ancient family

    Emergent patterns in protein, microbial and mutualistic systems

    Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Facultad de Ciencias, Departamento de Biología Molecular. Date of defense: 17-04-2015. In this thesis we analyse emergent patterns in complex biological systems. We say that these patterns emerge because they result from behaviours of the system that are difficult to explain starting from a microscopic description. These behaviours are strongly dependent on the interactions between elements, and thus our research focuses on the identification and evaluation of interaction networks. In particular, we have analysed interactions that may reflect the response of the system to long-term conditions, and whose analysis may be compatible with an evolutionary interpretation. The methodological and conceptual framework needed for the development of our research is complex. For this reason, the first part of the thesis is devoted to clarifying the epistemological approach we have followed. In subsequent chapters, we present our research results, which have been developed around three systems with notable differences among them. The first system considers a representative subset of all the protein structures known to date. We develop a method that objectively demonstrates the existence of structural protein classes known as folds, defining conserved interaction patterns between amino acids. We go deeper into the evolutionary interpretation of this result by investigating the role of protein function in structural conservation and divergence. Second, we analyse high-throughput sequencing experiments recording the presence of bacterial taxa in different environments. From these data we infer aggregation and segregation patterns suggesting that bacterial mutualistic interactions are very relevant, and we explore their functional role in more detail by analysing the bacterial assembly process in a group of infants during their development. Last, we have considered mutualistic communities of plants and pollinators. We predict the structural stability of this system by defining two magnitudes: the effective interspecific competition and the propagation of perturbations. These magnitudes rationalize the relative effect of competition versus mutualism and, in particular, of the different mutualistic networks on structural stability, which we show plays a major role in sustaining biodiversity.

    Understanding the Structural and Functional Importance of Early Folding Residues in Protein Structures

    Proteins adopt three-dimensional structures which serve as a starting point to understand protein function and their evolutionary ancestry. It is unclear how proteins fold in vivo and how this process can be recreated in silico in order to predict protein structure from sequence. Contact maps describe whether two residues are in spatial proximity, and structures can be derived from this simplified representation. Coevolution or supervised machine learning techniques can compute contact maps from sequence; however, these approaches only predict sparse subsets of the actual contact map. It is shown that the composition of these subsets substantially influences the achievable reconstruction quality because most information in a contact map is redundant. No strategy had been proposed that identifies unique contacts for which no redundant backup exists. The StructureDistiller algorithm quantifies the structural relevance of individual contacts and identifies crucial contacts in protein structures. It is demonstrated that using this information the reconstruction performance on a sparse subset of a contact map is increased by 0.4 Å, which constitutes a substantial performance gain. The set of the most relevant contacts in a map is also more resilient to false positively predicted contacts: up to 6% of false positives are compensated before reconstruction quality matches a naive selection of contacts without any false positive contacts. This information is invaluable for the training of new structure prediction methods and provides insights into how robustness and information content of contact maps can be improved. In the literature, the relevance of two types of residues for in vivo folding has been described. Early folding residues initiate the folding process, whereas highly stable residues prevent spontaneous unfolding events. The structural relevance score proposed by this thesis is employed to characterize both types of residues. 
Early folding residues form pivotal secondary structure elements, but their structural relevance is average. In contrast, highly stable residues exhibit significantly increased structural relevance. This implies that residues crucial for the folding process are not relevant for structural integrity and vice versa. The position of early folding residues is preserved over the course of evolution as demonstrated for two ancient regions shared by all aminoacyl-tRNA synthetases. One arrangement of folding initiation sites resembles an ancient and widely distributed structural packing motif and captures how reverberations of the earliest periods of life can still be observed in contemporary protein structures

    The Influence of the Amino Acid Repertoire on Protein Structure and Function (Vliv repertoáru aminokyselin na strukturu a funkci bílkovin)

    To understand protein structure emergence is to comprehend the evolutionary transition from messy chemistry to the first heritable molecular systems. Early proteins were probably flexible in structure, promiscuous in activity and ambiguous in sequence. Moreover, the first sequences were presumably composed of prebiotically plausible amino acids from endogenous and exogenous sources, which form only a subset of the extant protein alphabet. Here we investigate the effect of the most recent additions to the amino acid alphabet on the protein structure/function relationship, and the properties of random proteins as the evolutionary point zero for the earliest sequences as well as for proteins emerging de novo from the non-coding parts of the genome. Random or never-born proteins are of special interest for contemporary biology as they unveil the unexposed side of the protein sequence space. We constructed an in silico library of random proteins with the natural amino acid alphabet, analyzed its structure/disorder/aggregation content and selected 45 sequences for subsequent experimental preparation and biophysical characterization. We observed that structure content in random sequence space does not differ significantly from that of natural proteins. However, the analyses of the aggregation propensity showed a... Department of Biochemistry, Faculty of Science.

    Apoptosis and cell cycle regulation in a basal model system: insights from the placozoan Trichoplax adhaerens

    Complex gene networks regulated by master control genes are critical for the coordination of fundamental molecular and cellular processes in metazoans. Their malfunction can cause severe diseases such as cancer. In this thesis, two evolutionarily highly conserved master control genes, namely p53 and Myc, have been comprehensively analyzed in the simple metazoan model system Trichoplax adhaerens (phylum Placozoa). Additionally, the newly described species, Polyplacotoma mediterranea, will provide further promising possibilities for functional and comparative studies. The tumor suppressor p53 is well known for the regulation of programmed cell death (apoptosis), but also controls cell metabolism and the cell cycle. In this work, functional and comparative genetic studies on the p53 homolog tap53 have been performed in T. adhaerens. The in vivo accumulation of tap53 by inhibiting the tap53-taMdm2 interaction as well as the tap53 knockdown resulted in a significant increase of apoptotic cells and an increased mortality. Furthermore, multiple tap53 interaction partners identified by transcriptomic analyses after tap53 knockdown suggest the existence of an alternative and tap53-independent apoptosis signaling pathway. Genes belonging to the innate immune system and the cellular stress response were likewise found to be differentially expressed in response to tap53 knockdown. These results suggest that tap53 fulfills multiple crucial functions in Trichoplax and is essential for the survival of the organism. As transcription factors, Myc and its most important interaction partner Max regulate cellular processes and mechanisms such as differentiation, proliferation and the cell cycle. The biochemical studies carried out in the course of this thesis on the full-length proteins taMyc and taMax of T. adhaerens provided first insights into their protein structure and interaction. 
Using different methods, dimerization of the full-length proteins was demonstrated, which is a prerequisite for functional DNA binding capabilities. This suggests that taMyc and taMax play an essential role as transcription factors in Trichoplax and supports an evolutionarily conserved function throughout metazoans. This thesis provides important new insights into the function and characteristics of p53 and Myc/Max regulatory networks in a simple metazoan and provides a basis for future applied medical research on cellular regulation processes.

    Investigating tricky nodes in the Tree of Life
