A dynamic model for the evolution of protein structure

Habib, H; Hawatmeh, Amer; Shamoon, Fayez

Authors: H Habib
Amer Hawatmeh
Fayez Shamoon
Publication date: 1 April 2017

Abstract

Protein domains are three-dimensional arrangements of atoms that recur in the proteomes of organisms. Because the three-dimensional structure of a protein determines its function, it is the fold, much more than the underlying sequence and chemistry, that is evolutionarily conserved. We are interested in probing the history of life with these domain structures and in glimpsing qualitative changes over time by studying a dynamic model of protein evolution. Using standard phylogenetic methods and a census of protein domain structure in hundreds of genomes, we have reconstructed phylogenetic trees of protein domains defined using the Structural Classification of Proteins (SCOP). In these trees the nodes are folds or fold superfamilies (FSFs), and the character vector for each node lists the abundance of that fold or FSF across a range of species spanning all three superkingdoms of life. Character states are linearly polarized by abundance: higher abundance within and among species is taken to indicate an older structure, and this polarization determines tree structure. Here we explore the rate at which fold or FSF variants and new folds or FSFs appear in evolution, and what collective model of proteome evolution explains such rates. Briefly, what are the dynamics of change? A set of birth-death differential equations was selected to capture these dynamics, with one set for folds and another for FSFs. The models assume that at any given moment there is a certain number of distinct folds or FSFs, each with its own abundance, and that as each fold or FSF diversifies, slight changes accumulate, producing fold or FSF variants. Eventually, as the variants themselves continue to diversify and change, a new fold or FSF is born. Thus, each model has two rate parameters: the growth rate of fold or FSF variants and the rate of appearance of new folds or FSFs.
The model governs the rate of change of the average total abundance of a fold or FSF over time. It is fit to the tree, so only those fold or FSF transitions actually present in the tree are treated as possible in the equations. It takes a global perspective: the total abundance of a fold or FSF is its abundance across all species, not within one organism. This perspective allows horizontal transfer terms to be properly discounted in a birth-death model, since such a transfer contributes no new folds or FSFs to the net abundance across all organisms. Our model indicates (1) that there is a tight connection between the histories of folds and FSFs; (2) that the transition probabilities to new variants of a fold rose sharply just as the transition probabilities to new folds declined steeply; and (3) that this simultaneous rise and decline is consistent with, and explainable by, the combinatorial explosion of structural domains (the period of intense combination and rearrangement of domains and distribution of these new combinations in novel lineages) and the rise of organismal diversification. Our simulations suggest a picture of the past in which the exploration of protein structure space proceeds much like that of a budding field of knowledge: first, coarse-grained discoveries are made, followed by fine-grained elaboration of each once the coarse-grained discoveries have been exhausted.
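
The abstract does not state the equations themselves, so the following is only a minimal sketch of what a two-rate birth-death system restricted to tree edges might look like. Everything specific here is an assumption made for illustration: the linear form of the equations, the choice that a parent fold loses the abundance that moves to a new fold, the constant rates `g` (variant growth) and `b` (new-fold appearance), the forward-Euler integrator, and the three-fold tree. Consistent with the global perspective described above, there is no horizontal transfer term.

```python
def step(abundance, children, g, b, dt):
    """One forward-Euler step of an assumed linear birth-death system.

    abundance: dict mapping fold name -> total abundance across all species
    children:  dict mapping fold name -> list of folds it gives rise to
               (only transitions present in the tree are allowed)
    g:         assumed constant growth rate of variants of an existing fold
    b:         assumed constant rate at which a new fold appears per tree edge
    """
    rate = {fold: 0.0 for fold in abundance}
    for fold, x in abundance.items():
        # Variants of an existing fold accumulate in proportion to abundance.
        rate[fold] += g * x
        for child in children.get(fold, []):
            # Assumption: abundance that becomes a new fold leaves the parent.
            rate[fold] -= b * x
            rate[child] += b * x
    return {fold: x + dt * rate[fold] for fold, x in abundance.items()}


def simulate(abundance, children, g, b, dt=0.01, steps=1000):
    """Integrate the system for `steps` Euler steps and return the final state."""
    state = dict(abundance)
    for _ in range(steps):
        state = step(state, children, g, b, dt)
    return state


if __name__ == "__main__":
    # Hypothetical tree: fold A gives rise to folds B and C.
    tree = {"A": ["B", "C"]}
    start = {"A": 1.0, "B": 0.0, "C": 0.0}
    print(simulate(start, tree, g=0.5, b=0.1))
```

In this sketch, setting `g` to zero leaves the total abundance unchanged, because new folds only redistribute abundance along tree edges, while setting `b` to zero lets existing folds grow without any new fold appearing. Time-dependent rates, which the reported rise and decline in transition probabilities would require, are not modeled here.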