Superfamilies are a classification system that groups proteins which are related through a common evolutionary origin and share similar sequences, structures, and core reaction mechanisms, but exert different functions. Today, for most superfamilies tens of thousands of sequences and hundreds of structures are known and most of the different functions of their members have been elucidated. Superfamilies thus provide a formal and biologically sensible framework to study evolutionary relationships between proteins. In the present work, the frameworks of three enzyme superfamilies were utilized to gain insights into several important aspects of enzyme evolution.
The first part of this work addresses the question of how enzymatic mono- and bi-functionality have evolved in the superfamily of ribose-binding (βα)8-barrel sugar isomerases. This superfamily contains the homologous enzymes HisA and TrpF, which catalyze similar reactions in histidine and tryptophan biosynthesis, as well as the bi-functional enzyme PriA, which catalyzes both the HisA and TrpF isomerization reactions. HisA and TrpF are ubiquitous in Archaea and Bacteria, whereas PriA is only found in certain Actinobacteria. These species have lost the dedicated TrpF enzyme and PriA is consequently part of both tryptophan and histidine biosynthesis. Much has been speculated on the evolutionary relationship of these enzymes and whether the bi-functionality of PriA is a remnant from ancient evolutionary times or a more recent development in Actinobacteria. Using ancestral sequence reconstruction, it was demonstrated in this work that evolutionary ancestors of modern HisA enzymes display bi-functionality, reminiscent of PriA. A detailed enzymatic characterization of three reconstructed HisA ancestors showed that they catalyze not only the HisA but also the TrpF reaction with comparable catalytic efficiencies in vitro. Metabolic complementation experiments with hisA- and trpF-deficient Escherichia coli strains furthermore demonstrated that the bi-functional HisA ancestors could support both histidine and tryptophan biosynthesis in vivo. By a combination of sequence- and network-based in silico methods, several modern HisA enzymes were subsequently identified that possess sequence motifs typical of bi-functional PriA enzymes. The enzymatic characterization of three such modern HisA representatives revealed that they are also bi-functional, albeit to a lesser extent, even though the respective organisms possess dedicated TrpF enzymes. Thus, the ancestral bi-functionality has persisted for billions of years in HisA enzymes, without any obvious selective pressure.
Consequently, a new model for the evolution of HisA, TrpF, and PriA was proposed: The bi-functionality of ancient HisA variants may have played an important role in maintaining early metabolism by supporting both histidine and tryptophan biosynthesis. After the emergence of dedicated TrpF enzymes, the bi-functionality of the ancestors became expendable and diminished to the level observed in modern HisA enzymes. However, the inherent bi-functionality of HisA contributed to the robustness of microbial metabolism and made it possible to compensate for the loss of a dedicated trpF gene in some Actinobacteria. In these organisms, the available bi-functionality of HisA was exploited, selected for, and enhanced, which eventually led to the modern PriA enzymes.
The second part of this work deals with the evolution of substrate specificity and secondary metabolic enzymes in a superfamily of chorismate-utilizing enzymes, termed the MST-superfamily. Chorismate is a central metabolic node molecule and the starting point for the biosynthesis of various important metabolites, including aromatic amino acids, folate, and iron-chelating siderophores. The MST-enzymes catalyze the committed steps of these biosynthetic pathways and are highly similar in sequence, structure, and reaction mechanism. However, the MST-enzymes that are part of primary metabolic pathways employ exclusively ammonia as a nucleophile to aminate chorismate, whereas those that are part of secondary metabolic pathways exclusively employ water as a nucleophile to hydroxylate chorismate. Based on the notion that secondary metabolic enzymes are descendants of primary metabolic ones, this part of the work investigated the mechanism by which the transition from primary metabolic to secondary metabolic MST-enzymes was accompanied by a change in nucleophile specificity from ammonia to water. Initially, network-based, phylogenetic, and structure-based in silico methods were applied to identify two key amino acids in the nucleophile access channel of the active site that distinguish primary-metabolic/ammonia-utilizing and secondary-metabolic/water-utilizing MST-enzymes. The importance of these key positions was subsequently examined by rationally designing sixteen variants of the MST-enzyme anthranilate synthase, which normally employs ammonia as a nucleophile. The enzymatic characterization of these variants by HPLC-MS showed that the right combination of amino acids at the two key positions indeed resulted in a broadening of nucleophile specificity to also include water. These anthranilate synthase variants hydroxylated chorismate and formed isochorismate with efficiencies comparable to native secondary-metabolic/water-utilizing isochorismate synthases.
Moreover, these variants were still able to employ ammonia as a nucleophile and formed their native product anthranilate; hence they were bi-functional. These experiments demonstrated that nucleophile specificity in the MST-superfamily can readily switch from ammonia to water. Moreover, the observed bi-functionality of the anthranilate synthase variants argues that the evolution of secondary metabolic MST-enzymes may have proceeded through bi-functional intermediates. Such metabolic generalists may have allowed for the formation of novel metabolites (isochorismate) while maintaining the formation of important primary metabolic metabolites (anthranilate). This scenario consequently does not a priori require gene duplication events and thus precludes negative metabolic effects linked to retaining redundant gene copies.
The third part of this work pursues the question of how protein-protein interaction specificity is assured in superfamilies of structurally related protein complexes and how the determinants of interaction specificity have evolved. Specific interactions between proteins are vital for almost all cellular functions. This specificity is usually achieved by shape and electrostatic complementarity of protein interfaces. However, the number of different protein folds and interface geometries found in Nature is limited, due to the constraints imposed by efficiently packing hydrogen-bonded secondary structure elements. It is thus a challenging question how interaction specificity is achieved despite structural limitations and how the formation of non-physiological complexes is avoided when several possible interaction partners with similar interface geometries are available. In order to address this problem, a comprehensive computational survey was initially performed of the interface geometries of over 300 bacterial, heteromeric protein complexes and all their homologs in the respective superfamilies. This survey revealed that in about 10% of the superfamilies interface geometries vary significantly between related complexes that share homologous subunits. In these cases interfaces were extended by so-called interface add-ons, which typically comprise 10-20 amino acids, form well-defined secondary structure elements, and significantly contribute to complex stability. These characteristics suggested that interface add-ons differentiate between structurally related protein complexes and contribute to interaction specificity through negative design. In order to back this assumption, the case of the interface add-on found in a superfamily of glutamine amidotransferase complexes involved in tryptophan and folate biosynthesis was subsequently analyzed in detail.
These complexes comprise synthase and glutaminase subunits that interact to transfer ammonia from glutamine to an acceptor substrate. A subset of synthase subunits exclusively involved in tryptophan biosynthesis contains the interface add-on, whereas it is absent in all other homologous synthase subunits, including those exclusively involved in folate biosynthesis. The comprehensive experimental characterization of 54 combinations of different synthase and glutaminase subunits by chromatographic methods, light scattering, mass spectrometry, and enzyme kinetics demonstrated that the presence or absence of the interface add-on determines interaction specificity. An in silico genetic profiling of over 15000 archaeal and bacterial genomes together with in vivo growth assays showed that the interface add-on found in complexes of tryptophan biosynthesis is biologically relevant for preventing cross-interactions with the homologous complexes of folate biosynthesis, which would lead to harmful metabolic cross-talk that negatively affects cellular fitness. It was finally shown by protein design that the evolution of the interface add-on in these complexes most likely proceeded via intermediary complexes with relaxed interaction specificity. In conclusion, this part of this work demonstrates that interface add-ons are evolutionary tools to facilitate interaction specificity in superfamilies of homologous proteins or in cases where a protein has to discriminate between several potential interaction partners that share similar interface geometries.
In summary, the presented work leads to an improved understanding of the mechanisms behind the evolution of enzymatic mono- and bi-functionality, emphasizes the importance of generalist, bi- or multi-functional enzymes for the evolution of secondary metabolic pathways, and finally describes a hitherto overlooked structural tool for the evolutionary specification of protein-protein interactions.