18 research outputs found

    System of 7 networks.

    <p>Each network is arranged in a separate plane. The core variants (IMP-1, s217, IMP-4, IMP-9, IMP-11, IMP-8, IMP-19) and the variants derived from them by the mutations V67F or S262G are marked (blue rectangle) and positioned on top of each other.</p>

    Networks of the five main branches A-E.

    <p>The nodes are labeled by their IMP number (e.g. IMP-1 is labeled by “1”). If no IMP number has been assigned yet, the abbreviated GenBank ID is given (e.g. the GenBank entry 110350569 is labeled by “s11.”). Unknown variants are labeled by “?”.</p>

    Protein Variants Form a System of Networks: Microdiversity of IMP Metallo-Beta-Lactamases

    <div><p>Genome and metagenome sequencing projects support the view that only a tiny portion of the total protein microdiversity in the biosphere has been sequenced so far, while the vast majority of existing protein variants is still unknown. By using a network approach, the microdiversity of 42 metallo-β-lactamases of the IMP family was investigated. In the networks, the nodes are formed by the variants, while the edges correspond to single mutations between pairs of variants. The 42 variants were assigned to 7 separate networks. By analyzing the networks and their relationships, the structure of sequence space was studied and existing, but still unknown, functional variants were predicted. The largest network consists of 10 variants with IMP-1 at its center and includes two ubiquitous mutations, V67F and S262G. By relating the corresponding pairs of variants, the networks were integrated into a single system of networks. The largest network also included a quartet of variants: IMP-1, two single mutants, and the respective double mutant. The existence of quartets indicates that if two single mutations each result in a functional enzyme, the double mutant may also be active and stable. Therefore, quartet construction from triplets was applied to predict 15 functional variants. Further functional mutants were predicted by applying the two ubiquitous mutations in all networks. In addition, since the networks are separated from each other by 10–15 mutations on average, it is expected that a subset of the theoretical intermediates is functional and therefore exists in the biosphere. Finally, the network analysis helps to distinguish between epistatic and additive effects of mutations: while the presence of correlated mutations indicates a strong interdependency between the respective positions, the mutations V67F and S262G are ubiquitous and therefore background independent.</p></div>
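
The quartet construction described in this abstract can be illustrated with a minimal sketch. It assumes pre-aligned, equal-length sequences and substitutions only; the function names and the four-residue toy sequences are hypothetical and are not taken from the paper, whose actual procedure may differ.

```python
from itertools import combinations


def diff_positions(a, b):
    """Return the positions at which two equal-length sequences differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def single_mutation_edges(variants):
    """Connect every pair of variants that differ by exactly one substitution."""
    edges = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(variants.items(), 2):
        if len(seq_a) == len(seq_b) and len(diff_positions(seq_a, seq_b)) == 1:
            edges.append((name_a, name_b))
    return edges


def predict_double_mutants(variants):
    """Complete each triplet (center plus two single mutants) to a quartet.

    Returns the double-mutant sequences that are not among the known variants.
    """
    neighbors = {name: set() for name in variants}
    for a, b in single_mutation_edges(variants):
        neighbors[a].add(b)
        neighbors[b].add(a)

    known = set(variants.values())
    predicted = set()
    for center, nbrs in neighbors.items():
        for n1, n2 in combinations(sorted(nbrs), 2):
            p1 = diff_positions(variants[center], variants[n1])[0]
            p2 = diff_positions(variants[center], variants[n2])[0]
            if p1 == p2:
                continue  # both mutations hit the same position: no double mutant
            seq = list(variants[center])
            seq[p1] = variants[n1][p1]
            seq[p2] = variants[n2][p2]
            double = "".join(seq)
            if double not in known:
                predicted.add(double)
    return predicted


# Hypothetical toy data: a center variant and two single mutants.
toy = {"wt": "AVSG", "m1": "AFSG", "m2": "AVGG"}
candidates = predict_double_mutants(toy)
```

In this sketch a predicted sequence is only a candidate; whether it is active and stable would still have to be checked experimentally.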

    Phylogenetic tree of the IMP family of MBLs, colored by time of discovery (blue: before 2000, green: 2000–2005, yellow: 2005–2010, red: after 2010).

    <p>Phylogenetic tree of the IMP family of MBLs, colored by time of discovery (blue: before 2000, green: 2000–2005, yellow: 2005–2010, red: after 2010).</p>

    Molecular modelling of the mass density of single proteins

    <div><p>Using molecular dynamics (MD) simulations, the density of single proteins and its temperature dependence were modelled starting from the experimentally determined protein structure and a generic, transferable force field, without the need for prior parameterization. Although all proteins consist of the same 20 amino acids, their density in aqueous solution varies by up to 10% and the thermal expansion coefficient by up to twofold. To model the protein density, systematic MD simulations were carried out for 10 proteins with a broad range of densities (1.32–1.43 g/cm<sup>3</sup>) and molecular weights (7–97 kDa). The simulated densities deviated by less than 1.4% from the experimental values, which were available for four proteins. Further analyses of protein density showed that it can be essentially described as a consequence of amino acid composition. For five proteins, the density was simulated at different temperatures. The simulated thermal expansion coefficients ranged between 4.3 and 7.1 × 10<sup>−4</sup> K<sup>−1</sup> and were similar to the experimentally determined values of ribonuclease-A and lysozyme (deviations of 2.4 and 14.6%, respectively). Further analyses indicated that the thermal expansion coefficient is linked to the temperature dependence of atomic fluctuations: proteins with a high thermal expansion coefficient show a low increase in flexibility with increasing temperature. A low increase in atomic fluctuations with temperature has been previously described as a possible mechanism of thermostability. Thus, a high thermal expansion coefficient might contribute to protein thermostability.</p></div>
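
As a rough illustration of how a thermal expansion coefficient relates to densities simulated at several temperatures, the sketch below uses the standard volumetric definition α = −(1/ρ)(dρ/dT) and a linear least-squares fit of ρ(T). The abstract does not state how the coefficients were actually derived, so the fitting choice, the function name, and the density values are assumptions made for illustration only.

```python
def thermal_expansion_coefficient(temps_K, densities):
    """Estimate alpha = -(1/rho) * d(rho)/dT from a linear fit of rho(T).

    The slope is taken from an ordinary least-squares line and divided
    by the mean density of the supplied points.
    """
    n = len(temps_K)
    if n < 2 or n != len(densities):
        raise ValueError("need at least two matching (T, rho) points")
    t_mean = sum(temps_K) / n
    rho_mean = sum(densities) / n
    sxx = sum((t - t_mean) ** 2 for t in temps_K)
    if sxx == 0:
        raise ValueError("temperatures must not all be equal")
    sxy = sum((t - t_mean) * (r - rho_mean) for t, r in zip(temps_K, densities))
    slope = sxy / sxx  # d(rho)/dT in (g/cm^3)/K
    return -slope / rho_mean  # alpha in 1/K


# Hypothetical densities (g/cm^3) at three temperatures (K), not from the paper.
alpha = thermal_expansion_coefficient([280.0, 300.0, 320.0], [1.414, 1.400, 1.386])
```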

    The scale-free nature of protein sequence space

    <div><p>The sequence space of five protein superfamilies was investigated by constructing sequence networks. The nodes represent individual sequences, and two nodes are connected by an edge if the global sequence identity of two sequences exceeds a threshold. The networks were characterized by their degree distribution (number of nodes with a given number of neighbors) and by their fractal network dimension. Although the five protein families differed in sequence length, fold, and domain arrangement, their network properties were similar. The fractal network dimension D<sub>f</sub> was distance-dependent: a high dimension for single and double mutants (D<sub>f</sub> = 4.0), which dropped to D<sub>f</sub> = 0.7–1.0 at 90% sequence identity, and increased to D<sub>f</sub> = 3.5–4.5 below 70% sequence identity. The distance dependency of the network dimension is consistent with evolutionary constraints for functional proteins. While random single and double mutations often result in a functional protein, the accumulation of more than ten mutations is dominated by epistasis. The networks of the five protein families were highly inhomogeneous with few highly connected communities ("hub sequences") and a large number of smaller and less connected communities. The degree distributions followed a power-law distribution with similar scaling exponents close to 1. Because the hub sequences have a large number of functional neighbors, they are expected to be robust toward possible deleterious effects of mutations. Because of their robustness, hub sequences have the potential of high innovability, with additional mutations readily inducing new functions. Therefore, they form hotspots of evolution and are promising candidates as starting points for directed evolution experiments in biotechnology.</p></div>
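
The network construction and degree distribution described in this abstract can be sketched as follows. As a simplifying assumption, identity is computed here as the fraction of matching positions in pre-aligned, equal-length sequences, whereas the paper refers to global sequence identity from an alignment; the function names and toy sequences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def identity(a, b):
    """Fraction of identical positions in two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def degree_distribution(seqs, threshold):
    """Build the threshold network and count nodes per degree.

    Two sequences are connected when their identity exceeds `threshold`.
    Returns a Counter mapping degree -> number of nodes with that degree.
    """
    degree = [0] * len(seqs)
    for i, j in combinations(range(len(seqs)), 2):
        if identity(seqs[i], seqs[j]) > threshold:
            degree[i] += 1
            degree[j] += 1
    return Counter(degree)


# Hypothetical toy sequences; real families would contain thousands of entries.
toy_seqs = ["AAAA", "AAAT", "AATT", "TTTT"]
dist = degree_distribution(toy_seqs, threshold=0.7)
```

The all-pairs loop scales quadratically with the number of sequences, so a real analysis of a large family would likely need a faster pairwise comparison strategy.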

    Percolation in protein sequence space

    <div><p>The currently known protein sequences are not distributed evenly in sequence space, but cluster into families. Analyzing the cluster size distribution gives a glimpse of the large and unknown extant protein sequence space, which has been explored during evolution. For six protein superfamilies with different fold and function, the cluster size distributions followed a power law with slopes between 2.4 and 3.3, which represent upper limits to the cluster distribution of extant sequences. The power law distribution of cluster sizes is in accordance with percolation theory and strongly supports connectedness of extant sequence space. Percolation of extant sequence space has three major consequences: (1) It changes our view of sequence space to that of a highly connected network where each sequence has multiple neighbors, and each pair of sequences is connected by many different paths. A high degree of connectedness is a necessary condition of efficient evolution, because it overcomes the possible blockage by sign epistasis and reciprocal sign epistasis. (2) The Fisher exponent is an indicator of connectedness and saturation of sequence space of each protein superfamily. (3) All clusters are expected to be connected by extant sequences that become apparent as a higher portion of extant sequence space becomes known. Being linked to biochemically distinct homologous families, bridging sequences are promising enzyme candidates for applications in biotechnology because they are expected to have substrate ambiguity or catalytic promiscuity.</p></div>

    Overview of the analyzed protein families from Table 1 and their derived parameters.

    <p>Overview of the analyzed protein families from <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0200815#pone.0200815.t001" target="_blank">Table 1</a> and their derived parameters.</p>

    Distributions of pairwise global sequence identity.

    <p>Distributions of pairwise global sequence identity for the protein families of α/β-hydrolases (abH), short-chain dehydrogenases/reductases (SDR), ω-transaminases (oTA), cytochrome P450 monooxygenases (CYP), thiamine diphosphate-dependent decarboxylases (DC) and β-hydroxyacid dehydrogenases/imine reductases (bHAD).</p>

    Cluster size distributions.

    <p>The cluster size distributions of α/β-hydrolases (abH), short-chain dehydrogenases/reductases (SDR), ω-transaminases (oTA), cytochrome P450 monooxygenases (CYP), thiamine diphosphate-dependent decarboxylases (DC), and β-hydroxyacid dehydrogenases/imine reductases (bHAD) follow a power law distribution: N(s) ~ s<sup>-τ</sup> (N(s), number of clusters of size s; τ, Fisher exponent). Cluster criterion: 60% global sequence identity.</p>
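
The power law N(s) ~ s^(-τ) in this caption can be illustrated by estimating τ from a table of cluster counts. The sketch below fits a straight line to log N(s) versus log s by ordinary least squares, which is a simple but crude estimator; the caption does not say which fitting method the authors used, and the function name and counts are hypothetical.

```python
import math


def fit_fisher_exponent(size_counts):
    """Estimate tau in N(s) ~ s**(-tau) from a mapping {cluster size s: N(s)}.

    Fits log N(s) = c - tau * log s by ordinary least squares and returns tau.
    """
    points = [(math.log(s), math.log(n)) for s, n in size_counts.items() if n > 0]
    if len(points) < 2:
        raise ValueError("need counts for at least two distinct cluster sizes")
    x_mean = sum(x for x, _ in points) / len(points)
    y_mean = sum(y for _, y in points) / len(points)
    sxx = sum((x - x_mean) ** 2 for x, _ in points)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in points)
    return -sxy / sxx  # tau is the negative slope on the log-log plot


# Hypothetical counts generated from N(s) = 10000 * s**(-2), not real data.
toy_counts = {1: 10000, 2: 2500, 4: 625, 5: 400, 10: 100}
tau = fit_fisher_exponent(toy_counts)
```

For sparse or noisy cluster counts, a maximum-likelihood estimator is generally considered more reliable than a log-log line fit.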