9 research outputs found

    The scale-free nature of protein sequence space

    No full text
    <div><p>The sequence space of five protein superfamilies was investigated by constructing sequence networks. The nodes represent individual sequences, and two nodes are connected by an edge if the global sequence identity of two sequences exceeds a threshold. The networks were characterized by their degree distribution (number of nodes with a given number of neighbors) and by their fractal network dimension. Although the five protein families differed in sequence length, fold, and domain arrangement, their network properties were similar. The fractal network dimension D<sub>f</sub> was distance-dependent: a high dimension for single and double mutants (D<sub>f</sub> = 4.0), which dropped to D<sub>f</sub> = 0.7–1.0 at 90% sequence identity, and increased to D<sub>f</sub> = 3.5–4.5 below 70% sequence identity. The distance dependency of the network dimension is consistent with evolutionary constraints for functional proteins. While random single and double mutations often result in a functional protein, the accumulation of more than ten mutations is dominated by epistasis. The networks of the five protein families were highly inhomogeneous with few highly connected communities ("hub sequences") and a large number of smaller and less connected communities. The degree distributions followed a power-law distribution with similar scaling exponents close to 1. Because the hub sequences have a large number of functional neighbors, they are expected to be robust toward possible deleterious effects of mutations. Because of their robustness, hub sequences have the potential of high innovability, with additional mutations readily inducing new functions. Therefore, they form hotspots of evolution and are promising candidates as starting points for directed evolution experiments in biotechnology.</p></div>
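The network construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names (`identity`, `build_network`, `degree_distribution`) and the toy sequences are hypothetical. Identity is approximated here as the fraction of matching positions between pre-aligned, equal-length sequences, whereas the listing indicates the study computed global identity with USEARCH. An edge is added when identity is at or above the threshold, following the "≥95%" wording used in one of the figure captions.

```python
from collections import Counter
from itertools import combinations


def identity(a, b):
    """Fraction of identical positions (assumes pre-aligned, equal-length input)."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-length, pre-aligned sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def build_network(seqs, threshold):
    """Adjacency sets: nodes i and j share an edge if identity >= threshold."""
    adj = {i: set() for i in range(len(seqs))}
    for i, j in combinations(range(len(seqs)), 2):
        if identity(seqs[i], seqs[j]) >= threshold:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def degree_distribution(adj):
    """Map each degree k to the number of nodes with exactly k neighbors."""
    return Counter(len(neighbors) for neighbors in adj.values())


# Hypothetical toy sequences, not taken from the analyzed protein families.
toy = ["ACDEFGHIKL", "ACDEFGHIKV", "ACDEFGHIRV", "WWWWWWWWWW"]
network = build_network(toy, threshold=0.9)
degrees = degree_distribution(network)
print(network)
print(dict(degrees))
```

The all-pairs comparison is quadratic in the number of sequences, so it only suits small illustrations; for family-sized datasets a dedicated search tool would be the realistic choice.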

    Overview of the analyzed protein family networks by number of nodes (sequences) and maximal degree (number of neighbors) for a 95% sequence identity threshold, with average sequence length.

    No full text

    Overview of the analyzed protein families from Table 1 and their derived parameters.

    No full text
    <p>Overview of the analyzed protein families from <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0200815#pone.0200815.t001" target="_blank">Table 1</a> and their derived parameters.</p>

    Neighbor distribution for the protein families with low microdiversity from Table 1 with neighbors defined by ≥95% global sequence identity.

    No full text
    <p>The corresponding scale-free exponents γ were derived from linear regression for degrees ≤ 50 (bHAD, DC) or ≤ 70 (oTA, SDR) and are summarized in <b><a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0200815#pone.0200815.t002" target="_blank">Table 2</a></b>.</p>
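The regression step mentioned in this caption can be sketched as follows. The caption states only that γ came from linear regression up to a degree cutoff. The sketch assumes the fit is an ordinary least-squares line through log N(k) versus log k, with γ taken as the negative slope. The function name `fit_gamma` and the synthetic counts are hypothetical and are not data from the study.

```python
import math


def fit_gamma(degree_counts, max_degree):
    """Estimate gamma in N(k) ~ k^(-gamma) by least squares on log-log values.

    Only degrees 1 <= k <= max_degree with a positive count are used.
    """
    points = [
        (math.log(k), math.log(n))
        for k, n in degree_counts.items()
        if 1 <= k <= max_degree and n > 0
    ]
    if len(points) < 2:
        raise ValueError("need at least two degree values to fit a line")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return -sxy / sxx  # gamma is the negative of the fitted slope


# Synthetic counts that follow N(k) = 1000 / k exactly (gamma = 1 by construction).
synthetic = {k: 1000.0 / k for k in range(1, 101)}
gamma = fit_gamma(synthetic, max_degree=50)
print(round(gamma, 6))
```

Restricting the fit to low degrees, as the caption does with its cutoffs of 50 or 70, keeps the sparsely populated high-degree tail from dominating the slope.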

    Exemplary network hubs and their annotations from sequence networks with a threshold of 95% sequence identity (99.5% for TEM β-lactamases)<sup>a</sup> for the protein families from Table 1.

    No full text
    <p>Exemplary network hubs and their annotations from sequence networks with a threshold of 95% sequence identity (99.5% for TEM β-lactamases)<sup>a</sup> for the protein families from <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0200815#pone.0200815.t001" target="_blank">Table 1</a>.</p>

    Distributions of pairwise global sequence identity for the protein families from Table 1 as determined by high-scoring sequence pairs in USEARCH (20).

    No full text
    <p>Distributions of pairwise global sequence identity for the protein families from <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0200815#pone.0200815.t001" target="_blank">Table 1</a> as determined by high-scoring sequence pairs in USEARCH (20).</p>