    Correlation of Features with Principal Components.

    <p>Loading plots of the eigenvector coefficients of each feature analyzed by PCA show the influence of each variable on the principal components and its correlation with them. Eight features were analyzed to identify the subset that could represent ~80% of the data variation in the first two principal components (see text for feature descriptions). (a) With all eight features, 80.3% of the total variance could be accounted for by the first two PCs alone, though R2_ΔΔG (red) had noticeably smaller coefficients than the other features. (b) Excluding R2_ΔΔG produced a PCA over 7 features whose PC1 and PC2 accounted for 87.9% of the variance. (c) After removal of the 50 interfaces predicted to be FLIP (cluster 1) in the first PCA, a second round of PCA was calculated using the same seven features but only the data for the remaining 110 protein interfaces. The first two PCs of this PCA accounted for 84.2% of the variance. [<i>Figure generated using JMP </i><a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0097115#pone.0097115-Chakrabarti1" target="_blank">[<i>46</i>]</a><i> and Microsoft Excel, 2008</i>].</p>
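
The variance fractions quoted above (80.3%, 87.9%, 84.2%) are the sum of the two largest eigenvalues divided by the total variance. Below is a minimal pure-Python sketch of that computation. It assumes PCA on standardized features (the correlation matrix, whose total variance equals the number of features); the JMP settings are not stated here, and the function names and toy columns are hypothetical, not FLIPdb data.

```python
import math


def correlation_matrix(rows):
    """Pearson correlation matrix of the feature columns (assumes no constant column)."""
    n = len(rows)
    z = []
    for col in zip(*rows):
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        z.append([(x - mean) / sd for x in col])
    p = len(z)
    return [[sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(p)]
            for i in range(p)]


def top_eigenvalues(c, k, iters=1000):
    """Largest k eigenvalues of a symmetric PSD matrix (power iteration + deflation)."""
    p = len(c)
    c = [row[:] for row in c]
    vals = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(p)]  # deterministic start vector
        for _ in range(iters):
            w = [sum(c[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * c[i][j] * v[j] for i in range(p) for j in range(p))
        vals.append(lam)
        # Subtract the found component so the next pass finds the next eigenvalue.
        c = [[c[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return vals


def variance_in_first_pcs(rows, k=2):
    """Fraction of the total standardized variance captured by the first k PCs."""
    c = correlation_matrix(rows)
    total = sum(c[i][i] for i in range(len(c)))  # equals the number of features
    return sum(top_eigenvalues(c, k)) / total


def drop_feature(rows, idx):
    """Remove one feature column, e.g. a feature with small loadings."""
    return [tuple(r[:idx]) + tuple(r[idx + 1:]) for r in rows]


# Toy data: columns 0 and 1 are identical; columns 2 and 3 are uncorrelated
# with each other and with column 0.
rows = [(1, 1, 1, 1), (-1, -1, 1, -1), (1, 1, -1, -1), (-1, -1, -1, 1)]
frac_all = variance_in_first_pcs(rows)                         # ≈ 0.75
frac_without_3 = variance_in_first_pcs(drop_feature(rows, 3))  # ≈ 1.0
```

Removing one of the independent columns raises the two-PC fraction from 0.75 to 1.0. This is a toy analogue of how excluding R2_ΔΔG in panel (b) raised the fraction from 80.3% to 87.9%.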

    Distribution of alanine substitution energies in FLIP and FunC interfaces.

    <p>(a) and (b) show histogrammed contour plots, colored blue to red, of ΔΔG<sub>ala</sub>, the energy of substituting each interfacial residue with alanine (blue: more favorable values; red: more disruptive values). The plot axes are the first two principal components of the geometric distribution of the Cα positions of the alanine-substituted residues; PCA was used to align the interface along the X- and Y-axes. Axes are normalized. (a) ΔΔG<sub>ala</sub> of the FunC interface from PDB ID 1c02, chains A and B. (b) ΔΔG<sub>ala</sub> of the FLIP interface from PDB ID 1b5e, chains A and B. (c) Linear regressions of ΔΔG<sub>ala</sub> vs. distance from the interface center for the interfaces in the FLIPdb training set with the 10 most positive [1acy_HP, 1biq_AB, 2cii_AC, 1b5e_AB, 1edh_AB, 1pky_BD, 1tx4_AB, 1hjc_AC, x1bsf8_AJ, 1bo5_OZ] and the 10 most negative [1tzi_AV, 1acy_LP, x1ppf2_EZ, x1dv82_AC, x1wtl_BZ, x1py94_AE, x1erv2_AC, x1gaf2_LY, 1scu_BD, 1c02_AB] intercepts. FLIPs are shown in green, and in blue for [1tzi_AV, 1acy_LP]; FunCs are shown in red, and in yellow for [x1bsf8_AJ, 1bo5_OZ]. ΔΔG<sub>ala</sub> values are normalized to MAX(ABS(ΔΔG<sub>ala</sub>)), and the distance of each residue's Cα from the mean of the Cα positions (the center of the interface) is normalized to MAX(distance). All three plots generally show that FLIP interfaces are more centralized and radially symmetric than FunC interfaces. Of the intercepts shown, 80% of the positive ones belong to FLIPs and 80% of the negative ones belong to FunCs. [<i>Figures (a,b) generated using JMP </i><a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0097115#pone.0097115-Chakrabarti1" target="_blank">[<i>46</i>]</a><i>. Figure (c) generated using Microsoft Excel, 2008</i>]</p>
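
Panel (c) reduces each interface to a straight-line fit of normalized alanine-substitution ΔΔG against normalized distance from the interface center. Below is a minimal sketch of that fit. It assumes ordinary least squares and Cα coordinates given as (x, y, z) tuples. The function name `ecr_regression` and the toy interface are hypothetical.

```python
import math


def ecr_regression(ca_coords, ddg):
    """Least-squares line of normalized ddG vs. normalized distance from the center.

    ca_coords: list of (x, y, z) C-alpha positions of the interfacial residues.
    ddg: matching list of alanine-substitution energies (larger = more disruptive).
    Returns (slope, intercept) on the normalized scales used in panel (c).
    """
    n = len(ca_coords)
    # Center of interface = mean of the C-alpha positions.
    center = tuple(sum(p[k] for p in ca_coords) / n for k in range(3))
    dist = [math.dist(p, center) for p in ca_coords]
    d_max = max(dist)
    g_max = max(abs(g) for g in ddg)
    x = [d / d_max for d in dist]  # distance normalized to MAX(distance)
    y = [g / g_max for g in ddg]   # ddG normalized to MAX(ABS(ddG))
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Toy interface: the most disruptive substitution sits at the center.
coords = [(0, 0, 0), (1, 0, 0), (-1, 0, 0), (0, 2, 0), (0, -2, 0)]
energies = [4.0, 2.0, 2.0, 0.0, 0.0]
slope, intercept = ecr_regression(coords, energies)  # ≈ (-1.0, 1.0)
```

A positive intercept with a falling slope means the energetically important residues are concentrated near the center of the interface. That is the pattern the caption associates with FLIPs.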

    Protein-Protein Interface Detection Using the Energy Centrality Relationship (ECR) Characteristic of Proteins

    <div><p>Specific protein interactions are responsible for most biological functions. Distinguishing Functionally Linked Interfaces of Proteins (FLIPs) from Functionally Uncorrelated Contacts (FunCs) is therefore important to characterizing these interactions. To achieve this goal, we have created a database of protein structures called FLIPdb, containing proteins belonging to various functional sub-categories. Here, we use geometric features coupled with Kortemme and Baker's computational alanine scanning method to calculate the energetic sensitivity of each interfacial amino acid to substitution, identify hotspots, and identify other factors that may contribute towards an interface being FLIP or FunC. Using Principal Component Analysis and K-means clustering on a training set of 160 interfaces, we could distinguish FLIPs from FunCs with an accuracy of 76%. When these methods were applied to two test sets of 18 and 170 interfaces, we achieved similar accuracies of 78% and 80%, respectively. We find that FLIP interfaces have a stronger central organizing tendency than FunCs, due, we suggest, to greater specificity. We also observe that certain functional sub-categories, such as enzymes, antibody heavy-light chain, antibody-antigen, and enzyme-inhibitor interfaces, form distinct sub-clusters. The antibody-antigen and enzyme-inhibitor interfaces have patterns of physical characteristics similar to those of FunCs, which is consistent with these interfaces being shaped by different evolutionary selection pressures. Our ECR model thus also describes the impact of evolution and natural selection on protein-protein interfaces. Finally, we indicate how our ECR method may be of use in reducing the false positive rate of docking calculations.</p></div>

    The Energy Centrality Relationship (ECR) for interface evolution.

    <p>The ECR hypothesis is that, upon an initial fortuitous protein-protein association, residues in the nascent interface experience selective pressure to maintain or improve the affinity arising from that initial contact, while a similar pressure acts on the residues surrounding the contact. (a) and (b) show a conceptual PPI in a FLIP with a radially symmetric distribution of ‘hot’ (energetically favorable, red) and ‘cold’ (energetically unfavorable, blue) residues, while (c) and (d) are example residue energy distributions of a weaker-affinity (c) and a stronger-affinity (d) FunC. Over evolutionary time (c–f), selective pressures for activity, affinity, and specificity acting on residues in a FunC produce a radially symmetric pattern in the energetics of the interface. The resulting interface should show “stronger” energies near its “older” regions. These “older” regions may or may not show sequence conservation, because the pressure is on energy, not identity. As natural interfaces are generally more punctate than the ideal model, we expect that, while both FLIP and FunC interfaces may show multiple contacts, only FLIP interfaces will maintain overall centrality (e–f).</p>

    Summary of protein and protein interface counts in FLIPdb.

    <p>* Protein chains are common to multiple sub-categories, though the interfaces are distinct.</p><p>† Interfaces constructed from existing FLIPs through coordinate transformations arising from the symmetry of the source X-ray crystal structure (XFunCs).</p><p>FLIPdb contains 160 interfaces in 94 structures involving 219 individual protein chains. These interfaces have been assigned to the FLIP or FunC functional categories and to 9 functional sub-categories based on a review of the literature (see Supplement <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0097115#pone.0097115.s002" target="_blank">Table S1</a>). Because some chains are reused, the totals in the first two columns are not the sums of the sub-category counts.</p>
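
XFunCs are generated by applying a crystal symmetry operation to a chain's coordinates. Below is a minimal sketch of that rigid-body transformation, x' = R·x + t. It assumes the rotation R and the translation t (in Å) are already expressed in Cartesian coordinates. Crystallographic operators are often given in fractional coordinates and would first need conversion with the unit-cell matrix. The function name `apply_symmetry` and the example operator are hypothetical.

```python
import math


def apply_symmetry(coords, rot, trans):
    """Apply x' = R x + t to every atom; R is 3x3 (row-major), t is a 3-vector."""
    return [tuple(sum(rot[i][j] * p[j] for j in range(3)) + trans[i]
                  for i in range(3))
            for p in coords]


# Hypothetical 2-fold rotation about z followed by a 10 Å shift along x.
rot = [[-1, 0, 0], [0, -1, 0], [0, 0, 1]]
trans = [10.0, 0.0, 0.0]
chain = [(1.0, 2.0, 3.0), (4.0, 0.0, -1.0)]
mate = apply_symmetry(chain, rot, trans)  # [(9.0, -2.0, 3.0), (6.0, 0.0, -1.0)]

# A proper symmetry operation is rigid, so intra-chain distances are preserved.
assert math.isclose(math.dist(*chain), math.dist(*mate))
```

An interface between `chain` and its symmetry mate `mate` would then be analyzed like any other interface.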

    Accuracy of clustering in Training and Test-18 sets.

    <p>† TP: FLIP found in cluster 1; TN: FunC found in cluster 2.</p><p>FP: FunC found in cluster 1; FN: FLIP found in cluster 2.</p><p>The accuracy and Matthews correlation coefficient (MCC, a measure of the quality of a binary classification) of the clusterings shown in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0097115#pone-0097115-g004" target="_blank">Figure 4</a> are indicated. The overall accuracy is 76% for the training set and 78% for the Test-18 set. TPs are readily identified in both the training and Test-18 sets (80% and 69% <i>sensitivity</i>, respectively), and the majority of TPs are enzymes and immunoglobulin heavy chain-light chain interactions. TNs are less well identified (70% and 56% <i>negative predictive values</i>, respectively). MCCs of 0.50 and 0.62 indicate that our simple two-category approach is generally appropriate.</p>
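
The metrics in this caption follow from the four confusion-matrix counts defined in the footnote. Below is a minimal sketch. The training-set counts used (TP = 49 + 31, FP = 1 + 17, TN = 42, FN = 20) are reconstructed from the cluster sizes and rounded percentages given for the Figure 4 clusterings, so treat them as approximate. The function name `binary_metrics` is hypothetical.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # share of FLIPs placed in cluster 1
        "specificity": tn / (tn + fp),   # share of FunCs placed in cluster 2
        "precision": tp / (tp + fp),     # FLIP fraction of cluster 1
        "npv": tn / (tn + fn),           # FunC fraction of cluster 2
        "mcc": (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0,
    }


# Training set, counts reconstructed from the Figure 4 caption (approximate).
train = binary_metrics(tp=49 + 31, fp=1 + 17, tn=42, fn=20)
for name, value in train.items():
    print(f"{name}: {value:.3f}")
```

These counts reproduce the quoted training accuracy (76%), sensitivity (80%), and MCC (about 0.50). For the TN figures, specificity, TN/(TN+FP), is 0.70 with these counts. The negative predictive value, TN/(TN+FN), is about 0.68.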

    PCA and K-means clustering of Training and Test-18 sets.

    <p>Principal component analysis followed by K-means clustering was performed on the 100 FLIP and 60 FunC interfaces in FLIPdb. The same 7 features identified in <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0097115#pone-0097115-g003" target="_blank">Figure 3</a> are used here, and the number of clusters was set to <i>k</i> = 2. Green (“cluster 1”) and red (“cluster 2”) ovals represent 1 standard deviation of the Euclidean distances around each cluster centroid, marked by “<b>x</b>”. Interfaces are indicated with symbols representing their functional sub-category. Green and blue symbols are FLIP structures, with blue reserved for the AbAg and Inhibitor sub-categories; red symbols are FunCs. (a) and (b): training set. (c) and (d): Test-18 testing set. (a) 49 FLIP interfaces (mostly enzymes and immunoglobulin heavy-light chains) and 1 FunC are identified in cluster 1 (98% <i>precision</i>). (b) After removal of these 50 interfaces, a second PCA of the remaining 110 interfaces produces new clusters with 48 and 62 members, respectively. In PCA 2, cluster 1 is 64% FLIP and cluster 2 is 68% FunC. Overall accuracy across (a) and (b) combined is 76%. (c) and (d) show the projection of the 7 feature values of the 18 unrelated PPIs in the Test-18 set through the principal components developed on the training set. Enzymes and immunoglobulin heavy-light chains again dominate cluster 1 (100%), and overall accuracy across both clusterings is 78%. [<i>Figure generated with JMP </i><a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0097115#pone.0097115-Chakrabarti1" target="_blank">[<i>46</i>]</a><i> and Microsoft Excel, 2008</i>].</p>
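
The two-round procedure above can be sketched in pure Python. The rounds are PCA, K-means with k = 2, and removal of cluster 1, followed by a second PCA and clustering of the remaining interfaces. This is a minimal illustration, not the authors' JMP workflow. It assumes z-scored features, power-iteration PCA, and Lloyd's K-means seeded with two mutually distant points. It also hypothetically labels the smaller cluster "cluster 1". That rule matches the 50/110 and 48/62 splits reported here but is not stated as the authors' rule. All function names and the toy data are hypothetical.

```python
import math


def standardize(rows):
    """Z-score each feature column (population SD; constant columns left at 0)."""
    n = len(rows)
    out_cols = []
    for col in zip(*rows):
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n) or 1.0
        out_cols.append([(x - mean) / sd for x in col])
    return [list(r) for r in zip(*out_cols)]


def top_components(z, k, iters=500):
    """Top-k eigenvectors of the correlation matrix (power iteration + deflation)."""
    n, p = len(z), len(z[0])
    c = [[sum(r[i] * r[j] for r in z) / n for j in range(p)] for i in range(p)]
    comps = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(p)]
        for _ in range(iters):
            w = [sum(c[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * c[i][j] * v[j] for i in range(p) for j in range(p))
        c = [[c[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
        comps.append(v)
    return comps


def kmeans2(points, iters=100):
    """Lloyd's K-means with k = 2, seeded with two mutually distant points."""
    mean = [sum(col) / len(points) for col in zip(*points)]
    a = max(points, key=lambda q: math.dist(q, mean))
    b = max(points, key=lambda q: math.dist(q, a))
    cents, labels = [list(a), list(b)], None
    for _ in range(iters):
        new = [0 if math.dist(q, cents[0]) <= math.dist(q, cents[1]) else 1
               for q in points]
        if new == labels:
            break
        labels = new
        for k in (0, 1):
            members = [q for q, lab in zip(points, labels) if lab == k]
            if members:
                cents[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def two_round_clustering(rows, n_pcs=2):
    """Return [(cluster1, cluster2), (cluster1, cluster2)] as row indices per round."""
    remaining, rounds = list(range(len(rows))), []
    for _ in range(2):
        z = standardize([rows[i] for i in remaining])
        comps = top_components(z, n_pcs)
        scores = [[sum(a * b for a, b in zip(r, v)) for v in comps] for r in z]
        labels = kmeans2(scores)
        # Hypothetical rule: the smaller cluster is called "cluster 1".
        small = 0 if labels.count(0) <= labels.count(1) else 1
        c1 = [i for i, lab in zip(remaining, labels) if lab == small]
        c2 = [i for i, lab in zip(remaining, labels) if lab != small]
        rounds.append((c1, c2))
        remaining = c2  # round 2 reclusters only what round 1 left in cluster 2
    return rounds


# Toy two-feature data with three groups (indices 0-1, 2-5, 6-8).
rows = [(100, 100.2), (101, 100.9),
        (0, 0.1), (1, 0.9), (2, 2.2), (3, 2.9),
        (20, 20.1), (21, 20.8), (23, 23.2)]
rounds = two_round_clustering(rows)
predicted_flip = rounds[0][0] + rounds[1][0]  # cluster 1 from either round
```

To project a held-out set as in (c) and (d), you would reuse the training means, SDs, and components rather than recomputing them on the new rows.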