
    Identifying direct contacts between protein complex subunits from their conditional dependence in proteomics datasets

    <div><p>Determining the three-dimensional arrangement of proteins in a complex is highly beneficial for uncovering mechanistic function and interpreting genetic variation in coding genes comprising protein complexes. There are several methods for determining co-complex interactions between proteins, among them co-fractionation / mass spectrometry (CF-MS), but it remains difficult to identify directly contacting subunits within a multi-protein complex. Correlation analysis of CF-MS profiles shows promise in detecting protein complexes as a whole but is limited in its ability to infer direct physical contacts among proteins in sub-complexes. To identify direct protein-protein contacts within human protein complexes, we learn a sparse conditional dependency graph from approximately 3,000 CF-MS experiments on human cell lines. We show substantial performance gains in estimating direct interactions compared to correlation analysis on a benchmark of large protein complexes with solved three-dimensional structures. We demonstrate the method’s value in determining the three-dimensional arrangement of proteins by making predictions for complexes without known structure (the exocyst and tRNA multi-synthetase complex) and by establishing evidence for the structural position of a recently discovered component of the core human EKC/KEOPS complex, GON7/C14ORF142, providing a more complete 3D model of the complex. Direct contact prediction provides easily calculable additional structural information for large-scale protein complex mapping studies and should be broadly applicable across organisms as more CF-MS datasets become available.</p></div>

    Prediction of direct contacts within human protein complexes of unknown structure.

    <p>Direct contacts and correlation coefficients were calculated between 8 members of the human exocyst complex (<b>A</b>: direct contacts, <b>B</b>: correlation) and 10 members of the tRNA multi-synthetase complex (<b>C</b>: direct contacts, <b>D</b>: correlation). Contact predictions are visualized here by drawing each direct contact prediction or correlation prediction as an edge connecting the relevant subunits. Each predicted direct contact is associated with its prediction score, which indicates the stability support for that interaction. In both complexes, certain direct interactions are strongly supported, suggesting key contacts formed in the three-dimensional organization of these complexes, neither of which has yet been resolved. The comparison between direct contact predictions and correlation predictions indicates that the graphical model removes edges considered conditionally independent from the direct contact network, providing high-confidence predictions. For <b>A</b> and <b>B</b>, colors (blue and green) represent known sub-complexes from Heider <i>et al</i>. [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005625#pcbi.1005625.ref054" target="_blank">54</a>]. For <b>C</b> and <b>D</b>, red represents structural subunits and purple represents synthetases.</p>

    Scoring protein interactions by their conditional dependence accurately recovers direct protein-protein contacts within multi-subunit complexes.

    <p><b>A.</b> We compared the value of the pairwise Pearson correlation coefficients between protein elution profiles (red curve) versus the derived conditionally dependent interactions (<i>i</i>.<i>e</i>., direct contact predictions) (black curve) for their ability to recapitulate true protein contacts in 10 complexes with known 3D structures. High-scoring conditionally dependent interactions were strongly enriched for true contacts, unlike the most highly correlated protein elution profiles. Additionally, we plot precision-recall curves for predictions made with alternative <i>λ</i> choices (gray curves) and observe improved performance over correlation alone, suggesting performance is robust to the selection of this parameter. The random line (dashed) represents the theoretical baseline of all true positives (TP) divided by the total number of possible subunit pairs (TP:335 / Total:1583). <b>B.</b> Evaluation of conditionally dependent interactions on an additional 19 non-redundant complexes, showing consistent performance on a held-out set. Random = (TP:261 / Total:1575). <b>C.</b> Evaluation on the combined 29 complexes used in <b>A</b> and <b>B</b>. Direct contact probability thresholds and correlation coefficient thresholds are marked in black and red text, respectively. Random = (TP:596 / Total:3158). <b>D.</b> Distributions of area under the precision-recall curve (PR AUC) for the 29 individual complexes, showing large variance across complexes, with direct contacts nonetheless outperforming correlation and random. Precision = TP/(TP+FP); recall = TP/(TP+FN).</p>
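
The precision, recall, and random-baseline definitions in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' evaluation code: the function names, the toy scores, and the toy set of true contacts are all hypothetical, and only the 335 / 1583 counts are taken from the caption.

```python
def precision_recall(scored_pairs, true_contacts, threshold):
    """Precision = TP/(TP+FP) and recall = TP/(TP+FN) at a score threshold."""
    # Pairs scoring at or above the threshold count as predicted contacts.
    predicted = {pair for pair, score in scored_pairs.items() if score >= threshold}
    tp = len(predicted & true_contacts)
    fp = len(predicted - true_contacts)
    fn = len(true_contacts - predicted)
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if true_contacts else 0.0
    return precision, recall


def random_baseline(n_true, n_total):
    """Expected precision of random guessing: true contacts / all possible pairs."""
    return n_true / n_total


# Hypothetical toy scores for unordered subunit pairs (not from the paper).
scores = {
    frozenset(("A", "B")): 0.9,
    frozenset(("A", "C")): 0.7,
    frozenset(("B", "C")): 0.2,
    frozenset(("C", "D")): 0.6,
}
# Hypothetical set of structurally confirmed contacts.
truth = {frozenset(("A", "B")), frozenset(("C", "D")), frozenset(("B", "D"))}

precision, recall = precision_recall(scores, truth, threshold=0.5)
baseline_a = random_baseline(335, 1583)  # counts quoted for panel A
print(precision, recall, round(baseline_a, 3))
```

Sweeping the threshold from high to low and recording each (recall, precision) point would trace out a curve of the kind the caption describes.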

    Prediction of direct contacts between subunits of the 26S proteasome.

    <p><b>A.</b> Matrix of true contacts (upper right, derived from PDB entry 4CR2 [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005625#pcbi.1005625.ref047" target="_blank">47</a>]) and correlation coefficients (lower left) for the 26S proteasome. Correlation identifies general sub-complex structure but fails to discriminate between direct and indirect interactions (6 out of top 10 predictions correct). <b>B.</b> Matrix of true contacts (upper right) and direct contact predictions (lower left) for the 26S proteasome. The direct contact method identifies many true contacts while strongly reducing the number of false positive predictions (9 out of top 10 predictions correct).</p>

    Overview of direct contact prediction between protein complex subunits.

    <p>Co-fractionation / mass spectrometry (CF-MS) aims to repeatedly separate mixtures of native protein complexes (True Network) by non-denaturing chromatography. Protein elution profiles are generated by mass spectrometry identification of proteins across all chromatography fractions collected. Correlation between proteins’ elution profiles (<b>left side</b>) performs well for identifying the subunit composition of complexes [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005625#pcbi.1005625.ref004" target="_blank">4</a>, <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005625#pcbi.1005625.ref006" target="_blank">6</a>, <a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005625#pcbi.1005625.ref007" target="_blank">7</a>], but suffers from indirect associations among proteins that inhibit its ability to identify directly contacting subunits within each complex. We predict direct contacts (<b>right side</b>) by effectively inverting the correlation matrix to discriminate between conditionally dependent and conditionally independent associations, which correspond to direct and indirect protein interactions respectively. Specifically, we incorporate pseudo-counts, scale and transform the correlation matrix, use a sparse graphical model learning framework to compute conditionally dependent partial correlations, followed by StARS stability analysis [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005625#pcbi.1005625.ref029" target="_blank">29</a>] to re-score the resulting conditional dependency matrix such that each entry corresponds to the frequency with which it is supported by subsample trials. We retain non-zero scores between subunits within each pre-defined human protein complex [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005625#pcbi.1005625.ref032" target="_blank">32</a>] as our prediction of direct contacts.</p>
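
The core idea of inverting the correlation matrix to separate direct from indirect associations can be illustrated with a small, dense example. The sketch below is a simplification under stated assumptions: it uses plain matrix inversion with no sparsity penalty, no pseudo-counts, and no StARS re-scoring, so it is not the pipeline described above. The helper names and the three-protein correlation matrix are hypothetical.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def partial_correlations(corr):
    """Partial correlation of i and j given all others: -P_ij / sqrt(P_ii * P_jj),
    where P is the inverse of the correlation matrix."""
    prec = invert(corr)
    n = len(corr)
    return [[1.0 if i == j else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
             for j in range(n)]
            for i in range(n)]


# Hypothetical chain A - B - C: A and C correlate (0.64) only through B.
corr = [
    [1.00, 0.80, 0.64],
    [0.80, 1.00, 0.80],
    [0.64, 0.80, 1.00],
]
pcorr = partial_correlations(corr)
for row in pcorr:
    print([round(v, 3) for v in row])
```

In this toy case the A-C entry drops to zero once B is conditioned on, while the A-B and B-C entries stay clearly positive, which is the distinction between indirect and direct association that the caption relies on.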

    Relationship between direct contact probability and correlation for four example complexes with known structure.

    <p><b>A.</b> Direct contact predictions for the well-observed proteasome (pdbid: 4CR2) show good discrimination between true (blue circle) and false positives (red x) compared to that of correlation. <b>B.</b> Direct contact predictions for the moderately observed spliceosome complex (pdbid: 5MQF) show good discrimination between true and false positives but with a limited number of total predictions. <b>C.</b> The direct contact method does not make high-confidence predictions for the mitochondrial ribosome (pdbid: 4CE4) due to its limited sampling, while correlation makes many high-ranking false positive predictions. <b>D.</b> Similar to <b>C</b>, the direct contact method does not make predictions for the mitochondrial super-complex (pdbid: 2YBB) due to its limited sampling, while correlation makes several high-confidence true positive predictions.</p>

    The direct contact prediction method makes high confident predictions for well-sampled complexes.

    <p><b>A.</b> Distribution of protein subunit sampling for complexes with a max direct contact probability ≥ 0.5 (blue) and complexes with a max probability < 0.5 (green). Sampling is measured for a complex by averaging the number of fractions in which each protein in the complex is observed (i.e., nonzero fractions). Our method performs better for well-sampled complexes and is limited for poorly sampled ones. <b>B.</b> Distribution of the mean number of nonzero fractions for pairs of proteins predicted by the direct contact method. Pairs of proteins with high probabilities are well sampled compared to those with lower probabilities.</p>
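
The sampling measure in this caption is simple to compute from elution profiles. The following sketch assumes each profile is a list of per-fraction abundance values and that a pair's sampling is the mean of its two proteins' nonzero-fraction counts; that pair definition, the function names, and the toy profiles are assumptions for illustration only.

```python
def nonzero_fractions(profile):
    """Number of fractions in which the protein was observed (value > 0)."""
    return sum(1 for value in profile if value > 0)


def complex_sampling(profiles, members):
    """Average nonzero-fraction count over all proteins in a complex."""
    counts = [nonzero_fractions(profiles[protein]) for protein in members]
    return sum(counts) / len(counts)


def pair_sampling(profiles, protein_a, protein_b):
    """Mean nonzero-fraction count for a pair (assumed definition)."""
    return (nonzero_fractions(profiles[protein_a])
            + nonzero_fractions(profiles[protein_b])) / 2


# Hypothetical elution profiles across five fractions.
profiles = {
    "P1": [0, 3, 5, 0, 1],
    "P2": [0, 0, 2, 0, 0],
    "P3": [1, 1, 1, 1, 0],
}

sampling = complex_sampling(profiles, ["P1", "P2", "P3"])
pair = pair_sampling(profiles, "P1", "P2")
print(round(sampling, 2), pair)
```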

    Direct contact predictions have highly enriched overlap with HeLa lysate crosslinking interactions.

    <p>We report the enriched overlap of direct contact predictions and crosslinking interactions (z-score = 36, red triangle) relative to a distribution of random pairs of proteins in the crosslinking dataset (blue). Since we restrict our direct contact predictions to co-complex interactions within hu.MAP complexes, we additionally compare to the enriched overlap of co-complex edges and crosslinking interactions (z-score ~24, red circle). This shows that direct contact predictions overlap with crosslinking interactions far more than expected from co-complex edges alone.</p>
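
An enrichment z-score of this kind compares an observed overlap against overlaps obtained from random pair sets. The sketch below is one plausible way to set this up, assuming the null distribution is built by drawing random pair sets of the same size as the prediction set from a pool of candidate pairs; the caption does not specify the sampling scheme, so that choice, the function names, and the toy data are all hypothetical.

```python
import itertools
import random
import statistics


def overlap_z_score(predicted, reference, universe, n_trials=1000, seed=0):
    """Z-score of |predicted & reference| against same-sized random pair sets."""
    rng = random.Random(seed)
    observed = len(predicted & reference)
    pool = sorted(universe)  # fixed order so results are reproducible per seed
    null = [
        len(set(rng.sample(pool, len(predicted))) & reference)
        for _ in range(n_trials)
    ]
    mean = statistics.mean(null)
    stdev = statistics.pstdev(null)
    if stdev == 0:
        raise ValueError("null distribution has zero variance")
    return (observed - mean) / stdev


# Hypothetical universe: all unordered pairs among 20 proteins (190 pairs).
all_pairs = list(itertools.combinations(range(20), 2))
crosslinks = set(all_pairs[:15])                         # toy reference set
predictions = set(all_pairs[:8]) | set(all_pairs[100:102])  # 8 of 10 overlap

z = overlap_z_score(predictions, crosslinks, all_pairs)
print(round(z, 1))
```

A large positive z-score means the prediction set shares far more pairs with the reference set than random sets of the same size do.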

    Demonstration of the application of fused-L2 to intra-species network inference <i>B. subtilis</i>.

    <p>In each example, λ<sub><i>R</i></sub> is optimized separately without fusion and 10-fold cross validation is used when fitting networks (although in <b>A</b> the gold-standard was not used in fitting the network and did not vary across folds). <b>A.</b> We compared the performance of independently fitting our main <i>B. subtilis</i> dataset with two methods for incorporating data from another strain of <i>B. subtilis</i>. We evaluated performance on a gold-standard of known interactions. Adaptive fusion outperforms both independently fitting the first <i>B. subtilis</i> dataset and fitting both <i>B. subtilis</i> datasets then rank-combining the results, as in Marbach et al. <b>B.</b> We demonstrate the application of a prior based on operon membership. We generated fusion constraints between pairs of interactions for which both the TF and gene belonged to the same operon respectively. We then held out half of the gold-standard and used it as a prior on individual interactions, as in [<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1005157#pcbi.1005157.ref033" target="_blank">33</a>]. We fit the <i>B. subtilis</i> network with and without fusion, then evaluated on the remaining gold-standard. In this example, using fusion constraints to enforce a prior based on co-regulation of genes in the same operon improved network inference performance.</p>