3 research outputs found

    An ensemble framework for identifying essential proteins

    Many centrality measures have been proposed to mine and characterize the correlations between network topological properties and protein essentiality. However, most of them show limited prediction accuracy, and the number of essential proteins predicted in common by different methods is very small. Results: In this paper, an ensemble framework is proposed which integrates gene expression data and protein-protein interaction networks (PINs). It aims to improve the prediction accuracy of basic centrality measures. The idea behind this ensemble framework is that different protein-protein interactions (PPIs) may contribute differently to protein essentiality. Five standard centrality measures (degree centrality, betweenness centrality, closeness centrality, eigenvector centrality, and subgraph centrality) are each integrated into the ensemble framework. We evaluated the performance of the proposed ensemble framework using yeast PINs and gene expression data. The results show that it can considerably improve the prediction accuracy of each of the five centrality measures. It can also remarkably increase the number of essential proteins predicted in common by the individual centrality measures and enable each centrality measure to find more low-degree essential proteins. Conclusions: This paper demonstrates that it is valuable to differentiate the contributions of different PPIs when identifying essential proteins based on network topological characteristics. The proposed ensemble framework is a successful paradigm to this end.
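
    The general idea of weighting interactions by co-expression and combining centrality scores across several filtered networks can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Pearson correlation (PCC) of expression profiles as the edge weight, degree centrality as the only base measure, equal-weight summation across thresholds, and hypothetical toy data, function names, and threshold values.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / sqrt(vx * vy)

def degree_centrality(edges):
    """Count the interactions incident to each protein."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

def ensemble_degree(edges, expression, thresholds):
    """Sum each protein's degree over the sub-networks kept at each PCC threshold.

    Assumption: every threshold gets an equal vote; this is an illustrative
    choice, not a scheme taken from the paper.
    """
    weights = {(u, v): pearson(expression[u], expression[v]) for u, v in edges}
    score = {p: 0 for p in expression}
    for thr in thresholds:
        kept = [edge for edge, w in weights.items() if w >= thr]
        for protein, d in degree_centrality(kept).items():
            score[protein] += d
    return score

# Hypothetical toy data: four proteins, four time points, four interactions.
expression = {
    "A": [1, 2, 3, 4],
    "B": [2, 4, 6, 8],
    "C": [4, 3, 2, 1],
    "D": [1, 3, 2, 4],
}
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "D")]
scores = ensemble_degree(edges, expression, thresholds=[0.5, 0.9])
print(scores)
```

    In this toy network, protein C has an interaction with A but is anti-correlated with it, so that edge is dropped at every threshold and C receives no score, whereas in the unweighted network it would have degree 1.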

    Additional file 1: of An ensemble framework for identifying essential proteins

    Figure S1. The distributions of node strength for essential and nonessential proteins. Figure S2. The distributions of co-expression weights for IBEPs and Non-IBEPs. Figure S3. Performance comparison of five centrality measures (BC, CC, DC, EC, and SC) on two yeast PINs (PIN24K and PIN76K) using the uniform thresholding strategy ((a)-(h)). Figure S4. Relationship between the number of nonzero-degree nodes (or proteins) in PINs and the thresholds for generating the corresponding PINs using the absolute thresholding strategy ((a)-(b)). Figure S5. Relationship between the number of nonzero-degree nodes (or proteins) in PINs and the thresholds for generating the PINs using the uniform thresholding strategy ((a)-(b)). Figure S6. Performance comparison of five centrality measures (BC, CC, DC, EC, and SC) with their corresponding ensemble methods (absolute thresholding strategy) with different sample sizes or weights on two yeast PINs. Figure S7. Comparison of the number of essential proteins detected by each ensemble method using the uniform thresholding strategy with different voting weights on two yeast PINs. Table S1. The number of common predicted proteins (overlap) among the top 100 proteins ranked by PCC-weighted methods. Table S2. The number of common predicted proteins (overlap) among the top 100 proteins ranked by single PCC-threshold methods (thr = 0.75). Table S3. Correlation between centrality measures based on their top 100 ranked proteins. Table S4. Correlation between ensemble methods based on their top 100 ranked proteins. (PDF 1294 kb)
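
    The overlap counts described for Tables S1 and S2 (proteins shared by the top-ranked lists of two methods) can be computed along the following lines. This is a sketch only: the function names, the tie-breaking rule (alphabetical by protein name), and the toy scores are assumptions, not details from the supplementary file.

```python
def top_k(scores, k):
    """Return the k highest-scoring proteins; ties are broken by name (an assumption)."""
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [protein for protein, _ in ranked[:k]]

def overlap(scores_a, scores_b, k):
    """Number of proteins that appear in the top-k lists of both methods."""
    return len(set(top_k(scores_a, k)) & set(top_k(scores_b, k)))

# Hypothetical centrality scores from two methods on the same five proteins.
method_1 = {"P1": 9, "P2": 7, "P3": 5, "P4": 3, "P5": 1}
method_2 = {"P1": 2, "P2": 8, "P3": 1, "P4": 9, "P5": 6}
common = overlap(method_1, method_2, k=3)
print(common)
```

    Here the top three of the first method are P1, P2, P3 and of the second are P4, P2, P5, so only P2 is shared.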

    Additional file 2: of An ensemble framework for identifying essential proteins

    Tables S5-S6. Information on the top 100 proteins ranked by five centrality measures and by five ensemble methods on PIN24K. Tables S7-S8. Information on the top 100 proteins ranked by five centrality measures and by five ensemble methods on PIN76K. (XLSX 64 kb)