
    Protein complex detection with semi-supervised learning in protein interaction networks

    Background: Protein-protein interactions (PPIs) play fundamental roles in nearly all biological processes. The systematic analysis of PPI networks can enable a better understanding of cellular organization, processes and function. In this paper, we investigate the problem of protein complex detection from noisy protein interaction data, i.e., finding the subsets of proteins that are closely coupled via protein interactions. However, protein complexes are likely to overlap and the interaction data are very noisy. It is a great challenge to effectively analyze the massive data for biologically meaningful protein complex detection.
    Results: Many approaches address the problem with traditional unsupervised graph clustering methods. Here, we take a different point of view, redefining the properties and features of protein complexes and designing a “semi-supervised” method to analyze the problem. We use a neural network with a “semi-supervised” mechanism to detect protein complexes. By retraining the neural network model recursively, we find optimized parameters for the model, which allows us to detect the protein complexes. The comparison results show that our algorithm can identify protein complexes that are missed by other methods. We also show that our method achieves better precision and recall rates for the identified protein complexes than other existing methods. In addition, the proposed framework is easy to extend in the future.
    Conclusions: Using a weighted network to represent the protein interaction network is more appropriate than using a traditional unweighted network. In addition, integrating biological features and topological features to represent protein complexes is more meaningful than using dense subgraphs. Last, the “semi-supervised” learning model is a promising model to detect protein complexes with more biological and topological features available.

    A community merger of optimization algorithm to extract overlapping communities in networks

    © 2018 IEEE. A community in a network is a subset of vertices that connect primarily to one another and less to external vertices. Existing algorithms aim to extract communities from the topological features of networks. However, the edges of practical complex networks carry weights that represent the tightness and robustness of a connection, which significantly influences the accuracy of community detection. In our study, we propose an overlapping community detection method based on the seed expansion strategy, called OCSE, that applies to both unweighted and weighted networks. First, it redefines the edge weight and the vertex weight depending on the influence of the network topology and the original edge weight, and then selects the seed vertices and updates the edge weights. Comparisons between OCSE and existing community detection methods on synthetic and real-world networks show that our proposed approach performs significantly better in terms of accuracy.

    Recent advances in clustering methods for protein interaction networks

    The increasing availability of large-scale protein-protein interaction data has made it possible to understand the basic components and organization of cell machinery at the network level. The resulting challenge is how to analyze such complex interaction data to reveal the principles of cellular organization, processes and functions. Many studies have shown that clustering protein interaction networks is an effective approach for identifying protein complexes or functional modules, and this has become a major research topic in systems biology. In this review, recent advances in clustering methods for protein interaction networks are presented in detail. The prediction of protein functions and interactions based on modules is also covered. Finally, the performance of different clustering methods is compared and directions for future research are discussed.

    On combinatorial optimisation in analysis of protein-protein interaction and protein folding networks

    Abstract: Protein-protein interaction networks and protein folding networks represent prominent research topics at the intersection of bioinformatics and network science. In this paper, we present a study of these networks from a combinatorial optimisation point of view. Using a combination of classical heuristics and stochastic optimisation techniques, we were able to identify several interesting combinatorial properties of biological networks of the COSIN project. We obtained optimal or near-optimal solutions to maximum clique and chromatic number problems for these networks. We also explore patterns of both non-overlapping and overlapping cliques in these networks. Optimal or near-optimal solutions to partitioning of these networks into non-overlapping cliques and to the maximum independent set problem were discovered. Maximal cliques are explored by enumerative techniques. Domination in these networks is briefly studied, too. Applications and extensions of our findings are discussed.
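As an illustration of what enumerating maximal cliques involves, the following is a minimal sketch using the Bron-Kerbosch algorithm with pivoting. It is not taken from the paper, which does not name its enumerative technique; the toy graph, the `build_graph` helper and the protein labels are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Iterator, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency map from an edge list (hypothetical helper)."""
    graph: Graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def maximal_cliques(graph: Graph) -> Iterator[FrozenSet[str]]:
    """Yield every maximal clique via Bron-Kerbosch with pivoting."""

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> Iterator[FrozenSet[str]]:
        if not p and not x:
            # No candidates left and nothing excluded: r cannot be extended.
            yield frozenset(r)
            return
        # Pivot on the vertex with the most neighbours among the candidates,
        # so that fewer branches need to be explored.
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            yield from expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}

    yield from expand(set(), set(graph), set())


# Toy interaction network: one triangle and a short tail.
toy = build_graph([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")])
cliques = set(maximal_cliques(toy))
# A maximum clique is simply a largest maximal clique.
largest = max(cliques, key=len)
```

Exhaustive enumeration of this kind is exponential in the worst case, which is one reason heuristic and stochastic methods are attractive for the maximum clique problem on larger networks.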

    Node similarity within subgraphs of protein interaction networks

    We propose a biologically motivated quantity, twinness, to evaluate local similarity between nodes in a network. The twinness of a pair of nodes is the number of connected, labeled subgraphs of size n in which the two nodes possess identical neighbours. The graph animal algorithm is used to estimate twinness for each pair of nodes (for subgraph sizes n=4 to n=12) in four different protein interaction networks (PINs). These include an Escherichia coli PIN and three Saccharomyces cerevisiae PINs -- each obtained using state-of-the-art high throughput methods. In almost all cases, the average twinness of node pairs is vastly higher than expected from a null model obtained by switching links. For all n, we observe a difference in the ratio of type A twins (which are unlinked pairs) to type B twins (which are linked pairs) distinguishing the prokaryote E. coli from the eukaryote S. cerevisiae. Interaction similarity is expected due to gene duplication, and whole genome duplication paralogues in S. cerevisiae have been reported to co-cluster into the same complexes. Indeed, we find that these paralogous proteins are over-represented as twins compared to pairs chosen at random. These results indicate that twinness can detect ancestral relationships from currently available PIN data. Comment: 10 pages, 5 figures. Edited for typos, clarity, figures improved for readability.
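The definition of twinness can be made concrete with a small brute-force count. This is only a sketch under stated assumptions: it treats "subgraphs" as induced subgraphs, compares the two nodes' neighbours inside the subgraph while ignoring the nodes themselves, and enumerates every candidate node set exactly. The paper instead estimates twinness with the graph animal algorithm, which is not reproduced here. The function names and the toy network are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, Set

Graph = Dict[str, Set[str]]


def is_connected(nodes: Iterable[str], adj: Graph) -> bool:
    """Check whether the subgraph induced by `nodes` is connected."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for w in adj[u] & nodes:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes


def twinness(adj: Graph, u: str, v: str, n: int) -> int:
    """Count connected induced subgraphs of size n containing u and v
    in which u and v have identical neighbours (assumed reading)."""
    others = [w for w in adj if w not in (u, v)]
    count = 0
    for rest in combinations(others, n - 2):
        sub = {u, v, *rest}
        if not is_connected(sub, adj):
            continue
        # Neighbours within the subgraph, excluding the pair itself so that
        # linked and unlinked pairs are treated alike.
        if (adj[u] & sub) - {v} == (adj[v] & sub) - {u}:
            count += 1
    return count


# Toy network: u and v share partners x and y; u has one extra partner z.
toy: Graph = {
    "u": {"x", "y", "z"},
    "v": {"x", "y"},
    "x": {"u", "v"},
    "y": {"u", "v"},
    "z": {"u"},
}
t3 = twinness(toy, "u", "v", 3)
t4 = twinness(toy, "u", "v", 4)
```

The number of candidate node sets grows combinatorially with n, so exact counting of this kind is only feasible on very small graphs.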

    Identifying Dynamic Protein Complexes Based on Gene Expression Profiles and PPI Networks

    Identification of protein complexes from protein-protein interaction networks has become a key problem for understanding cellular life in the postgenomic era. Many computational methods have been proposed for identifying protein complexes. Up to now, the existing computational methods have mostly been applied to static PPI networks. However, proteins and their interactions are dynamic in reality. Identifying dynamic protein complexes is more meaningful and challenging. In this paper, a novel algorithm, named DPC, is proposed to identify dynamic protein complexes by integrating PPI data and gene expression profiles. According to the Core-Attachment assumption, proteins that are always active in the molecular cycle are regarded as core proteins. The protein-complex cores are identified from these always-active proteins by detecting dense subgraphs. Final protein complexes are extended from the protein-complex cores by adding attachments based on a topological character of “closeness” and dynamic meaning. The protein complexes produced by our algorithm DPC contain two parts: a static core expressed throughout the molecular cycle and short-lived dynamic attachments. The proposed algorithm DPC was applied to data from Saccharomyces cerevisiae, and the experimental results show that DPC outperforms CMC, MCL, SPICi, HC-PIN, COACH, and Core-Attachment based on the validation of matching with known complexes and hF-measures.

    Integrative Analysis of Many Weighted Co-Expression Networks Using Tensor Computation

    The rapid accumulation of biological networks poses new challenges and calls for powerful integrative analysis tools. Most existing methods capable of simultaneously analyzing a large number of networks were primarily designed for unweighted networks, and cannot easily be extended to weighted networks. However, it is known that transforming weighted into unweighted networks by dichotomizing the edges of weighted networks with a threshold generally leads to information loss. We have developed a novel, tensor-based computational framework for mining recurrent heavy subgraphs in a large set of massive weighted networks. Specifically, we formulate the recurrent heavy subgraph identification problem as a heavy 3D subtensor discovery problem with sparse constraints. We describe an effective approach to solving this problem by designing a multi-stage, convex relaxation protocol, and a non-uniform edge sampling technique. We applied our method to 130 co-expression networks, and identified 11,394 recurrent heavy subgraphs, grouped into 2,810 families. We demonstrated that the identified subgraphs represent meaningful biological modules by validating against a large set of compiled biological knowledge bases. We also showed that the likelihood for a heavy subgraph to be meaningful increases significantly with its recurrence in multiple networks, highlighting the importance of the integrative approach to biological network analysis. Moreover, our approach based on weighted graphs detects many patterns that would be overlooked using unweighted graphs. In addition, we identified a large number of modules that occur predominately under specific phenotypes. This analysis resulted in a genome-wide mapping of gene network modules onto the phenome. Finally, by comparing module activities across many datasets, we discovered high-order dynamic cooperativeness in protein complex networks and transcriptional regulatory networks.
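The information loss caused by dichotomizing edges can be seen on a toy example. The sketch below is illustrative only: the gene names, weights, threshold and the `heaviness` measure (total edge weight inside a node set) are assumptions, not the paper's tensor formulation.

```python
from typing import Dict, FrozenSet, Iterable, Set

Weights = Dict[FrozenSet[str], float]


def edge(u: str, v: str) -> FrozenSet[str]:
    """Represent an undirected edge as an unordered pair."""
    return frozenset((u, v))


def dichotomize(weights: Weights, threshold: float) -> Set[FrozenSet[str]]:
    """Keep only the edges whose weight reaches the threshold."""
    return {e for e, w in weights.items() if w >= threshold}


def heaviness(weights: Weights, nodes: Iterable[str]) -> float:
    """Sum the weights of all edges that lie inside the node set."""
    inside = set(nodes)
    return sum(w for e, w in weights.items() if e <= inside)


# Hypothetical co-expression weights: two triangles of very different strength.
weights: Weights = {
    edge("g1", "g2"): 0.95, edge("g1", "g3"): 0.90, edge("g2", "g3"): 0.92,
    edge("g4", "g5"): 0.51, edge("g4", "g6"): 0.55, edge("g5", "g6"): 0.52,
}

unweighted = dichotomize(weights, threshold=0.5)
strong = heaviness(weights, ["g1", "g2", "g3"])
weak = heaviness(weights, ["g4", "g5", "g6"])
# After thresholding, both modules are plain triangles and cannot be told
# apart, although the weighted view separates them clearly.
```

Raising the threshold would instead delete the weaker triangle entirely, so no single cutoff preserves both the existence of the module and its relative strength.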