1,030 research outputs found

    Detecting highly overlapping community structure by greedy clique expansion

    Get PDF
    In complex networks it is common for each node to belong to several communities, implying a highly overlapping community structure. Recent advances in benchmarking indicate that existing community assignment algorithms that are capable of detecting overlapping communities perform well only when the extent of community overlap is kept to modest levels. To overcome this limitation, we introduce a new community assignment algorithm called Greedy Clique Expansion (GCE). The algorithm identifies distinct cliques as seeds and expands these seeds by greedily optimizing a local fitness function. We perform extensive benchmarks on synthetic data to demonstrate that GCE's good performance is robust across diverse graph topologies. Significantly, GCE is the only algorithm to perform well on these synthetic graphs, in which every node belongs to multiple communities. Furthermore, when put to the task of identifying functional modules in protein interaction data, and college dorm assignments in Facebook friendship data, we find that GCE performs competitively.Comment: 10 pages, 7 Figures. Implementation source and binaries available at http://sites.google.com/site/greedycliqueexpansion

    An Algorithm for Finding Functional Modules and Protein Complexes in Protein-Protein Interaction Networks

    Get PDF
    Biological processes are often performed by a group of proteins rather than by individual proteins, and proteins in a same biological group form a densely connected subgraph in a protein-protein interaction network. Therefore, finding a densely connected subgraph provides useful information to predict the function or protein complex of uncharacterized proteins in the highly connected subgraph. We have developed an efficient algorithm and program for finding cliques and near-cliques in a protein-protein interaction network. Analysis of the interaction network of yeast proteins using the algorithm demonstrates that 59% of the near-cliques identified by our algorithm have at least one function shared by all the proteins within a near-clique, and that 56% of the near-cliques show a good agreement with the experimentally determined protein complexes catalogued in MIPS

    Parallel Maximum Clique Algorithms with Applications to Network Analysis and Storage

    Full text link
    We propose a fast, parallel maximum clique algorithm for large sparse graphs that is designed to exploit characteristics of social and information networks. The method exhibits a roughly linear runtime scaling over real-world networks ranging from 1000 to 100 million nodes. In a test on a social network with 1.8 billion edges, the algorithm finds the largest clique in about 20 minutes. Our method employs a branch and bound strategy with novel and aggressive pruning techniques. For instance, we use the core number of a vertex in combination with a good heuristic clique finder to efficiently remove the vast majority of the search space. In addition, we parallelize the exploration of the search tree. During the search, processes immediately communicate changes to upper and lower bounds on the size of maximum clique, which occasionally results in a super-linear speedup because vertices with large search spaces can be pruned by other processes. We apply the algorithm to two problems: to compute temporal strong components and to compress graphs.Comment: 11 page

    High functional coherence in k-partite protein cliques of protein interaction networks

    Full text link
    We introduce a new topological concept called k-partite protein cliques to study protein interaction (PPI) networks. In particular, we examine functional coherence of proteins in k-partite protein cliques. A k-partite protein clique is a k-partite maximal clique comprising two or more nonoverlapping protein subsets between any two of which full interactions are exhibited. In the detection of PPI&rsquo;s k-partite maximal cliques, we propose to transform PPI networks into induced K-partite graphs with proteins as vertices where edges only exist among the graph&rsquo;s partites. Then, we present a k-partite maximal clique mining (MaCMik) algorithm to enumerate k-partite maximal cliques from K-partite graphs. Our MaCMik algorithm is applied to a yeast PPI network. We observe that there does exist interesting and unusually high functional coherence in k-partite protein cliques&mdash;most proteins in k-partite protein cliques, especially those in the same partites, share the same functions. Therefore, the idea of k-partite protein cliques suggests a novel approach to characterizing PPI networks, and may help function prediction for unknown proteins.<br /

    Recent advances in clustering methods for protein interaction networks

    Get PDF
    The increasing availability of large-scale protein-protein interaction data has made it possible to understand the basic components and organization of cell machinery from the network level. The arising challenge is how to analyze such complex interacting data to reveal the principles of cellular organization, processes and functions. Many studies have shown that clustering protein interaction network is an effective approach for identifying protein complexes or functional modules, which has become a major research topic in systems biology. In this review, recent advances in clustering methods for protein interaction networks will be presented in detail. The predictions of protein functions and interactions based on modules will be covered. Finally, the performance of different clustering methods will be compared and the directions for future research will be discussed

    A Coevolutionary Residue Network at the Site of a Functionally Important Conformational Change in a Phosphohexomutase Enzyme Family

    Get PDF
    Coevolution analyses identify residues that co-vary with each other during evolution, revealing sequence relationships unobservable from traditional multiple sequence alignments. Here we describe a coevolutionary analysis of phosphomannomutase/phosphoglucomutase (PMM/PGM), a widespread and diverse enzyme family involved in carbohydrate biosynthesis. Mutual information and graph theory were utilized to identify a network of highly connected residues with high significance. An examination of the most tightly connected regions of the coevolutionary network reveals that most of the involved residues are localized near an interdomain interface of this enzyme, known to be the site of a functionally important conformational change. The roles of four interface residues found in this network were examined via site-directed mutagenesis and kinetic characterization. For three of these residues, mutation to alanine reduces enzyme specificity to ∼10% or less of wild-type, while the other has ∼45% activity of wild-type enzyme. An additional mutant of an interface residue that is not densely connected in the coevolutionary network was also characterized, and shows no change in activity relative to wild-type enzyme. The results of these studies are interpreted in the context of structural and functional data on PMM/PGM. Together, they demonstrate that a network of coevolving residues links the highly conserved active site with the interdomain conformational change necessary for the multi-step catalytic reaction. This work adds to our understanding of the functional roles of coevolving residue networks, and has implications for the definition of catalytically important residues

    Method and System for Identification of Metabolites Using Mass Spectra

    Get PDF
    A method and system is provided for mass spectrometry for identification of a specific elemental formula for an unknown compound which includes but is not limited to a metabolite. The method includes calculating a natural abundance probability (NAP) of a given isotopologue for isotopes of non-labelling elements of an unknown compound. Molecular fragments for a subset of isotopes identified using the NAP are created and sorted into a requisite cache data structure to be subsequently searched. Peaks from raw spectrum data from mass spectrometry for an unknown compound. Sample-specific peaks of the unknown com- pound from various spectral artifacts in ultra-high resolution Fourier transform mass spectra are separated. A set of possible isotope-resolved molecular formula (IMF) are created by iteratively searching the molecular fragment caches and combining with additional isotopes and then statistically filtering the results based on NAP and mass-to-charge (m/2) matching probabilities. An unknown compound is identified and its corresponding elemental molecular formula (EMF) from statistically-significant caches of isotopologues with compatible IMFs
    corecore