
    Minimax Structured Normal Means Inference

    We provide a unified treatment of a broad class of noisy structure recovery problems, known as structured normal means problems. In this setting, the goal is to identify, from a finite collection of Gaussian distributions with different means, the distribution that produced some observed data. Recent work has studied several special cases including sparse vectors, biclusters, and graph-based structures. We establish nearly matching upper and lower bounds on the minimax probability of error for any structured normal means problem, and we derive an optimality certificate for the maximum likelihood estimator, which can be applied to many instantiations. We also consider an experimental design setting, where we generalize our minimax bounds and derive an algorithm for computing a design strategy with a certain optimality property. We show that our results give tight minimax bounds for many structure recovery problems and consider some consequences for interactive sampling.
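
    The identification task described in this abstract can be illustrated with a small sketch. It assumes isotropic Gaussian noise with a known, shared variance (an assumption made here, not stated in the abstract), in which case the maximum likelihood estimator simply picks the candidate mean closest to the observation in Euclidean distance. The function name `mle_structure` and the toy candidate means are hypothetical.

```python
def mle_structure(y, means):
    """Return the index of the candidate mean closest to observation y.

    Assumption: noise is Gaussian with identity covariance (up to a known
    scale), so maximizing the likelihood is the same as minimizing the
    squared Euclidean distance to each candidate mean.
    """
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    return min(range(len(means)), key=lambda j: sq_dist(y, means[j]))


# Hypothetical toy instance: three one-sparse candidate means in R^3.
candidate_means = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
observation = [0.2, 0.9, -0.1]  # a noisy version of the second mean
estimate = mle_structure(observation, candidate_means)
```

    Here `estimate` is the position of the recovered structure in `candidate_means`; the abstract's minimax bounds concern how often such a decision is wrong, which this sketch does not compute.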

    Some results on more flexible versions of Graph Motif

    The problems studied in this paper originate from Graph Motif, a problem introduced in 2006 in the context of biological networks. Informally speaking, it consists of deciding whether a multiset of colors occurs in a connected subgraph of a vertex-colored graph. Due to the high rate of noise in biological data, more flexible definitions of the problem have been outlined. We present in this paper two inapproximability results for two different optimization variants of Graph Motif: one where the size of the solution is maximized, the other where the number of color substitutions needed to obtain the motif from the solution is minimized. We also study a decision version of Graph Motif where the connectivity constraint is replaced by the well-known notion of graph modularity. While the problem remains NP-complete, it admits FPT algorithms for biologically relevant parameterizations.
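
    The basic decision problem stated informally above can be made concrete with a brute-force sketch. This is an exponential-time illustration of the definition only, not one of the algorithms from the paper; the function names and the toy graph are hypothetical.

```python
from collections import Counter
from itertools import combinations


def is_connected(vertices, adj):
    """Check that `vertices` induces a connected subgraph of `adj`."""
    vertices = set(vertices)
    start = next(iter(vertices))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for w in adj.get(u, ()):
            if w in vertices and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vertices


def has_motif(adj, color, motif):
    """Decide whether the color multiset `motif` occurs in a connected subgraph.

    Tries every vertex subset of size len(motif), so this is only usable
    on very small graphs.
    """
    target = Counter(motif)
    k = len(motif)
    if k == 0:
        return True
    for subset in combinations(sorted(adj), k):
        if Counter(color[v] for v in subset) == target and is_connected(subset, adj):
            return True
    return False


# Hypothetical toy instance: the path a - b - c - d.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
color = {"a": "red", "b": "blue", "c": "red", "d": "green"}
found = has_motif(adj, color, ["red", "red", "blue"])   # vertices a, b, c
missing = has_motif(adj, color, ["blue", "green"])      # b and d are not adjacent
```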

    On the limiting behavior of parameter-dependent network centrality measures

    We consider a broad class of walk-based, parameterized node centrality measures for network analysis. These measures are expressed in terms of functions of the adjacency matrix and generalize various well-known centrality indices, including Katz and subgraph centrality. We show that the parameter can be "tuned" to interpolate between degree and eigenvector centrality, which appear as limiting cases. Our analysis helps explain certain correlations often observed between the rankings obtained using different centrality measures, and provides some guidance for the tuning of parameters. We also highlight the roles played by the spectral gap of the adjacency matrix and by the number of triangles in the network. Our analysis covers both undirected and directed networks, including weighted ones. A brief discussion of PageRank is also given. Comment: First 22 pages are the paper; pages 22-38 are the supplementary material.
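
    One of the limiting cases mentioned in this abstract can be checked numerically with a small sketch. It uses Katz centrality in the form x = (I - alpha*A)^(-1) 1, computed by fixed-point iteration, and shows that for a very small parameter alpha the quantity (x - 1)/alpha is close to the degree vector. The function name, the toy graph, and the choice of alpha are assumptions made for illustration; the eigenvector limit (alpha approaching the reciprocal of the largest eigenvalue) is not covered here.

```python
def katz(adj, alpha, iters=200):
    """Katz centrality x = sum_k alpha^k A^k 1, via the iteration x <- 1 + alpha*A*x.

    `adj` is a dense adjacency matrix (list of lists). The iteration
    converges only when alpha is below 1 / (largest eigenvalue of A).
    """
    n = len(adj)
    x = [1.0] * n
    for _ in range(iters):
        x = [1.0 + alpha * sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x


# Hypothetical toy graph with edges 0-1, 0-2, 0-3, 1-2.
A = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
degree = [sum(row) for row in A]

alpha = 1e-4  # far below 1 / lambda_max for this graph
x = katz(A, alpha)
# Since x = 1 + alpha*d + O(alpha^2), this rescaling approximates the degrees.
rescaled = [(xi - 1.0) / alpha for xi in x]
```

    For this small alpha, ranking the nodes by `x` gives the same order as ranking them by `degree`, up to ties that higher-order terms may break.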

    Which clustering algorithm is better for predicting protein complexes?

    Background: Protein-protein interactions (PPI) play a key role in determining the outcome of most cellular processes. The correct identification and characterization of protein interactions and of the networks they comprise is critical for understanding the molecular mechanisms within the cell. Large-scale techniques such as pull-down assays and tandem affinity purification are used to detect protein interactions in an organism. Today, relatively new high-throughput methods like yeast two-hybrid, mass spectrometry, microarrays, and phage display are also used to reveal protein interaction networks.
    Results: In this paper we evaluated four different clustering algorithms using six different interaction datasets. We parameterized the MCL, Spectral, RNSC and Affinity Propagation algorithms and applied them to six PPI datasets produced experimentally by Yeast 2 Hybrid (Y2H) and Tandem Affinity Purification (TAP) methods. The predicted clusters, the so-called protein complexes, were then compared and benchmarked against already known complexes stored in published databases.
    Conclusions: While results may differ depending on parameterization, the MCL and RNSC algorithms seem to be more promising and more accurate at predicting PPI complexes. Moreover, they predict more complexes than the other reviewed algorithms in absolute numbers. On the other hand, the spectral clustering algorithm achieves the highest valid prediction rate in our experiments. However, it is nearly always outperformed by both RNSC and MCL in terms of geometrical accuracy, and it generates fewer valid clusters than any other reviewed algorithm. This article demonstrates various metrics for evaluating the accuracy of such predictions, as presented in the text below. Supplementary material can be found at: http://www.bioacademy.gr/bioinformatics/projects/ppireview.htm

    Searching for network modules

    When analyzing complex networks, a key target is to uncover their modular structure, which means searching for a family of modules, namely node subsets each spanning a subnetwork more densely connected than the average. This work proposes a novel type of objective function for graph clustering, in the form of a multilinear polynomial whose coefficients are determined by network topology. It may be thought of as a potential function, to be maximized, taking its values on fuzzy clusterings, that is, families of fuzzy subsets of nodes over which every node distributes a unit membership. When suitably parametrized, this potential is shown to attain its maximum when every node concentrates its entire unit membership on some module. The output is thus a partition, while the original discrete optimization problem is turned into a continuous version that allows alternative search strategies to be conceived. Since the instance of the problem is a pseudo-Boolean function assigning real-valued cluster scores to node subsets, modularity maximization is employed to exemplify a so-called quadratic form, in that the scores of singletons and pairs also fully determine the scores of larger clusters, while the resulting multilinear polynomial potential function has degree 2. After considering further quadratic instances, different from modularity and obtained by interpreting network topology in alternative manners, a greedy local-search strategy for the continuous framework is analytically compared with an existing greedy agglomerative procedure for the discrete case. Overlapping is finally discussed in terms of multiple runs, i.e. several local searches with different initializations. Comment: 10 page
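
    Modularity, which this abstract uses as its example of a quadratic cluster score, can be computed for a given partition with a short sketch. It uses the standard Newman-Girvan definition for an unweighted, undirected graph, Q = sum over communities c of (e_c / m - (d_c / 2m)^2), where e_c is the number of edges inside c, d_c is the total degree of c, and m is the number of edges. This is only the classical discrete score, not the paper's multilinear potential over fuzzy clusterings; the function name and toy graph are hypothetical.

```python
def modularity(edges, community):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    `edges` is a list of (u, v) pairs with no duplicates or self-loops;
    `community` maps each node to its community label.
    """
    m = len(edges)
    degree = {}
    internal = {}  # number of edges with both endpoints in the community
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        if community[u] == community[v]:
            c = community[u]
            internal[c] = internal.get(c, 0) + 1

    total_degree = {}  # sum of node degrees per community
    for node, d in degree.items():
        c = community[node]
        total_degree[c] = total_degree.get(c, 0) + d

    return sum(
        internal.get(c, 0) / m - (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )


# Hypothetical toy graph: two triangles joined by the single edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_modules = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_module = {node: "A" for node in range(6)}

q_split = modularity(edges, two_modules)   # 6/7 - 1/2 = 5/14
q_single = modularity(edges, one_module)   # a single community scores 0
```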

    SLIDER: Mining correlated motifs in protein-protein interaction networks

    Correlated motif mining (CMM) is the problem of finding overrepresented pairs of patterns, called motif pairs, in interacting protein sequences. Algorithmic solutions for CMM thereby provide a computational method for predicting binding sites for protein interaction. In this paper, we adopt a motif-driven approach where the support of candidate motif pairs is evaluated in the network. We experimentally establish the superiority of the Chi-square-based support measure over other support measures. Furthermore, we show that CMM is an NP-hard problem for a large class of support measures (including Chi-square) and reformulate the search for correlated motifs as a combinatorial optimization problem. We then present the method SLIDER, which uses local search with a neighborhood function based on sliding motifs and employs the Chi-square-based support measure. We show that SLIDER outperforms existing motif-driven CMM methods and scales to large protein-protein interaction networks.

    Comparative genomic analysis of novel Acinetobacter symbionts: A combined systems biology and genomics approach

    Acknowledgements: This work was supported by University of Delhi, Department of Science and Technology - Promotion of University Research and Scientific Excellence (DST-PURSE). V.G., S.H. and U.S. gratefully acknowledge the Council for Scientific and Industrial Research (CSIR), University Grant Commission (UGC) and Department of Biotechnology (DBT) for providing research fellowship. Peer reviewed. Publisher PD