55 research outputs found

    Soft Image Segmentation: On the Clustering of Irregular, Weighted, Multivariate Marked Networks

    The contribution exposes and illustrates a general, flexible formalism, together with an associated iterative procedure, aimed at determining soft memberships of marked nodes in a weighted network. Gathering together spatial entities which are both spatially close and similar regarding their features is an issue relevant in image segmentation, spatial clustering, and data analysis in general. Unoriented weighted networks are specified by an "exchange matrix", determining the probability to select a pair of neighbors. We present a family of membership-dependent free energies, whose local minimization specifies soft clusterings. The free energy additively combines a mutual information, as well as various energy terms, concave or convex in the memberships: within-group inertia, generalized cuts (extending weighted Ncut and modularity), and membership discontinuities (generalizing Dirichlet forms). The framework is closely related to discrete Markov models, random walks, label propagation and spatial autocorrelation (Moran's I), and can express the Mumford-Shah approach. Four small datasets illustrate the theory.

    Finding and testing network communities by lumped Markov chains

    Identifying communities (or clusters), namely groups of nodes with comparatively strong internal connectivity, is a fundamental task for deeply understanding the structure and function of a network. Yet, there is a lack of formal criteria for defining communities and for testing their significance. We propose a sharp definition which is based on a significance threshold. By means of a lumped Markov chain model of a random walker, a quality measure called "persistence probability" is associated with a cluster. Then the cluster is defined as an "α-community" if such a probability is not smaller than α. Consistently, a partition composed of α-communities is an "α-partition". These definitions turn out to be very effective for finding and testing communities. If a set of candidate partitions is available, setting the desired α-level allows one to immediately select the α-partition with the finest decomposition. Simultaneously, the persistence probabilities quantify the significance of each single community. Given its ability to assess the quality of each cluster individually, this approach can also disclose single well-defined communities even in networks which overall do not possess a definite clusterized structure.
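
    The persistence probability described above can be sketched for the simplest case. This is a minimal illustration, not the authors' code: it assumes an undirected weighted network and the standard random walk on it, where the stationary probability of a node is proportional to its strength, so the persistence probability of a cluster reduces to its internal weight divided by the total strength of its nodes. The function names, the toy graph and the threshold alpha = 0.8 are all hypothetical.

```python
def build_adjacency(edges):
    """Build a symmetric weighted adjacency dict from (u, v, weight) triples."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj


def persistence_probabilities(adj, partition):
    """Return one persistence probability per cluster.

    Assumes an undirected network: the walker's stationary distribution is
    proportional to node strength, so the probability of staying inside a
    cluster for one step is (internal weight) / (total strength of the cluster).
    """
    strength = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    result = []
    for cluster in partition:
        total = sum(strength[u] for u in cluster)
        # Each internal edge is counted once from each endpoint, as in the walk.
        internal = sum(w for u in cluster for v, w in adj[u].items() if v in cluster)
        result.append(internal / total)
    return result


# Toy network (hypothetical): two triangles joined by a single edge.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 1.0)]
adj = build_adjacency(edges)
partition = [{0, 1, 2}, {3, 4, 5}]
probs = persistence_probabilities(adj, partition)

alpha = 0.8  # hypothetical significance level
is_alpha_partition = all(p >= alpha for p in probs)
print(probs, is_alpha_partition)
```

    Each triangle has internal weight 6 (three edges, counted in both directions) and total strength 7, so both clusters pass the chosen threshold and the partition qualifies as an α-partition at that level.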

    Candidate gene prioritization by network analysis of differential expression using machine learning approaches

    Background: Discovering novel disease genes is still challenging for diseases for which no prior knowledge - such as known disease genes or disease-related pathways - is available. Performing genetic studies frequently results in large lists of candidate genes of which only a few can be followed up for further investigation. We have recently developed a computational method for constitutional genetic disorders that identifies the most promising candidate genes by replacing prior knowledge by experimental data of differential gene expression between affected and healthy individuals. To improve the performance of our prioritization strategy, we have extended our previous work by applying different machine learning approaches that identify promising candidate genes by determining whether a gene is surrounded by highly differentially expressed genes in a functional association or protein-protein interaction network. Results: We have proposed three strategies scoring disease candidate genes relying on network-based machine learning approaches, such as kernel ridge regression, heat kernel, and Arnoldi kernel approximation. For comparison purposes, a local measure based on the expression of the direct neighbors is also computed. We have benchmarked these strategies on 40 publicly available knockout experiments in mice, and performance was assessed against results obtained using a standard procedure in genetics that ranks candidate genes based solely on their differential expression levels (Simple Expression Ranking). Our results showed that our four strategies could outperform this standard procedure and that the best results were obtained using the Heat Kernel Diffusion Ranking, leading to an average ranking position of 8 out of 100 genes, an AUC value of 92.3% and an error reduction of 52.8% relative to the standard procedure approach, which ranked the knockout gene on average at position 17 with an AUC value of 83.7%. Conclusion: In this study we could identify promising candidate genes using network-based machine learning approaches even if no knowledge is available about the disease or phenotype.
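
    The reported error reduction is consistent with the two AUC values if "error" is read as 1 − AUC. That reading is an assumption here, since the abstract does not define the term:

```latex
\[
\frac{(1 - 0.837) - (1 - 0.923)}{1 - 0.837}
  = \frac{0.163 - 0.077}{0.163}
  = \frac{0.086}{0.163}
  \approx 0.528
\]
```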

    Scuba: Scalable kernel-based gene prioritization

    Background: The uncovering of genes linked to human diseases is a pressing challenge in molecular biology and precision medicine. This task is often hindered by the large number of candidate genes and by the heterogeneity of the available information. Computational methods for the prioritization of candidate genes can help to cope with these problems. In particular, kernel-based methods are a powerful resource for the integration of heterogeneous biological knowledge; however, their practical implementation is often precluded by their limited scalability. Results: We propose Scuba, a scalable kernel-based method for gene prioritization. It implements a novel multiple kernel learning approach, based on a semi-supervised perspective and on the optimization of the margin distribution. Scuba is optimized to cope with strongly unbalanced settings where known disease genes are few and large scale predictions are required. Importantly, it is able to deal efficiently both with a large number of candidate genes and with an arbitrary number of data sources. As a direct consequence of scalability, Scuba also integrates a new efficient strategy to select optimal kernel parameters for each data source. We performed cross-validation experiments and simulated a realistic usage setting, showing that Scuba outperforms a wide range of state-of-the-art methods. Conclusions: Scuba achieves state-of-the-art performance and has enhanced scalability compared to existing kernel-based approaches for genomic data. This method can be useful to prioritize candidate genes, particularly when their number is large or when input data is highly heterogeneous. The code is freely available at https://github.com/gzampieri/Scuba

    Network Analysis of Differential Expression for the Identification of Disease-Causing Genes

    Genetic studies (in particular linkage and association studies) identify chromosomal regions involved in a disease or phenotype of interest, but those regions often contain many candidate genes, only a few of which can be followed up for biological validation. Recently, computational methods to identify (prioritize) the most promising candidates within a region have been proposed, but they are usually not applicable to cases where little is known about the phenotype (no or few confirmed disease genes, fragmentary understanding of the biological cascades involved). We seek to overcome this limitation by replacing knowledge about the biological process by experimental data on differential gene expression between affected and healthy individuals. Considering the problem from the perspective of a gene/protein network, we assess a candidate gene by considering the level of differential expression in its neighborhood under the assumption that strong candidates will tend to be surrounded by differentially expressed neighbors. We define a notion of soft neighborhood where each gene is given a contributing weight, which decreases with the distance from the candidate gene on the protein network. To account for multiple paths between genes, we define the distance using the Laplacian exponential diffusion kernel. We score candidates by aggregating the differential expression of neighbors weighted as a function of distance. Through a randomization procedure, we rank candidates by p-values. We illustrate our approach on four monogenic diseases and successfully prioritize the known disease-causing genes.
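
    The soft-neighborhood scoring can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the kernel is taken as K = exp(-beta * L) with L the combinatorial graph Laplacian, it is approximated by a truncated Taylor series that only suits very small graphs, the score is a plain kernel-weighted sum that excludes the candidate itself, and the randomization step that yields p-values is left out. The value of beta, the toy path graph and the expression values are hypothetical.

```python
def laplacian(n, edges):
    """Return the combinatorial Laplacian L = D - A of an undirected, unweighted graph."""
    lap = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1.0
        lap[v][v] += 1.0
        lap[u][v] -= 1.0
        lap[v][u] -= 1.0
    return lap


def diffusion_kernel(lap, beta, terms=60):
    """Approximate K = exp(-beta * L) with a truncated Taylor series (small graphs only)."""
    n = len(lap)
    a = [[-beta * x for x in row] for row in lap]
    kernel = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in kernel]  # current series term, starts at A^0 / 0!
    for k in range(1, terms):
        # term_k = term_{k-1} * A / k, which equals A^k / k!
        term = [[sum(term[i][m] * a[m][j] for m in range(n)) / k for j in range(n)]
                for i in range(n)]
        kernel = [[kernel[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return kernel


def neighborhood_score(kernel, expression, candidate):
    """Kernel-weighted sum of the differential expression of all other genes."""
    return sum(kernel[candidate][j] * expression[j]
               for j in range(len(expression)) if j != candidate)


# Toy network (hypothetical): a path 0-1-2-3-4 with differential expression
# concentrated on genes 3 and 4.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
expression = [0.0, 0.0, 0.0, 3.0, 5.0]
kernel = diffusion_kernel(laplacian(5, edges), beta=0.5)
scores = {c: neighborhood_score(kernel, expression, c) for c in (0, 2)}
print(scores)
```

    Candidate 2 sits closer to the differentially expressed genes than candidate 0, so it receives the higher score. In practice a library routine for the matrix exponential would replace the Taylor series.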

    Predicting Missing Links via Local Information

    Missing link prediction of networks is of both theoretical interest and practical significance in modern science. In this paper, we empirically investigate a simple framework of link prediction on the basis of node similarity. We compare nine well-known local similarity measures on six real networks. The results indicate that the simplest measure, namely common neighbors, has the best overall performance, and the Adamic-Adar index performs the second best. A new similarity measure, motivated by the resource allocation process taking place on networks, is proposed and shown to have higher prediction accuracy than common neighbors. It is found that many links are assigned the same scores if only the information of the nearest neighbors is used. We therefore design another new measure exploiting information of the next nearest neighbors, which can remarkably enhance the prediction accuracy. Comment: For International Workshop: "The Physics Approach To Risk: Agent-Based Models and Networks", http://intern.sg.ethz.ch/cost-p10
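
    Three of the local measures named above can be written in a few lines. The sketch below uses the usual textbook definitions: common neighbors counts shared neighbors, Adamic-Adar sums 1/log(degree) over them, and resource allocation sums 1/degree. The toy graph and function names are hypothetical, and the next-nearest-neighbor measure from the abstract is not covered.

```python
import math


def build_neighbors(edges):
    """Map each node to the set of its neighbors in an undirected graph."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    return nbrs


def local_scores(nbrs, x, y):
    """Return (common neighbors, Adamic-Adar, resource allocation) for the pair (x, y)."""
    common = nbrs[x] & nbrs[y]
    cn = len(common)
    # A common neighbor of two distinct nodes has degree >= 2, so log(degree) > 0.
    aa = sum(1.0 / math.log(len(nbrs[z])) for z in common)
    ra = sum(1.0 / len(nbrs[z]) for z in common)
    return cn, aa, ra


# Toy graph (hypothetical): nodes 1 and 2 share the hub 5 (degree 4),
# nodes 3 and 4 share the low-degree node 6 (degree 2).
edges = [(1, 5), (2, 5), (5, 7), (5, 8), (3, 6), (4, 6)]
nbrs = build_neighbors(edges)
pair_hub = local_scores(nbrs, 1, 2)
pair_low = local_scores(nbrs, 3, 4)
print(pair_hub, pair_low)
```

    Both pairs have exactly one common neighbor, so common neighbors cannot separate them, which is the kind of tie the abstract mentions. The two degree-weighted indices rank the pair sharing the low-degree node higher.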

    Topology analysis and visualization of Potyvirus protein-protein interaction network

    Background: One of the central interests of Virology is the identification of host factors that contribute to virus infection. Despite tremendous efforts, the list of factors identified remains limited. With omics techniques, the focus has changed from identifying and thoroughly characterizing individual host factors to the simultaneous analysis of thousands of interactions, framing them in the context of protein-protein interaction networks and of transcriptional regulatory networks. This new perspective is allowing the identification of direct and indirect viral targets. Such information is available for several members of the Potyviridae family, one of the largest and most important families of plant viruses. Results: After collecting information on virus protein-protein interactions from different potyviruses, we have processed it and used it for inferring a protein-protein interaction network. All proteins are connected into a single network component. Some proteins show a high degree and are highly connected while others are much less connected, with the network showing a significant degree of disassortativity. We have attempted to integrate this virus protein-protein interaction network into the largest protein-protein interaction network of Arabidopsis thaliana, a susceptible laboratory host. To make the interpretation of data and results easier, we have developed a new approach for visualizing and analyzing the dynamic spread on the host network of the local perturbations induced by viral proteins. We found that local perturbations can reach the entire host protein-protein interaction network, although the efficiency of this spread depends on the particular viral proteins. By comparing the spread dynamics among viral proteins, we found that some proteins spread their effects fast and efficiently by attacking hubs in the host network while other proteins exert more local effects. 
Conclusions: Our findings confirm that potyvirus protein-protein interaction networks are highly connected, with some proteins playing the role of hubs. Several topological parameters depend linearly on the protein degree. Some viral proteins focus their effect on host hubs only, while others spread their effect among several proteins at the first step. Future new data will help to refine our model and to improve our predictions. This work was supported by the Spanish Ministerio de Economia y Competitividad grants BFU2012-30805 (to SFE), DPI2011-28112-C04-02 (to AF) and DPI2011-28112-C04-01 (to JP). The first two authors are recipients of fellowships from the Spanish Ministerio de Economia y Competitividad: BES-2012-053772 (to GB) and BES-2012-057812 (to AF-F). Bosque, G.; Folch Fortuny, A.; Picó Marco, JA.; Ferrer, A.; Elena Fito, SF. (2014). Topology analysis and visualization of Potyvirus protein-protein interaction network. BMC Systems Biology. 129(8):1-15. doi:10.1186/s12918-014-0129-8

    Kernels on Graphs as Proximity Measures

    Kernels and, broadly speaking, similarity measures on graphs are extensively used in graph-based unsupervised and semi-supervised learning algorithms as well as in the link prediction problem. We analytically study proximity and distance properties of various kernels and similarity measures on graphs. This can potentially be useful for recommending the adoption of one or another similarity measure in a machine learning method. Also, we numerically compare various similarity measures in the context of spectral clustering and observe that normalized heat-type similarity measures with log modification generally perform the best.