86 research outputs found

    Global Geometric Affinity for Revealing High Fidelity Protein Interaction Network

    Get PDF
    Protein-protein interaction (PPI) network analysis presents an essential role in understanding the functional relationship among proteins in a living biological system. Despite the success of current approaches for understanding the PPI network, the large fraction of missing and spurious PPIs and a low coverage of complete PPI network are the sources of major concern. In this paper, based on the diffusion process, we propose a new concept of global geometric affinity and an accompanying computational scheme to filter the uncertain PPIs, namely, reduce the spurious PPIs and recover the missing PPIs in the network. The main concept defines a diffusion process in which all proteins simultaneously participate to define a similarity metric (global geometric affinity (GGA)) to robustly reflect the internal connectivity among proteins. The robustness of the GGA is attributed to propagating the local connectivity to a global representation of similarity among proteins in a diffusion process. The propagation process is extremely fast as only simple matrix products are required in this computation process and thus our method is geared toward applications in high-throughput PPI networks. Furthermore, we proposed two new approaches that determine the optimal geometric scale of the PPI network and the optimal threshold for assigning the PPI from the GGA matrix. Our approach is tested with three protein-protein interaction networks and performs well with significant random noises of deletions and insertions in true PPIs. Our approach has the potential to benefit biological experiments, to better characterize network data sets, and to drive new discoveries

    To Embed or Not: Network Embedding as a Paradigm in Computational Biology

    Get PDF
    Current technology is producing high throughput biomedical data at an ever-growing rate. A common approach to interpreting such data is through network-based analyses. Since biological networks are notoriously complex and hard to decipher, a growing body of work applies graph embedding techniques to simplify, visualize, and facilitate the analysis of the resulting networks. In this review, we survey traditional and new approaches for graph embedding and compare their application to fundamental problems in network biology with using the networks directly. We consider a broad variety of applications including protein network alignment, community detection, and protein function prediction. We find that in all of these domains both types of approaches are of value and their performance depends on the evaluation measures being used and the goal of the project. In particular, network embedding methods outshine direct methods according to some of those measures and are, thus, an essential tool in bioinformatics research

    Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs

    Full text link
    Laplacian mixture models identify overlapping regions of influence in unlabeled graph and network data in a scalable and computationally efficient way, yielding useful low-dimensional representations. By combining Laplacian eigenspace and finite mixture modeling methods, they provide probabilistic or fuzzy dimensionality reductions or domain decompositions for a variety of input data types, including mixture distributions, feature vectors, and graphs or networks. Provable optimal recovery using the algorithm is analytically shown for a nontrivial class of cluster graphs. Heuristic approximations for scalable high-performance implementations are described and empirically tested. Connections to PageRank and community detection in network analysis demonstrate the wide applicability of this approach. The origins of fuzzy spectral methods, beginning with generalized heat or diffusion equations in physics, are reviewed and summarized. Comparisons to other dimensionality reduction and clustering methods for challenging unsupervised machine learning problems are also discussed.Comment: 13 figures, 35 reference

    Graph embedding and geometric deep learning relevance to network biology and structural chemistry

    Get PDF
    Graphs are used as a model of complex relationships among data in biological science since the advent of systems biology in the early 2000. In particular, graph data analysis and graph data mining play an important role in biology interaction networks, where recent techniques of artificial intelligence, usually employed in other type of networks (e.g., social, citations, and trademark networks) aim to implement various data mining tasks including classification, clustering, recommendation, anomaly detection, and link prediction. The commitment and efforts of artificial intelligence research in network biology are motivated by the fact that machine learning techniques are often prohibitively computational demanding, low parallelizable, and ultimately inapplicable, since biological network of realistic size is a large system, which is characterised by a high density of interactions and often with a non-linear dynamics and a non-Euclidean latent geometry. Currently, graph embedding emerges as the new learning paradigm that shifts the tasks of building complex models for classification, clustering, and link prediction to learning an informative representation of the graph data in a vector space so that many graph mining and learning tasks can be more easily performed by employing efficient non-iterative traditional models (e.g., a linear support vector machine for the classification task). The great potential of graph embedding is the main reason of the flourishing of studies in this area and, in particular, the artificial intelligence learning techniques. In this mini review, we give a comprehensive summary of the main graph embedding algorithms in light of the recent burgeoning interest in geometric deep learning
    • …
    corecore