8 research outputs found

    Link Mining for Kernel-based Compound-Protein Interaction Predictions Using a Chemogenomics Approach

    Virtual screening (VS) is widely used during computational drug discovery to reduce costs. Chemogenomics-based virtual screening (CGBVS) can be used to predict new compound-protein interactions (CPIs) from known CPI network data using several methods, including machine learning and data mining. Although CGBVS facilitates highly efficient and accurate CPI prediction, it performs poorly when predicting interactions for new compounds for which CPIs are unknown. The pairwise kernel method (PKM) is a state-of-the-art CGBVS method and shows high accuracy for prediction of new compounds. In this study, on the basis of link mining, we improved the PKM by combining a link indicator kernel (LIK) with chemical similarity and evaluated the accuracy of these methods. The proposed method obtained an average area under the precision-recall curve (AUPR) value of 0.562, which was higher than that achieved by the conventional Gaussian interaction profile (GIP) method (0.425), and the calculation time increased by only a few percent.
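
The abstract names the Gaussian interaction profile (GIP) method as its baseline without giving a formula. The following is a minimal sketch of a GIP kernel under the commonly used definition (a Gaussian kernel on binary interaction profiles, with the bandwidth normalized by the mean squared profile norm). The function name `gip_kernel`, the `bandwidth` parameter, and the toy matrix are illustrative assumptions, not taken from the paper.

```python
import math


def gip_kernel(profiles, bandwidth=1.0):
    """Gaussian kernel between the rows of a 0/1 interaction matrix.

    Each row is the interaction profile of one compound (hypothetical layout).
    """
    n = len(profiles)
    # Normalize the bandwidth by the mean squared norm of the profiles.
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    gamma = bandwidth / mean_sq_norm if mean_sq_norm > 0 else bandwidth
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


# Toy data (assumed): 3 compounds x 4 proteins, 1 = known interaction.
interactions = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]
K = gip_kernel(interactions)
```

A compound with no known interactions has an all-zero profile, so a kernel of this kind carries no compound-specific information for it, which is consistent with the difficulty the abstract describes for new compounds.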

    A nonuniform popularity-similarity optimization (nPSO) model to efficiently generate realistic complex networks with communities

    The hidden metric space behind complex network topologies is an active topic in current network science, and hyperbolic space is one of the most studied because it appears to be associated with the structural organization of many real complex systems. The Popularity-Similarity-Optimization (PSO) model simulates how random geometric graphs grow in hyperbolic space, reproducing strong clustering and a scale-free degree distribution; however, it fails to reproduce an important feature of real complex networks: community organization. The Geometrical-Preferential-Attachment (GPA) model was recently developed to give the PSO a community structure as well, which is obtained by forcing different angular regions of the hyperbolic disk to have variable levels of attractiveness. However, the number and size of the communities cannot be explicitly controlled in the GPA, which is a clear limitation for real applications. Here, we introduce the nonuniform PSO (nPSO) model that, unlike GPA, forces heterogeneous angular node attractiveness by sampling the angular coordinates from a tailored nonuniform probability distribution, for instance a mixture of Gaussians. The nPSO differs from GPA in three other aspects: it allows the number and size of communities to be fixed explicitly; it allows their mixing property to be tuned through the network temperature; and it efficiently generates networks with high clustering. After several tests, we propose the nPSO as a valid and efficient model for generating networks with communities in hyperbolic space, which can be adopted as a realistic benchmark for tasks such as community detection and link prediction.
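
The coordinate-sampling step described above can be sketched as follows. This is only an illustration under stated assumptions: equally weighted Gaussian components with equidistant means, a standard deviation of one sixth of the spacing between means, and the basic PSO radial coordinate r_i = 2 ln(i) without popularity fading. The function name and parameters are hypothetical, and the link-formation step (connection by hyperbolic distance and temperature) is omitted.

```python
import math
import random


def sample_npso_coordinates(n_nodes, n_communities, seed=0):
    """Return a list of (radius, angle, community) tuples, one per node."""
    rng = random.Random(seed)
    spacing = 2 * math.pi / n_communities  # angular gap between component means
    sigma = spacing / 6                    # assumed spread of each component
    nodes = []
    for i in range(1, n_nodes + 1):
        # Pick a mixture component uniformly (assumed equal community sizes).
        community = rng.randrange(n_communities)
        # Sample the angle from that Gaussian and wrap it onto the circle.
        theta = rng.gauss(community * spacing, sigma) % (2 * math.pi)
        # Basic PSO radial coordinate: earlier nodes sit closer to the center.
        radius = 2 * math.log(i)
        nodes.append((radius, theta, community))
    return nodes


nodes = sample_npso_coordinates(n_nodes=100, n_communities=4)
```

Changing the component weights would change the community sizes, and changing the number of components would change the number of communities, which is the explicit control the abstract attributes to the nPSO.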

    Hyperbolic matrix factorization improves prediction of drug-target associations

    Past research in computational systems biology has focused more on the development and applications of advanced statistical and numerical optimization techniques and much less on understanding the geometry of the biological space. By representing biological entities as points in a low-dimensional Euclidean space, state-of-the-art methods for drug-target interaction (DTI) prediction implicitly assume that the biological space has a flat geometry. In contrast, recent theoretical studies suggest that biological systems exhibit a tree-like topology with a high degree of clustering. As a consequence, embedding a biological system in a flat space distorts the distances between biological objects. Here, we present a novel matrix factorization methodology for drug-target interaction prediction that uses hyperbolic space as the latent biological space. When benchmarked against classical Euclidean methods, hyperbolic matrix factorization exhibits superior accuracy while lowering the embedding dimension by an order of magnitude. We see this as additional evidence that hyperbolic geometry underpins large biological networks.
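
To make the contrast between flat and hyperbolic latent spaces concrete, here is a small sketch of the distance function in the Poincaré ball, one common model of hyperbolic space. The abstract does not say which model the method uses, so the choice of model, the function names, and the sample points are assumptions made for illustration.

```python
import math


def poincare_distance(u, v):
    """Hyperbolic distance between two points strictly inside the unit ball."""
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    return math.acosh(1 + 2 * sq_diff / ((1 - sq_u) * (1 - sq_v)))


def euclidean_distance(u, v):
    """Ordinary flat-space distance, for comparison."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Two points near the boundary (hypothetical latent vectors).
drug = (0.9, 0.0)
target = (0.0, 0.9)
print(euclidean_distance(drug, target))
print(poincare_distance(drug, target))
```

Distances grow rapidly toward the boundary of the ball, which is the property that lets tree-like structures fit into few dimensions.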

    Network-Based Methods for Prediction of Drug-Target Interactions

    Drug-target interaction (DTI) is the basis of drug discovery. However, it is time-consuming and costly to determine DTIs experimentally. Over the past decade, various computational methods have been proposed to predict potential DTIs with high efficiency and low cost. These methods can be roughly divided into several categories, such as molecular docking-based, pharmacophore-based, similarity-based, machine learning-based, and network-based methods. Among them, network-based methods, which do not rely on three-dimensional structures of targets or on negative samples, have shown great advantages over the others. In this article, we focus on network-based methods for DTI prediction, in particular our network-based inference (NBI) methods, which were derived from recommendation algorithms. We first introduce the methodologies and evaluation of network-based methods, and then emphasize their applications in a wide range of fields, including target prediction and elucidation of the molecular mechanisms of therapeutic effects or safety problems. Finally, the limitations and perspectives of network-based methods are discussed. In short, network-based methods provide alternative tools for studies in drug repurposing, new drug discovery, systems pharmacology, and systems toxicology.
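
The abstract does not spell out the NBI procedure. A minimal sketch of the two-step resource diffusion commonly used in recommendation-style inference on a bipartite drug-target graph is given below; the function name `nbi_scores` and the toy network are assumptions, and the authors' actual NBI variants may differ in detail.

```python
def nbi_scores(drug_targets, drug):
    """Score every target for `drug` by two-step resource diffusion.

    drug_targets maps each drug to the set of its known targets.
    """
    # Invert the mapping: target -> set of drugs known to hit it.
    target_drugs = {}
    for d, targets in drug_targets.items():
        for t in targets:
            target_drugs.setdefault(t, set()).add(d)

    # Step 1: each known target of `drug` starts with one unit of resource
    # and splits it equally among all drugs linked to it.
    drug_resource = {}
    for t in drug_targets[drug]:
        share = 1.0 / len(target_drugs[t])
        for d in target_drugs[t]:
            drug_resource[d] = drug_resource.get(d, 0.0) + share

    # Step 2: each drug splits the resource it received equally among
    # all of its own targets; the result is the prediction score.
    scores = {}
    for d, resource in drug_resource.items():
        share = resource / len(drug_targets[d])
        for t in drug_targets[d]:
            scores[t] = scores.get(t, 0.0) + share
    return scores


# Toy network (assumed): d1 and d2 share target t2.
network = {"d1": {"t1", "t2"}, "d2": {"t2", "t3"}}
scores = nbi_scores(network, "d1")
```

Only known interactions are used, so no negative samples or three-dimensional target structures are required. In the toy network, t3 receives a nonzero score for d1 purely because d2 shares t2 with it.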

    Sparse Similarity and Network Navigability for Markov Clustering Enhancement

    Markov clustering (MCL) is an effective unsupervised pattern recognition algorithm for data clustering in high-dimensional feature space that simulates stochastic flows on a network of sample similarities to detect the structural organization of clusters in the data. However, it has two main drawbacks: (1) its community detection performance in complex networks falls far short of state-of-the-art methods such as Infomap and Louvain, and (2) it has never been generalized to deal with data nonlinearity. In this work, both aspects, although closely related, are treated as separate issues and addressed as such. Regarding community detection, a field within network science, the crucial issue is to convert the unweighted network topology into a ‘smart enough’ pre-weighted connectivity that adequately steers the stochastic flow procedure behind Markov clustering. Here, a conceptual innovation is introduced and discussed, focusing on how to leverage notions of latent network geometry in order to design similarity measures for pre-weighting the adjacency matrix used in Markov clustering community detection. The results demonstrate that the proposed strategy improves Markov clustering significantly, to the extent that it is often close to the performance of current state-of-the-art methods for community detection. These findings hold for both synthetic ‘realistic’ networks (with known ground-truth communities) and real networks (with community metadata), even when the real network connectivity is corrupted by noise artificially induced by missing or spurious links. Regarding the nonlinearity aspect, the development of algorithms for unsupervised pattern recognition by nonlinear clustering is a notable problem in data science. Minimum Curvilinearity (MC) is a principle that approximates nonlinear sample distances in the high-dimensional feature space by curvilinear distances, which are computed as transversal paths over their minimum spanning tree and then stored in a kernel. Here, a nonlinear MCL algorithm termed MC-MCL is proposed, which is the first nonlinear kernel extension of MCL and exploits Minimum Curvilinearity to enhance the performance of MCL on real and synthetic high-dimensional data with underlying nonlinear patterns. Furthermore, improvements in the design of the so-called MC-kernel, obtained by applying base modifications to better approximate the hidden geometry of the data, have been evaluated with positive outcomes. Different nonlinear MCL versions are then compared with baseline and state-of-the-art clustering methods, including DBSCAN, K-means, affinity propagation, density peaks, and deep clustering. As a result, the design of a suitable nonlinear kernel provides a valuable framework to estimate nonlinear distances when the kernel is applied in combination with MCL. Indeed, nonlinear MCL variants outperform classical MCL and even state-of-the-art clustering algorithms on different nonlinear datasets. This dissertation discusses these enhancements and the generalized understanding of how network geometry plays a fundamental role in designing algorithms based on network navigability.
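
For readers unfamiliar with the stochastic flow procedure mentioned above, the following is a minimal sketch of plain Markov clustering (alternating expansion and inflation) on a small weighted adjacency matrix. It is not the dissertation's pre-weighted or MC-MCL variant: the function names, the inflation value, the fixed iteration count, the self-loop weight, and the cluster-extraction threshold are all assumptions chosen for illustration.

```python
def normalize_columns(matrix):
    """Scale each column so that it sums to 1 (column-stochastic matrix)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] for j in range(n)] for i in range(n)]


def expand(matrix):
    """Expansion: square the matrix, i.e. take two steps of the random walk."""
    n = len(matrix)
    return [[sum(matrix[i][k] * matrix[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]


def inflate(matrix, power):
    """Inflation: raise entries to a power, then renormalize the columns."""
    return normalize_columns([[v ** power for v in row] for row in matrix])


def markov_clustering(weights, inflation=2.0, iterations=50):
    n = len(weights)
    m = [[float(weights[i][j]) for j in range(n)] for i in range(n)]
    for i in range(n):
        m[i][i] = 1.0  # self-loops (assumed weight 1) stabilize the flow
    m = normalize_columns(m)
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    # Rows that keep mass act as attractors; their nonzero columns are members.
    clusters = {frozenset(j for j, v in enumerate(row) if v > 1e-6) for row in m}
    clusters.discard(frozenset())
    return sorted(sorted(c) for c in clusters)


# Toy graph (assumed): two triangles joined by the single edge 2-3.
bridged = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(markov_clustering(bridged))
```

Because the function accepts arbitrary non-negative weights, a similarity-based pre-weighting of the adjacency matrix of the kind the abstract describes could be supplied in place of the 0/1 entries.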