
    A fast ranking algorithm for predicting gene functions in biomolecular networks

    Ranking genes in functional networks according to a specific biological function is a challenging task that raises both performance and computational complexity problems. To cope with both problems, we developed a transductive gene ranking method based on kernelized score functions that fully exploits the topology and graph structure of biomolecular networks and captures significant functional relationships between genes. We ran the method on a network constructed by integrating multiple biomolecular data sources in the yeast model organism, achieving significantly better results than the compared state-of-the-art network-based algorithms for gene function prediction, with relevant savings in computational time. The proposed approach is general and fast enough to be applied, in perspective, to other relevant node ranking problems in large and complex biological networks.

    Automated gene function prediction through gene multifunctionality in biological networks

    As the number of sequenced genomes rapidly grows, Automated Prediction of gene Function (AFP) is now a challenging problem. Despite significant progress in the last several years, the accuracy of gene function prediction still needs to be improved in order to be used effectively in practice. Two of the main issues of the AFP problem are the imbalance of gene functional annotations and the 'multifunctional properties' of genes. While the former is a well-studied problem in machine learning, the latter has only recently emerged in bioinformatics, and few studies have addressed it. Here we propose a method for AFP which appropriately handles the label imbalance characterizing biological taxonomies, and embeds in the model the property of some genes of being 'multifunctional'. We tested the method in predicting the functions of the Gene Ontology functional hierarchy for genes of yeast and fly model organisms, in a genome-wide approach. The achieved results show that cost-sensitive strategies and 'gene multifunctionality' can be combined to achieve significantly better results than the compared state-of-the-art algorithms for AFP.

    Network modeling of patients' biomolecular profiles for clinical phenotype/outcome prediction

    Methods for phenotype and outcome prediction are largely based on inductive supervised models that use selected biomarkers to make predictions, without explicitly considering the functional relationships between individuals. We introduce a novel network-based approach named Patient-Net (P-Net) in which biomolecular profiles of patients are modeled in a graph-structured space that represents gene expression relationships between patients. Then a kernel-based semi-supervised transductive algorithm is applied to the graph to explore the overall topology of the graph and to predict the phenotype/clinical outcome of patients. Experimental tests involving several publicly available datasets of patients afflicted with pancreatic, breast, colon and colorectal cancer show that our proposed method is competitive with state-of-the-art supervised and semi-supervised predictive systems. Importantly, P-Net also provides interpretable models that can be easily visualized to gain clues about the relationships between patients, and to formulate hypotheses about their stratification

    TopologyNet: Topology based deep convolutional neural networks for biomolecular property predictions

    Although deep learning approaches have had tremendous success in image, video and audio processing, computer vision, and speech recognition, their applications to three-dimensional (3D) biomolecular structural data sets have been hindered by the entangled geometric complexity and biological complexity. We introduce topology, i.e., element specific persistent homology (ESPH), to untangle geometric complexity and biological complexity. ESPH represents 3D complex geometry by one-dimensional (1D) topological invariants and retains crucial biological information via a multichannel image representation. It is able to reveal hidden structure-function relationships in biomolecules. We further integrate ESPH and convolutional neural networks to construct a multichannel topological neural network (TopologyNet) for the predictions of protein-ligand binding affinities and protein stability changes upon mutation. To overcome the limitations to deep learning arising from small and noisy training sets, we present a multitask topological convolutional neural network (MT-TCNN). We demonstrate that the present TopologyNet architectures outperform other state-of-the-art methods in the predictions of protein-ligand binding affinities, globular protein mutation impacts, and membrane protein mutation impacts. Comment: 20 pages, 8 figures, 5 tables

    Learning node labels with multi-category Hopfield networks

    In several real-world node label prediction problems on graphs, in fields ranging from computational biology to World Wide Web analysis, nodes can be partitioned into categories different from the classes to be predicted, on the basis of their characteristics or their common properties. Such partitions may provide further information about node classification that classical machine learning algorithms do not take into account. We introduce a novel family of parametric Hopfield networks (m-category Hopfield networks) and a novel algorithm (Hopfield multi-category, HoMCat), designed to appropriately exploit the presence of property-based partitions of nodes into multiple categories. Moreover, the proposed model adopts a cost-sensitive learning strategy to prevent the marked decay in performance usually observed when instance labels are unbalanced, that is, when one class of labels is far less represented than the other. We validate the proposed model on both synthetic and real-world data, in the context of multi-species function prediction, where the classes to be predicted are the Gene Ontology terms and the categories are the different species in the multi-species protein network. We carried out an intensive experimental validation, which on the one hand compares HoMCat with several state-of-the-art graph-based algorithms, and on the other hand reveals that exploiting meaningful prior partitions of input data can substantially improve classification performance.

    edge2vec: Representation learning using edge semantics for biomedical knowledge discovery

    Representation learning provides new and powerful graph analytical approaches and tools for the highly valued data science challenge of mining knowledge graphs. Since previous graph analytical methods have mostly focused on homogeneous graphs, an important current challenge is extending this methodology to richly heterogeneous graphs and knowledge domains. The biomedical sciences are such a domain, reflecting the complexity of biology, with entities such as genes, proteins, drugs, diseases, and phenotypes, and relationships such as gene co-expression, biochemical regulation, and biomolecular inhibition or activation. Therefore, the semantics of edges and nodes are critical for representation learning and knowledge discovery in real-world biomedical problems. In this paper, we propose the edge2vec model, which represents graphs considering edge semantics. An edge-type transition matrix is trained by an Expectation-Maximization approach, and a stochastic gradient descent model is employed to learn node embeddings on a heterogeneous graph via the trained transition matrix. edge2vec is validated on three biomedical domain tasks: biomedical entity classification, compound-gene bioactivity prediction, and biomedical information retrieval. Results show that by incorporating edge types into node embedding learning in heterogeneous graphs, edge2vec significantly outperforms state-of-the-art models on all three tasks. We propose this method for its added value relative to existing graph analytical methodology, and for its applicability in the real-world context of biomedical knowledge discovery. Comment: 10 pages

    Analysis of bio-molecular networks through semi-supervised graph-based learning methods

    Relevant problems in the context of molecular biology and medicine can be modeled through graphs where nodes represent bio-molecular or chemical entities (e.g. genes or drugs) and edges represent some notion of similarity between them. In this context, semi-supervised learning methods able to exploit both the local characteristics of the network (e.g. the neighborhood of a node) and its global characteristics (e.g. its overall topology) have been applied to extract meaningful biological and medical knowledge from a biological system. In this work we summarize the main characteristics of RANKS (RAnking Nodes through Kernelized Score functions), a recently proposed semi-supervised algorithmic scheme based on local score functions embedding well-designed graph kernels, able to deal with both the local and the global features of the analyzed network. We show some successful applications of RANKS in the context of protein function prediction, gene-disease association and drug repositioning problems. Moreover, we present a novel secondary memory-based and "vertex-centric" version of the algorithm that scales well on graphs with hundreds of thousands of nodes and tens of millions of edges, using off-the-shelf desktop computers, and we show an application to a complex multi-species protein function prediction problem.
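The idea of a local score function embedding a graph kernel can be illustrated with a minimal sketch. It assumes a one-step random walk kernel of the form (a - 1)I + D^(-1/2) W D^(-1/2) and an average score over the labeled positives; the abstract names neither choice, and the function names and the toy graph are hypothetical, so this is not the RANKS implementation itself.

```python
import math

def random_walk_kernel(W, a=2.0):
    """Assumed kernel: (a - 1) * I + D^-1/2 W D^-1/2 for a symmetric adjacency matrix W."""
    n = len(W)
    deg = [sum(row) for row in W]
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if deg[i] > 0 and deg[j] > 0:
                K[i][j] = W[i][j] / math.sqrt(deg[i] * deg[j])
        K[i][i] += a - 1.0  # diagonal term of the kernel
    return K

def average_score(K, positives):
    """Score every node by its mean kernel similarity to the positive nodes."""
    return [sum(K[v][x] for x in positives) / len(positives) for v in range(len(K))]

# Toy path graph 0 - 1 - 2 - 3 (hypothetical data); node 0 is the only positive.
W = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
K = random_walk_kernel(W)
scores = average_score(K, positives=[0])
# Rank nodes from most to least similar to the positive set.
ranking = sorted(range(len(W)), key=lambda v: scores[v], reverse=True)
```

With a one-step kernel only direct neighbours of the positives receive a non-zero score; taking powers of the kernel would let the score reach more distant nodes, which is one way such a scheme can trade local for global information.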

    How to understand the cell by breaking it: network analysis of gene perturbation screens

    Modern high-throughput gene perturbation screens are key technologies at the forefront of genetic research. Combined with rich phenotypic descriptors they enable researchers to observe detailed cellular reactions to experimental perturbations on a genome-wide scale. This review surveys the current state-of-the-art in analyzing perturbation screens from a network point of view. We describe approaches to make the step from the parts list to the wiring diagram by using phenotypes for network inference and integrating them with complementary data sources. The first part of the review describes methods to analyze one- or low-dimensional phenotypes like viability or reporter activity; the second part concentrates on high-dimensional phenotypes showing global changes in cell morphology, transcriptome or proteome. Comment: Review based on ISMB 2009 tutorial; after two rounds of revision

    Benchmarking network propagation methods for disease gene identification

    In-silico identification of potential target genes for disease is an essential aspect of drug target discovery. Recent studies suggest that successful targets can be found by leveraging genetic, genomic and protein interaction information. Here, we systematically tested the ability of 12 varied algorithms, based on network propagation, to identify genes that have been targeted by any drug, on gene-disease data from 22 common non-cancerous diseases in OpenTargets. We considered two biological networks and six performance metrics, and compared two types of input gene-disease association scores. The impact of the design factors on performance was quantified through additive explanatory models. Standard cross-validation led to over-optimistic performance estimates due to the presence of protein complexes. In order to obtain realistic estimates, we introduced two novel protein complex-aware cross-validation schemes. When seeding biological networks with known drug targets, machine learning and diffusion-based methods found around 2-4 true targets within the top 20 suggestions. Seeding the networks with genes associated with disease by genetics decreased performance below 1 true hit on average. The use of a larger network, although noisier, improved overall performance. We conclude that diffusion-based prioritisers and machine learning applied to diffusion-based features are suited for drug discovery in practice and improve over simpler neighbour-voting methods. We also demonstrate the large impact of choosing an adequate validation strategy and the definition of seed disease genes. Peer Reviewed. Postprint (published version)
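As an illustration of what a diffusion-based prioritiser does, here is a minimal sketch of random walk with restart, one common network propagation scheme. The abstract does not list which 12 algorithms were benchmarked, so this choice, the restart value, the function name and the toy network are all assumptions made for the example.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Propagate seed evidence over a graph; returns one score per node."""
    n = len(adj)
    # Column sums are used to turn the adjacency matrix into transition probabilities.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        new_p = []
        for i in range(n):
            flow = sum(adj[i][j] * p[j] / col_sums[j]
                       for j in range(n) if col_sums[j] > 0)
            # Mix the diffused mass with a jump back to the seed distribution.
            new_p.append((1.0 - restart) * flow + restart * p0[i])
        change = sum(abs(a - b) for a, b in zip(new_p, p))
        p = new_p
        if change < tol:
            break
    return p

# Toy path network 0 - 1 - 2 - 3 (hypothetical data); gene 0 is the only seed.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
seeds = {0}
scores = random_walk_with_restart(adj, seeds)
# Candidate genes are the non-seed nodes, ranked by propagated score.
candidates = sorted((g for g in range(len(adj)) if g not in seeds),
                    key=lambda g: scores[g], reverse=True)
```

A neighbour-voting baseline would score only genes directly connected to a seed, whereas the propagated scores here also rank genes two or more steps away.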